Text-access project expands to Southern and Eastern Europe

The European IMPACT (Improving Access to Text) project has welcomed 11 new partners from Southern and Eastern Europe. The project aims to optimise OCR (optical character recognition) software and language technology for historical material, and to share institutional knowledge and expertise on digitisation.

In addition, the IMPACT Centre of Competence, due to launch in early 2011, will provide a central point of access to services for all libraries, archives and museums involved in digitising text material.

IMPACT now brings together 26 national and regional libraries, research institutions and commercial suppliers. The project is coordinated by the National Library of the Netherlands and runs from 2008 to 2011.

In the current second phase (2010-2011), four national libraries, four research centres and three universities from France, Spain, Poland, Bulgaria, Slovenia and the Czech Republic have joined, bringing expertise in six additional languages. The new research partners will build historical lexica for their languages, while the libraries will provide datasets. An important aim of this extension is to arrive at a cross-language view of how digitised text can be made more accessible and enhanced.

IMPACT's objective is to significantly improve access to historical printed text. Material dated before 1900 is difficult to access in digital form, because today's OCR software does not produce satisfactory results for old books, magazines and newspapers. In addition, libraries, archives and other content-holding institutions across Europe lack experience and know-how in digitisation. The language of historical texts is a further stumbling block.

IMPACT includes two industry partners, IBM and ABBYY, which are helping to set up a text-recognition system based on an adaptive model that automatically tunes itself to each new book being digitised. Through the online Collaborative Correction web application linked to this system, volunteers across Europe will be able to help correct OCR results and so improve them further. IMPACT is also exploring new approaches to image enhancement and segmentation, and to the use of language technology and historical lexica in OCR processing and information retrieval. The IMPACT tools will become available as interoperable web services integrated into a user-friendly platform.