Springer is digitising its archive of books. Siân Harris finds out what the publisher is doing and what challenges it faces with this large-scale task.
‘The book will never die,’ declares Springer in its advertisements and exhibition banners. It is a bold statement, and one that book lovers in all walks of life will applaud, but there is a particular reason for the prediction appearing in Springer’s marketing material: the publisher has embarked on a large-scale digitisation project to help make this statement true for its own books.
Since the publisher launched its e-book portfolio in 2005, it has built up a collection of over 50,000 academic e-book titles on its SpringerLink platform. But Springer began publishing in 1842, so this figure is less than half the total number of books published during its history.
‘Springer is in transition from traditional publishing with print to a complete electronic model where digital is the primary format,’ commented Thijs Willems, project manager, digital archives at the company. ‘We have a lot of books published in the past that people want to access and these are often out of print now.’
The anticipated demand for older book titles is based partly on the publisher’s experience with its journal archive. ‘For customers who’ve purchased the journal archive, 20-25 per cent of usage on SpringerLink is of journal content from before 1997,’ noted Willems. ‘And the value of older books is even higher than for old journals so we believe that demand will be high.’
Springer began digitising its old books for the Springer Book Archives at the end of 2010. It plans to load the books onto the SpringerLink platform during 2012, with the aim of completing the process by the end of the year.
Such a task is not trivial. The Springer backlist includes tens of thousands of books, published under many different imprints over the course of nearly two centuries. One of the first challenges, according to Willems, has been to track down all the print books. Although many print titles have been kept by Springer, the archive team has had to rely on national libraries and other collections to fill the gaps. Working with national libraries also helps because they have good catalogues of books, Willems added.
The work with national libraries throws up an interesting issue for Springer’s project. Many national libraries – as well as organisations such as Google – have already been carrying out extensive digitisation work so, on the surface, it might seem sensible to try to look for overlaps to avoid duplication.
This is not the case, however, because the reasons for digitisation are different and therefore so are the results. National libraries, said Willems, are interested in preservation so they digitise books to look like the print books in their collection – complete with any markings added by scholars through the years. And the work of organisations like Google Books and Project Gutenberg is restricted to copyright-free content and has different quality criteria.
For Springer, the aim is to produce e-books that look as the titles did when they were brand-new print books, but with all the additional metadata, search capabilities and other functionality that users already get with the publisher’s frontlist titles, including the ability to print on demand. ‘We looked at what Google did but the quality standards we’d work to are higher and we want to be able to reprint the whole book,’ explained Willems.
And of course, unlike other digitisation initiatives, the publisher will sell the resulting content. Willems anticipates that this will be under similar business models to those of Springer’s frontlist e-books, with subject collections available to institutions, single titles available to individuals and sales through resellers and aggregators. He said that the publisher is still discussing with customers whether the archive will be available on an outright purchase model.
Another major challenge for Springer with this project has been rights. As Willems explained: ‘For books that are 30-40 years old, electronic rights weren’t mentioned in the original deals. A big challenge is finding the authors to get e-book rights. We’re letting the author community know about this so they can come to us.’ He added that the discussions with authors also address what they want to do with the e-book royalties, with the option to donate them to one of two charities, Research4Life or INASP, which help provide scholarly-information access to developing-world researchers.
‘The majority of authors are really enthusiastic about the archive; it means that their book will never be out of print again,’ said Willems, who added that the main reason authors do not wish their titles to participate is that they have already made them into e-books.
The digitisation work for the archive is being carried out by SPI in the Philippines, which also digitised Springer’s journal archive. ‘We send them our print books and they do high-resolution scans of all the pages separately,’ explained Willems. ‘They use special scanners if a book is very fragile.’
Once the pages have been scanned, the images are scanned separately, at a different resolution and contrast, to achieve better quality. ‘We learnt the need for that from doing our journal archive,’ said Willems.
After scanning, the team at SPI cleans up the pages and uses optical character recognition to obtain the text. Text recognition is a particular challenge where old fonts are used. Then there is extraction of all the metadata to enable searching. Willems noted that this is perhaps the largest step in the whole process. There are also several quality-assurance steps, at SPI, at an independent company and at Springer.
Once all this is done, the different formats are created. The majority of the books will be available as searchable PDFs, with an accompanying XML file that contains bibliographic information and references. ‘To the user, the book looks like the original – with the exception of the copyright page where we need to add an ISBN and DOI,’ Willems continued.
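To give a sense of what such an accompanying XML file might hold, here is a minimal Python sketch that builds a bibliographic record for one digitised book. The element names (`book`, `title`, `authors`, `references` and so on), the function names and every value shown are illustrative assumptions: the article does not describe Springer’s actual schema or tooling, and the ISBN and DOI below are placeholders, not real identifiers.

```python
import xml.etree.ElementTree as ET


def build_book_record(title, authors, year, isbn, doi, references):
    """Build a small bibliographic XML record for one digitised book.

    The element names are hypothetical, chosen for illustration only;
    they are not taken from any real publisher schema.
    """
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "year").text = str(year)
    # Identifiers added at digitisation time (the article mentions an ISBN and a DOI).
    ET.SubElement(book, "isbn").text = isbn
    ET.SubElement(book, "doi").text = doi
    authors_el = ET.SubElement(book, "authors")
    for name in authors:
        ET.SubElement(authors_el, "author").text = name
    refs_el = ET.SubElement(book, "references")
    for ref in references:
        ET.SubElement(refs_el, "reference").text = ref
    return book


def record_to_string(book):
    """Serialise the record to a text string of XML."""
    return ET.tostring(book, encoding="unicode")


# Placeholder values only; none of these refer to a real book.
record = build_book_record(
    title="Example Title",
    authors=["A. Author", "B. Author"],
    year=1950,
    isbn="000-0-000-00000-0",
    doi="10.0000/example",
    references=["C. Writer, Another Example Title, 1948"],
)
xml_text = record_to_string(record)
print(xml_text)
```

Keeping the record separate from the page images is what would let a platform search on titles, authors and references while the PDF itself still looks like the original printed book.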
At the beginning, only a small percentage of books will also be available as EPUB files. The books chosen for this will be those that are particularly popular or have many images. The EPUB files are made in-house from XML files. This development echoes plans for Springer’s frontlist, for which the publisher intends to roll out EPUB versions soon.
As Derk Haank, CEO of Springer Science+Business Media, commented at the announcement of the project, ‘Up to now, our past titles have been hidden away in our in-house library, but thanks to innovative technologies they can be made available again. At Springer, a book will never die, but "out of print" will.’