The path to digital preservation has been peppered with problems, but progress has been rapid, reports Rebecca Pool
Ten years ago, many in scholarly publishing worried about digital preservation, but didn’t know what to do. Preservation solutions were nigh on non-existent and nobody knew how to fund them anyway. Doubts over publishers’ willingness to deposit journal content, twinned with confusion over how libraries would access it, didn’t help. And, crucially, how would you encourage publishers and libraries to actually come forward and cooperate?
‘Existing market mechanisms had failed to produce a preservation approach that garnered the support of a wide range of publishers and libraries,’ explains Eileen Fenton, who was managing director of Portico, one of several organisations now providing digital preservation services for the academic community. ‘The infrastructure required for digital preservation is extensive and carries significant costs; both publishers and libraries had to invest in the solution to make it sustainable.’
With this in mind, Portico adopted a not-for-profit approach and set about working with publishers and libraries to understand technology and business issues. Publishers were required to hand over preservation rights while appropriate library access to archived content was agreed.
‘We took an approach that we believed balanced the needs of libraries and publishers,’ recalls Fenton. ‘We decided that access to content would only be gained under special circumstances, so-called trigger events, and participating libraries could visit our audit site for verification purposes. Both libraries and publishers [make] an annual payment to the preservation services with access provided to those that contribute.’
Nearly 10 years on from the creation of Portico, the not-for-profit model is working. The organisation has more than 700 supporting libraries and 130 participating publishers and the content that it preserves has extended beyond journals to e-books and other scholarly content.
Another approach to preservation, the Stanford University-based LOCKSS (Lots of Copies Keeps Stuff Safe) project, whose deployment began in 2000, uses open-source software to detect and store participating publishers’ content on computer networks at hundreds of libraries worldwide. Likewise, CLOCKSS, a spin-out project for triggered content, now has around 140 libraries archiving content from nearly 500 publishers.
National libraries, such as the Koninklijke Bibliotheek in the Netherlands and The British Library, also run many preservation initiatives.
As Portico’s present managing director, Kate Wittenberg, says: ‘We believe that [the amount of competition] is a sign of a continuing need in the community, and is evidence of a robust market for preservation.’
Robust or not, digital preservation still has challenges. In May 2008, the UK’s Joint Information Systems Committee (JISC) released its ‘Comparative study of e-journal archiving solutions’, concluding: ‘None currently offers the typical academic library a complete solution to its archival needs. Nor do any cover the greater proportion of journals titles being published today.’
Time should remedy this and, as Rachel Bruce, JISC’s digital infrastructure programme director, points out, these and other initiatives have already solved many technical issues. She believes that two JISC-funded projects have been crucial to progress: CEDARS, an early collaboration between the universities of Oxford, Cambridge and Leeds that explored metadata issues, and the CAMiLEON Project, an initiative between the universities of Michigan and Leeds that considered obsolete technologies. ‘These also helped bodies like the British Library get support to ensure digital preservation received the resources it required,’ she adds.
Bruce is hopeful that, one day, digital preservation will become ‘so embedded we don’t necessarily notice it’. As she highlights, recent progress on rights and licensing issues, including support for digital preservation from the UK government-supported Hargreaves Intellectual Property review, could help to ensure this.
But as the digital preservation of journals, and indeed books, gains ground, what about website preservation? In 2009, The British Library’s chief executive, Lynne Brindley, warned that our cultural heritage was at risk as the internet evolved and websites altered content, focusing media attention on the so-called digital black hole.
Richard Gibby, legal deposit manager at The British Library, has long considered the issues around non-print legal deposit. Traditional copyright law has meant legal deposit libraries, such as The British Library, could only archive websites after copyright permission had been obtained from each website owner. But as Gibby points out, this ‘burdensome and bureaucratic process’ will soon change.
‘The government is proposing regulations that will give legal deposit libraries the copyright permission they need to collect material from freely-available websites,’ he explains. ‘It will also oblige publishers to give us a password to access journals that have only been available behind a pay-wall.’
But, as with electronic journal preservation, challenges lie ahead. First, many publishers have data security concerns. ‘They need to be confident if they deposit material it will be held securely and not leak off online and destroy their businesses,’ says Gibby.
He also anticipates challenges regarding the many terabytes of data that will be archived. Still, he and his colleagues are eager for change.
‘We are desperate for these regulations to come into place as soon as they possibly can,’ he asserts. ‘Without [them], the digital black-hole would certainly continue to grow... but with a legal framework in place, we will be able to archive the UK web-space on a much larger, more efficient scale.’