Preserving global scholarship
As preservation professionals meet the rising demands of traditional publishers, can they also attend to the needs of riskier ventures, asks Rebecca Pool
As electronic publishing surges ahead, preservation initiatives are experiencing very strong growth. Global archive CLOCKSS, for example, has yet again added more than fifty publishers to its roll call and doubled its number of libraries, while late last year Portico celebrated preserving a hefty 25 million journal articles.
And the rise is content agnostic. Despite its e-journal milestone, Portico managing director, Kate Wittenberg, claims equally healthy growth from e-books.
‘We are still seeing a continuing large-scale flow of e-books as publishers and libraries increasingly accept the fact that books are becoming digital,’ she says.
Likewise, Randy S Kiefer, executive director of CLOCKSS, confirms a steady increase in demand for e-journal and e-book preservation. ‘When CLOCKSS started it was focused on academic journals but we also have very strong growth [in e-books].’
Kiefer describes the not-for-profit organisation’s e-book preservation services as ‘plain vanilla’. CLOCKSS currently captures and preserves standard listings because more dynamic content, such as online commentary to editor’s notes, either doesn’t exist or isn’t yet digitised. But this will change. ‘The e-books business is evolving,’ he says. ‘E-books are not yet geared to capture dynamic content and I think it’s going to be another two to three years before the style and type of business settles down.’
In the meantime, preserving dynamic content is still throwing up issues. Kiefer notes that dynamic content is difficult for dark archives such as CLOCKSS to preserve as, put simply, dynamic content changes from moment to moment.
‘Most [dark archives] take snapshots over time with the goal of preserving the wide majority of content in the way it was presented,’ he says. ‘We’re getting better at taking good snapshots and getting a sense of when is the optimal time to capture that content.’
Crucially, new tools are also being developed to preserve dynamic content better. For example, LOCKSS, an open-source digital preservation initiative based at Stanford University and used by CLOCKSS, has released open-source software to capture content that was previously locked behind inaccessible forms. What’s more, the initiative is also working on code to collect materials delivered via JavaScript.
At the same time, Portico continues to research new methods to connect data to publications. However, Wittenberg highlights how the organisation still needs publishers to provide new kinds of dynamic content to work with.
The organisation has access to complex content, such as audio-visual materials from its digitised historical collections preservation service. But as the Portico managing director adds: ‘I haven’t seen anyone come to us with a giant dataset and say “I need to work out how to preserve large executable files”. We assume this is coming, but we haven’t had to deal with it yet.’
Preservation developments
As each preservation organisation continues to grapple with dynamic content challenges, it is also eyeing new developments. One key example is the slow but steady demand for preservation from developing nations.
CLOCKSS, for example, recently signed a contract with the University of Guam, located on the island of Guam in the Western Pacific Ocean, to preserve its e-journal, Micronesica. The organisation also has agreements with the Brazilian government education agency, CAPES; the Brazilian scientific journals e-library, SciELO; and the Autonomous University of Mexico, and is making inroads into India.
‘Publishers in India are very print centric with business models at least a decade behind the USA and Europe,’ says Kiefer. ‘But we’re close to signing a large publisher and have already signed an open-access publisher here. And I’ve also signed a couple of libraries; as this group expands, more will join.’
But despite CLOCKSS’s success, preservation isn’t easy for the libraries and publishers of developing nations. Unstable publishing systems and infrastructure hamper uptake, and of course a lack of resources, including cash, can cripple preservation plans.
Susan Murray, managing director of African Journals Online – a not-for-profit online service holding collections of peer-reviewed, African-published scholarly journals – describes a publishing set-up very different from what many communities will be used to.
‘Many African journals are published individually by a few dedicated volunteers working in severely resource-constrained areas and institutions,’ she says. ‘Challenges can be as straightforward as intermittent electricity provision, under-developed domestic banking infrastructure, unreliable telephone connectivity and lack of access to the internet and computers.’
Constraints can also be related to insufficient manuscript submissions, an author’s proficiency in a journal’s language and African universities incentivising researchers to publish in more prestigious overseas titles. AJOL itself offers free online hosting for qualifying journals, and despite myriad challenges, papers published in African journals are regularly noted and cited.
‘In 2012 there were more than 13 million downloads of papers hosted on AJOL with over one million repeat users of the website from around the world,’ says Murray. ‘More than 40 per cent of repeat users of the AJOL website are from Africa and a growing proportion of users come from other developing countries around the world.’
Importantly, Murray is adamant that publishers and libraries in developing nations are very serious about preservation, but participation depends on resources, journal context, and the ICT skills and proficiency of the editorial board and office. Indeed, at least in Africa, many less well-resourced journals rely on AJOL to preserve, and continue to host, content if a journal ceases publication.
‘We regularly update and securely preserve several copies and dated versions of the full database and partner journals’ content at our offices, offsite in two South African cities and also offshore,’ she says. ‘In fact, we’ve had to send a couple of publishers a complete [journal] back file where, for example, servers have crashed.’
‘But ironically,’ she adds, ‘organisations in resource-constrained settings have to stretch their means to continue hard-copy provision and preservation, given the digital divide, as well as find ways to assure digital preservation.’
The success of open access
Developing nations aside, preservation organisations are also seeing rising demand for services from established as well as up-and-coming open-access (OA) publishers. One clear example is PeerJ, which was launched in June 2012 by former PLOS ONE publisher, Peter Binfield, and former Mendeley chief scientist, Jason Hoyt.
Operating on an innovative business model – ‘pay once, publish for life’ – the publisher charges a single, low fee that allows researchers to then publish biological and medical sciences articles for free. The business model is brave but appears to be working, and as Binfield asserts, PeerJ, like other professional OA publishers, takes preservation very seriously.
‘The serious open-access publishers that are emerging are professional, well-run operations,’ he says. ‘For example, there’s PLOS, BioMed Central, Hindawi, ourselves and we all have good industry standard preservation strategies.’ PeerJ, for one, deposits its pre-print server and journal content in both CLOCKSS and LOCKSS repositories, with article text also archived in the open-access repository PubMed Central. ‘We’d also like to be in The Royal Dutch Library when they start adding content again,’ Binfield adds.
But why CLOCKSS and LOCKSS? Binfield points to cost, but CLOCKSS’s Kiefer firmly believes his organisation has an edge over alternative preservation services for OA publishers. As he explains, if his organisation triggers content, then that released content is also OA and has the same availability as when it was first published.
‘We might use a slightly different Creative Commons licence but the fact of the matter is, if we are triggering content, then the publisher has gone and what we’re saying is this cannot be used in a commercial venture,’ he adds.
Formal preservation aside, PeerJ’s Binfield also highlights an archiving concept that he believes is unique to OA journals and is largely ignored by publishing communities. ‘Our content is now sitting on the hard drives and thumb drives of millions of people around the world,’ he says. ‘This distributed back-up may not be a formal archive but we encourage this in our licence whereas a subscription publisher actively tries to prevent that kind of thing.’
But can a distributed informal back-up provide a safe alternative to formal preservation? Judging by the preservation strategies of PeerJ and other OA publishers, the answer is no, but in Binfield’s words, ‘it’s another route to resilience’.
‘This gives us much more of a distributed resilient archive of content than if we were a subscription publisher that really restricts the number of copies that exist in the world,’ he says. ‘It’s not failsafe but if the entire system of CLOCKSS or LOCKSS went down, it is a natural advantage of being an OA publisher.’
Preservation’s final frontier?
Be it formal or informal, the OA publishing front-runners clearly have preservation in hand. But what about the much smaller OA publishers that are publishing, at most, a handful of relatively niche titles? Without a doubt, right now, these are struggling with preservation services.
Victoria Reich, executive director of the LOCKSS program at Stanford University Library, says that almost every day a publisher contacts the programme to request content preservation.
Reich reckons the majority of these requests are from India- and China-based OA publishers that don’t always have enough published articles for quality to be evaluated. At the same time, the publication’s editorial board is often too narrowly represented to be of interest to the majority of LOCKSS participating libraries.
And perhaps worse, some publishers are ‘black-listed’ on a website set up by academic librarian Jeffrey Beall of Auraria Library, University of Colorado Denver, USA. This aims to warn researchers of possibly dubious practices from ‘predatory’ OA publishers. But as Reich highlights: ‘This, of course, makes librarians understandably skittish about spending precious resources to preserve this material.’
PeerJ’s Binfield concurs that a large group of ‘less professionally organised open-access venues’ exists that likely has preservation issues. ‘These could be small journals run by a professor out of his or her office,’ he says. ‘You know it’s an open-access journal but do they really know what they are doing on the preservation side? So, yes, there is a chunk of content that is being published that isn’t very well preserved.’
A key issue is the fact that smaller OA publishers won’t have the resources to set up, say, a CLOCKSS or Portico repository, which demands an initial fee to cover fixed costs such as software expenses. But help is at hand.
‘Recovering this initial fee from any smaller publisher is quite challenging,’ says Kiefer of CLOCKSS. ‘But our board is now looking at avenues to find grant funding to underwrite this initial step for more of these open-access and small publishers.’ And as Binfield also highlights, other groups have emerged to ease preservation pressures for smaller publishers. For example, the Directory of Open Access Journals, DOAJ, currently lists nearly 10,000 open-access journals, of which Binfield reckons the vast majority will be smaller publications.
Last September, librarians from 11 Canadian university institutions joined forces to preserve Canadian electronic government information under threat from archive budget cuts. Forming the Canadian Government Information Private LOCKSS Network – CGI-PLN – the group has established a geographically distributed infrastructure to securely preserve government information and help ensure access to digital content in the future.
For example, the staff for digitisation at Library and Archives Canada was recently reduced by some 50 per cent. The likely end result is that new publications will be published online, but older information in hard copy is less likely to become available.
With this in mind, the CGI-PLN’s focus is to protect information that has been publicly disseminated by the Government of Canada. As James Jacobs, LOCKSS-USDOCS coordinator, Stanford University, highlights: ‘It’s heartening to see Canadian libraries collaborating on such a critical mission. Future Canadians will laud the forward-thinking work of these librarians. Lots of copies do indeed keep Canadian documents safe.’