“Quarter of scholarly record may be at risk” – report

Preservation organisations have reacted with alarm to a recent analysis that suggests around a quarter of academic publications are not being preserved for the future.

Martin Eve of CrossRef assessed around 7.5 million ebooks and articles for which the organisation provides a fixed identifier, or Digital Object Identifier (DOI). For around two million of the items in his study, he could find no evidence that they were being preserved.

Digital Object Identifiers prevent ‘link rot’ so that, even if a URL changes, an article or book can still be cited and retrieved. But DOIs are not enough on their own because they only preserve the link – not the destination. So, unless there’s a preservation service guaranteeing the content, the DOI will stop working if a publisher goes out of business or simply removes a title.  

Librarians have been urged to play an important role in encouraging and requiring proper preservation. A toolkit to support librarians, including model license language, is available here: https://liblicense.crl.edu/resources/digital-preservation/

Alicia Wise, Executive Director of CLOCKSS, said: “This is a wake-up call. Agencies like CLOCKSS and libraries like the British Library have a very advanced understanding of how to preserve content and have made amazing progress with one of the major challenges of our generation. But we urgently need to accelerate the preservation of our intellectual heritage content if we want to secure the huge percentage of scholarship that remains unprotected.”

William Kilbride, of the Digital Preservation Coalition, welcomed the report, adding: “Martin’s findings are incredibly important. Publishers and libraries have been at the leading edge of digital preservation. We’ve been arguing for years for urgent investment to ensure research remains viable against the fluctuating fortunes of the publishing industry. It’s pleasing to see progress, but telling how much more there is to do.”

In November 2023 the DPC’s ‘Bit List’ classified research papers among ‘vulnerable’ content types, meaning that the application of ‘proven tools and techniques’ is required to improve the likelihood of preservation. Kilbride added: “What happens in sectors that haven’t invested in preservation the way libraries and publishers have? Martin’s report is significant for publishers that deal with well-known data types in a well-developed sector. It hints to the crisis in other sectors that have not been so proactive in the preservation of digital content.”

Eve was keen to stress that there was at least some good news: some 4.3 million of the works he studied were preserved in at least one place. “That’s not utterly terrible, and this under-counts preservation, because we haven’t got data from every archive everywhere. I’m also not looking at green archives, although there’s still debate about whether such platforms can constitute adequate preservation. It is true that simple hosting in an institutional repository is not the same as triplicate redundancy preservation in dark archives.”

Read a more detailed report on Eve’s discoveries here: https://clockss.org/martin-eve-crossref-the-digital-preservation-of-7-5-million-items/
