Digital preservation matters

Vicky Reich, director of the LOCKSS programme at Stanford University Libraries, and Randy Kiefer, executive director of the CLOCKSS archive, explain why preserving digital content is a challenge that needs to be tackled, especially as this content becomes more dynamic

 
Vicky Reich (left); Randy Kiefer (right)

Why do preservation?

RK: From the days of cave paintings the format of content has kept changing. Digital is probably the most fragile format.

VR: If you don’t preserve digital content then it won’t exist. Most of society’s cultural and commercial assets are now digital but, generally, the move from print to electronic is about access rather than preservation.

The benefit of the paper environment is that there are lots of copies, each held under separate administrations so it is very difficult for any one person to censor or accidentally destroy all paper copies. This is the model we replicate in LOCKSS. The LOCKSS technology means that each digital copy is held under a different administration. You need a minimum of six or seven copies. If people do not realise the strength in distributed holdings they risk their content being destroyed by a fire in the server or a ruthless dictator.
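One way to see why several independently held copies matter is to compare them against each other. The sketch below is only an illustration under stated assumptions, not a description of how LOCKSS itself works: the site names, the `find_damaged` helper and the majority rule are all hypothetical. It hashes each copy and flags any site whose copy disagrees with the most common digest.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one copy's bytes."""
    return hashlib.sha256(data).hexdigest()


def find_damaged(copies: dict[str, bytes]) -> list[str]:
    """Return the sites whose copy differs from the most common digest.

    Hypothetical helper: assumes most copies are intact, which is only
    reasonable when copies are held independently of one another.
    """
    digests = {site: digest(data) for site, data in copies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, d in digests.items() if d != majority)


# Seven hypothetical sites, each holding its own copy of the same article.
article = b"Digital preservation matters"
copies = {f"site-{i}": article for i in range(1, 8)}

# Simulate accidental damage at a single site.
copies["site-3"] = b"Digital preservation mat"

damaged = find_damaged(copies)
print(damaged)
```

With a single copy there would be nothing to compare against, so the damage at `site-3` would go unnoticed; with seven, the odd one out is easy to spot.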

How did LOCKSS come about?

VR: LOCKSS came about when I was at HighWire in the 1990s. We realised that two things were happening. Firstly, publishers were putting things online that were not being printed. Secondly, libraries were having difficulty building collections. We wanted to find an easy way for libraries and publishers to interact as they did with print and the LOCKSS technology was built to do this.

You have to deal with preservation when handling digital content. Publishers were spending a lot of money to present their online content effectively and to maximise their readers’ interactions. We wanted to preserve that experience. In order to do that we needed a system that collected content from the web.

What’s the difference between LOCKSS and CLOCKSS?

RK: LOCKSS preserves all the digital resources that a member library holds. In contrast, although the ‘C’ in CLOCKSS stands for ‘controlled’ I say that it also stands for ‘complete’. We hold the complete holdings of publishers, put them under controls and distribute to 12 sites around the world. It is kept as a dark archive.

Trigger events can happen when, for example, the host changes platform and the publisher might decide that it is not financially viable to pay the costs of moving a struggling journal to the new platform. An appropriate way forward in such cases is to release that journal through CLOCKSS. We have had six trigger events in total.

Before we can release a journal we have to check that this is not impinging on anyone else’s revenue stream. For example, if a publisher goes bankrupt, but still has three years more of a contract with a third-party provider, we’d not trigger the publisher’s content until that contract was over. We don’t want to devalue any other models.

What are the challenges with preservation?

VR: The web as a publishing platform enables many things never envisaged in the print world. The web started with a document model, then evolved to include dynamic elements, such as advertisements and embedded videos. But first with AJAX and now with HTML5, the web is becoming a networked operating system inside the browser. It is no longer enough to parse content collected from the web to find the links and follow them; the content must be executed to discover the web resources from which it is composed. Some of these resources are web services, such as Google Maps. Preserving executable content and the services on which it depends is a major challenge that the LOCKSS programme is working to address.
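The gap between parsing and executing can be shown with a small sketch. It assumes a hypothetical page and a hypothetical `LinkCollector` class built on Python's standard `html.parser`; it is not the collection code used by LOCKSS. The parser finds the link written in the markup, but not the resource that would only be requested once the script runs.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href/src attribute values found in static HTML markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Parse the markup without executing it and return the links found."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one static link, plus one resource whose URL is
# assembled and requested only when the script is executed in a browser.
PAGE = """
<html><body>
  <a href="/articles/1.html">Article 1</a>
  <script>
    fetch("/api/" + "figures?article=1");
  </script>
</body></html>
"""

found = extract_links(PAGE)
print(found)
```

Only `/articles/1.html` is collected. A collector that wanted the figures data as well would have to run the page, for example in a headless browser, and record the requests it makes.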

RK: The biggest challenge is not just getting people to step up to their stewardship, but also getting people to understand how to play in that ecosystem. Getting publishers to understand is pretty straightforward. For libraries, the challenge is a bit more difficult because there are no interactive transactions like capturing content, just the building of long-term protection. Preservation is a very good thing and it is good when people participate in more than one initiative. It makes us all more relevant to the market. Different preservation initiatives come together and share ideas so that content will still be accessible in 50 years’ time when there is no internet and something has replaced it.

What is the situation today?

RK: I would speculate that about 40 per cent of libraries do preservation, but we are still in the shallow end of getting publishers on board. We’ve got the larger publishers – around 75 publishers to date – but many smaller ones need to join. Big publishers get a lot of negative attention, but the fact is that the largest publishers put large amounts of money into the system for projects such as DOIs, ORCID, standards and preservation. They have both a vested business interest and a long-term stewardship concern.

VR: LOCKSS works with 500 publishers, but this is barely scratching the surface. There is a misconception that hosting publisher content on a third-party platform counts as preservation. The LOCKSS mascot is a tortoise, because these things take time.

Interview by Siân Harris