Keeping up with digital preservation


Paul Stokes

Deciding to do nothing about preservation could be a disaster, says Paul Stokes

Introducing digital preservation to an organisation is not a task for the faint-hearted. 

There’s data to be found, people to convince, policies to be written… and that’s before a single system has been procured or a single byte preserved. However, there is no time like the present to make a start.

Delaying is not really an option because of the alarming rise in the amount of data being created. According to the World Economic Forum, an astonishing 90 per cent of the world’s data has been generated in the last two years alone. It says that humans produce 2.5 quintillion bytes of data every day and that, by 2025, 463 exabytes will be generated each day. It puts that at the equivalent of 212,765,957 DVDs per day!

Sometimes a backup is not enough

Failure to preserve data properly can pose a significant reputational risk and could result in the loss of unique and irretrievable knowledge, as the 2016 server crash at Memorial University in Canada shows.

In July 2016, staff at the Queen Elizabeth II library at Memorial University were undertaking routine maintenance that required power to the building to be cut and switched to a backup system, which failed. The back-up to the back-up power (big batteries) came online and lasted about 40 minutes, which wasn’t long enough. More than 70 terabytes of data was lost. 

Luckily, physical documents and objects still existed – but it all had to be digitised again.  

Rescuing the bronze age in York

Failing to adapt to rapid changes in systems and technology is another risk to consider when preserving data, something that the University of York understands only too well.

It’s often put about (in archaeological circles at least) that archaeologists destroy their primary evidence as they discover and catalogue it. There’s no going back for a second bite of the cherry. After archaeologists had finished work on almost 180 sites in north east London, all that remained were the archives stored in the vaults of local museums. Those archives included data from a large number of unpublished excavations, with very impressive Bronze Age material discovered on the banks of the Thames.

But when the project finished, the archaeologists discovered to their horror that their irreplaceable data was held on obsolete technology and depended on outmoded software and file formats. Some of the magnetic media was also corrupted. Luckily, a team of specialists managed to retrieve most of it.

Getting started with preservation can be daunting, but if access to digital materials is to be maintained in the long run, it’s important that all systems are equipped to keep up with technological and organisational change.

One means of automatically keeping systems aligned and ‘speaking to each other’ is to use clever tools, such as Jisc’s Preservation. This tool automatically reformats files, so they are readable with new and yet-to-be-invented software. Once in the Preservation system, the files are automatically ‘recognised’ and processed according to pre-set rules into an appropriate format that is as future-proof as possible.
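To make the idea of ‘pre-set rules’ concrete, here is a minimal sketch in Python of how a file might be matched to a target preservation format. It is not Jisc’s Preservation or any real tool’s API: the rule table, the format pairings and the names NORMALISATION_RULES, plan_normalisation and plan_batch are all assumptions made for illustration. It also recognises files by extension only, which is a simplification; production tools typically identify formats by file signature.

```python
from pathlib import Path

# Hypothetical pre-set rules: file extension -> preferred preservation format.
# These pairings are illustrative assumptions, not the rules of any real service.
NORMALISATION_RULES = {
    ".doc": "PDF/A",
    ".docx": "PDF/A",
    ".bmp": "TIFF",
    ".jpg": "TIFF",
}


def plan_normalisation(filename: str) -> str:
    """Return the target format for a file, or flag it for manual review."""
    # Compare extensions case-insensitively so "REPORT.DOC" matches ".doc".
    suffix = Path(filename).suffix.lower()
    return NORMALISATION_RULES.get(suffix, "manual review")


def plan_batch(filenames):
    """Map each filename in a batch to its planned target format."""
    return {name: plan_normalisation(name) for name in filenames}
```

Anything the rules do not recognise is flagged for a person to look at rather than guessed, which fits the point that preservation is about managing risk.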

Nothing human is alien 

However, no matter how cleverly technology is deployed, there’s no absolute defence against human error, which remains one of the major risks to digital content. After hardware failure, the most common cause of data loss is user mistakes (at least it was in 2003, 2009 and 2015). 

It is not uncommon for users to move files or delete content inadvertently. Strict user policies that separate “archive directories” from “working directories”, where users can still edit and actively work with content, can protect against this risk.
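As a rough illustration of that separation, the Python sketch below copies a file from a working directory into an archive directory and then removes write permission from the archived copy. The function name deposit and the directory layout are assumptions for the example, and file permissions are only one possible safeguard; a real policy would also cover who may run such a step.

```python
import shutil
import stat
from pathlib import Path


def deposit(working_file, archive_dir):
    """Copy a working file into the archive directory and make the copy read-only."""
    source = Path(working_file)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)

    target = archive / source.name
    # Never overwrite something that has already been archived.
    if target.exists():
        raise FileExistsError(f"{target} is already archived; refusing to overwrite")

    # copy2 keeps timestamps as well as content.
    shutil.copy2(source, target)
    # Leave read permission only, so casual edits to the archived copy fail.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

The working copy stays editable, so users can carry on as before while the archived copy is protected from accidental changes.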

So what to do now?

Preservation is about identifying and managing risk. There are a number of useful questions to answer to help with that. Has a data asset survey been completed? Who is generating data and who uses it? Where is it stored and what is it worth? Finally, put policies in place to manage the preservation process.

Paul Stokes is senior co-design manager at Jisc