Should we be prepared to face a future without digital curation?
A new digital curation centre in the UK will help research institutions to safeguard research data for years to come. Peter Burnhill, the centre's interim director, reports
It's 2020. A postgraduate student has chosen 'Icons and Irony in late 20th Century Science' as the topic for her thesis. She expects to find a wealth of information for this topic because her joint Informatics/History degree taught her that this period saw the emergence of the digital age and that late 20th Century records are rich in written documents, recorded interviews, and information in the form of documentary films and sound archives. Not only had a deluge of research data been generated in the physical sciences by the turn of the century, but large sums of money had been spent on digitising cultural material.
Searching the web for evidence to support her choice of topic, however, leads to a frustrating conclusion. Although much data can be accessed from afar, it soon becomes painfully obvious that a significant proportion of her early finds are unreadable. They are hopelessly dependent upon formats peculiar to particular software, such as graphics and word-processing packages. Her supervisor cautions that this thesis topic will result in her spending most of her time writing code...
The importance of data
Scientists and researchers generate increasing amounts of digital data, and further investment is going into digitisation and the purchase of digital content and information. As a result, a growing body of scientific records and documents is at risk from technological obsolescence and from the fragility of digital media.
The UK's Joint Information Systems Committee (JISC) and the academic community have taken note of this issue and invested in a number of scoping studies to find a solution.
Now, building on that work and the expertise already existing in particular disciplines, a Digital Curation Centre (DCC) is being launched. The DCC's task is to support UK institutions in storing, managing, and preserving data to ensure its enhancement and continuing long-term use.
Data forms the evidential base for scholarly conclusions, and for the validation of those conclusions. A basic tenet of such validation is reproducibility.
Digital curation is a new term that covers data archiving and digital preservation but goes beyond them to include the active management and appraisal of data over the life-cycle of scholarly and scientific interest. Curation adds value by providing context and linkage, and places emphasis on 'publishing' data in ways that ease reproducibility and re-use, with implications for metadata and interoperability.
The overriding purpose of the new centre is to enable a continuing improvement in the quality of data curation and digital preservation.
The DCC will not itself be a digital repository, nor will it attempt to impose the policies and practices of one branch of scholarship upon another. However, there are some unifying themes across disciplines, such as attention to provenance and 'data as evidence'. The DCC hopes to provide the platform for collaboration.
As part of this, the centre will work with other groups to establish a research programme that will address the wider issues of digital curation. It also plans to foster links across existing communities, through an associates' network and engagement with individuals and organisations who act as curators. In addition, the DCC intends to develop services to evaluate tools, methods, standards, and policies, acting as a repository of tools and technical information. These plans should help to achieve a 'virtuous circle', whereby expertise, experience and requirements feed into the DCC research programme.
Research will be the core element of the DCC's activities, and it is organised to achieve four main goals. First, it aims to draw together the various functions of 'curation', from the traditional archival functions to the maintenance and publication of evolving knowledge as seen in scientific databases. Second, it aims to identify, through direct research collaboration and through interaction with the service arm of the DCC, the key projects on which research is needed. Third, it aims to conduct research in areas that the partners have already identified as crucial to digital curation. Fourth, it aims to institute two-way conduits between research and services, through which practical issues can be drawn to the attention of researchers and the products of research can be tested in practice.
Annotation, data integration and publication, and appraisal and long-term preservation are some of the research areas that have already been identified by the DCC and its partners. They will also look at the socio-economic and legal context (rights, responsibilities, and viability) and at performance and optimisation. Beyond the topics already identified, there is plenty of scope for more research.
Further topics for consideration include evolution of structure, ontologies, context and emulation, and data registries. And, although the initial focus is on research data, the policy intention is also to address the preservation needs of e-learning and scholarly communication.
The Digital Curation Centre is currently in a start-up phase to prepare for its official launch in October 2004. The website is in place and the helpdesk is taking messages (see box below).
Over the coming months it plans to deliver a web portal and an e-journal. There will also be an advisory service, a programme of professional development, and standards-based development of registries, testbeds, and tools.
The centre is also ensuring that it has the appropriate staff on hand to help achieve its goal. Key to this is the search for a world-class director, which is now underway. This person should be in place by the time of the launch event, which will be held at the National e-Science Centre in Edinburgh.
Digital Curation Centre at a glance
- Level of investment: around £1.3m a year
- Managed by: JISC and the e-Science Core Programme
- Institutions involved: the Universities of Edinburgh and Glasgow, which together host the National e-Science Centre; UKOLN at the University of Bath; and CCLRC, which manages the Rutherford Appleton and Daresbury Laboratories
- Formal launch: October 2004
- Website: www.dcc.ac.uk