Data centres enable sharing and research recognition
Late last year, JISC and RIN launched a report on data centres. Kevin Ashley shares his view on some of the challenges and opportunities for research
Data centres are helping to spread and reinforce a culture of openness and data sharing among researchers, according to the recent report, ‘Data centres: their use, value and impact’ by JISC and RIN. For researchers as both consumers and producers of data, data centres offer some compelling benefits, the report revealed.
Data centres provide ready access to large quantities of data that has some measure of quality assurance attached, and that is considered suitable for re-use. This data is generally easier to find, and easier to work with, than data that has been obtained from less formal sources. The report showed that researchers-as-consumers already widely appreciate these productivity and quality benefits.
Data centres also make it easier for researchers to gain recognition for their work via data sharing. Research has shown a positive correlation between data sharing and enhanced citation and impact of the data and the publications associated with it. In addition, data centres allow researchers to relinquish responsibility for preservation and access to older data, and so focus their attention on their current research interests.
Despite these advantages, there is evidence that researchers are less willing to deposit their own data in data centres than they are to access and use someone else’s material.
One barrier is that sharing data, no matter how it is done, has cost implications for the researcher, both in preparing it and in dealing with subsequent questions from re-users. Depositing in a data centre brings these costs to the fore, where they can be a disincentive. But in the longer term there is a saving, since the researcher does not have to deal with subsequent requests for re-use.
Data centres can relieve researchers of much of this burden, while still delivering all the benefits of sharing. For example, they can make data easy to cite, as NERC data centres do by giving all their datasets a Digital Object Identifier (DOI). Data centres also make data more easily discoverable by other researchers. If data cannot be found, it is unlikely to be re-used.
Information professionals also have a role to play – for example, by collaborating to make data easy to find, particularly when it comes to cross-disciplinary uses. Although there are some exceptions, data centres today often serve one constituency well and others either badly or not at all. This situation needs to change.
Others in the field can help broaden discovery and accessibility of data in subject-specific data centres, and join this up with data held in institutions. They can help by not insisting on the use of inappropriate bibliographic standards for data description. They can help even more by easing integration between the management of the data about research and the data centre holdings. There is useful work already being undertaken in this area, but we need to accelerate its adoption.
The amount of effort a researcher needs to place data in a data centre varies. Some data centres make it easier than others by doing much of the researchers’ work for them, although that makes the data centre more expensive.
The perceived effort needed is a significant barrier, because institutions and data centres have so far failed to make it clear how much is to be gained by using them. At the Digital Curation Centre, we are working with others to lower some of these barriers and to improve integration between subject data centres and institutions. We’re helping point out the benefits – in reputation, impact, and costs – that accrue to researchers, funders and institutions by using data centres.
We are also helping institutions and data centres to work together. For example, we help them adopt standard ways for data centres to feed information to universities about data that has come from those institutions. We are identifying tools that make movement of data between hosting regimes easier, such as submitting data from a cloud environment like Oxford’s ViDaaS directly to a data centre.
Data centres are developing rapidly, and will continue to do so. There are a number of things that must be done so that they can achieve their real potential to support researchers, and deliver full value to funders and the wider community. If they have greater consistency in how they expose their holdings, and work more closely with universities in sharing knowledge about what, exactly, they hold, that will be a big step forward. At the moment, most universities have no idea what, if any, of their material is held in a data centre.
Getting researchers, funders and employers to use data centres optimally will require us to understand more about what will achieve the maximum benefit, and why. But everyone needs to understand that data centres do far more than simply store and share data. They manage it, improve it through appropriate data curation, protect it and encourage its creative re-use – enabling future users to discover stories in the data that its creators never imagined.
Kevin Ashley is director of the Digital Curation Centre at the University of Edinburgh