‘Scientists would rather share their toothbrush than their data!’ Few academics today would agree with Carole Goble’s purposeful misquote from 2006, especially as the internet, coupled with advanced computing facilities, has changed the speed and interfaces through which research is conducted, making collaboration and sharing easier. In fact, science has always been a gradual, collaborative accumulation of knowledge over time: witness Newton’s ‘if I have seen further it is by standing on the shoulders of giants’.
Researchers themselves have been quick to realise the potential of collaboration, genomics being one of the best-known areas. Figures from a JISC survey in July 2009 showed that 65% of academics in the humanities have collaborated with someone inside their department or outside their institution in the last five years. Websites like myExperiment provide a means of sharing workflows, allowing, for example, information on sleeping sickness in African cattle to be repurposed for research into better treatment for people in intensive care.
Data sharing clearly works in those areas where it has gained community traction such as biomedical research. For example, JISC’s work in virtual research environments has given teams at the University of Oxford involved in cancer imaging a means of sharing images, information and algorithms to help advance cures for different forms of cancer.
Most research councils now encourage or mandate the sharing of data, stipulating that funding applications must include data management plans detailing the measures that will be taken. Meanwhile, journals like Nature require that supporting data sets be made available for verification. Universities and research centres are increasingly concerned to make sure that data are exploited to the full. This means putting into place effective systems for managing and sharing data, retaining data when staff leave and preserving them beyond the life of particular projects.
Funders are thereby encouraging a shift in culture. For example, JISC’s ‘Managing Research Data Programme’ is funding eight large pilot research data infrastructure projects in universities. Another strand of work, coordinated closely with the research councils, will produce case studies and model data management plans for wider adoption. The JISC-funded Digital Curation Centre has created a widely-acclaimed introductory digital curation course as well as a ‘data management plan checklist’ to help researchers when they are preparing funding bids to consider data issues from the project’s inception. The UK Data Archive, furthermore, has produced an excellent best practice guide to ‘Managing and Sharing Data’.
Of course, not all data can or should be shared. Issues of privacy, commercial potential and intellectual property rights all need to be taken into account. Fundamental characteristics of academic culture also need to be respected – to a point. Academic reputation is built upon publications. And publications are built upon data. Hence there is pressure on researchers not to share their data, at least until they have published, for fear of being pipped at the post. This latter deterrent to data sharing will remain until alternatives are developed to assess an individual’s suitability for progression up the research career ladder.
Researchers have an important stake in the data that are produced by their long hours conducting interviews, running simulations, or studying images from microscopes. This is highlighted by recent blog posts on Nature Network where researchers shared the very human concerns that someone else may find their data erroneous, or may use their data in a way that brings their work into disrepute. These misgivings are understood and many research councils mitigate their data-sharing policies by allowing embargo periods to assuage researchers’ concerns.
But there are other hurdles to be overcome, closely associated with our publication-driven academic culture. Preparing data for sharing and reuse can represent a considerable overhead. Why, many researchers will ask, should I provide the context and screen upon screen of metadata that I may never use again? I want to publish the article and move on.
However, if research data are to be given the respect and care they deserve, the process of managing data needs to be streamlined, and it may be that more fundamental practices need to change. In a world of data-driven science, the peer-reviewed publication of data may become the norm. Journal publishers and universities are already beginning to think about how scientific journals might adapt to reflect and lead the changing culture.
If we are serious, as we must be, about the value of data, we must think about how well mechanisms for academic recognition reflect the work that is put into creating and preparing high-quality data. This necessarily raises the question of how far the excellence of shared and published data should be recognised in evaluation processes such as the UK’s Research Excellence Framework.