Can better data lead to better science?

Amy Bourke-Waite and Iain Hrynaszkiewicz report from Springer Nature's second Publishing Better Science through Better Data conference

Can better data lead to better science and faster, more reliable discoveries?

Many high-level discussions have taken place over the last year between funders, institutions and publishers, around best practice in data management. There are now 30 or more funder policies that require open data, a major frontier in open science, which proponents say will help advance the process of scientific discovery. But what are the practical implications for researchers?

This was the subject of the second Publishing Better Science through Better Data conference, based at the offices of Springer Nature in London. The full-day conference included advice on publishing, advancing careers and discussion of emerging tools and resources available to researchers to help them, and society, derive maximum benefit from scientific research. Speakers included representatives from leading journals, research organisations, funding agencies and technology providers.

Jeremy Frey, professor of physical chemistry at the University of Southampton, opened the conference by positioning online management and sharing of research data as a natural extension of the internet. Frey reminded us that sharing data and detailed research methods shouldn’t be an alien concept; after all, we’re taught from an early age that it’s more important to 'show your working' than to get the right answer.

Sadly, researchers all too often rely on direct requests for data sharing, an approach that is unreliable and unsustainable. Andrew Hufton, managing editor for Scientific Data, warned that data is disappearing at a rate of about 17 per cent per year (Vines et al, 2014). He added that data too often remains unreferenced and invisible, and that appropriate credit is rarely given to those who do publish their data.

Other speakers touched on the point that, historically, the system of science publication has asked everyone, from authors to peer reviewers and editors, to work on trust alone. Joerg Heber, executive editor of Nature Communications, reminded us that Watson and Crick’s 1953 paper on the structure of DNA contained no data whatsoever, but this simply wouldn’t be enough anymore. Increasingly, even sharing your data is not enough. The narrative of how the data was collected is becoming more and more important (particularly so in patent applications, for example).

Matt Sydes, senior scientist and senior medical statistician at UCL, noted that although sharing of medical data has many challenges, the debate has switched from whether medical data can be shared to how it should be shared. Sydes made the point that although some promote distorting data to protect patients’ identities, this leads to complications in itself, weakening datasets, and is also potentially problematic under the Data Protection Act. His work has focused on developing a controlled access approach to data sharing as a more pragmatic way to increase reuse of personal data from clinical trials.

There is also increased attention on transparency of clinical trials, as Rufus Pollock from Open Knowledge noted in presenting plans for OpenTrials, a database of all data and documents on all trials.

Other tools presented by researchers, in the popular lightning talks section of the event, included Active Data Biology – from Sam Payne, Pacific Northwest National Laboratory. This tool enables analysis and sharing of data with collaborators and the public, and creates backed-up versions in an online repository.

Frey and other speakers’ experience has taught them that the students and post-docs who kept well managed and documented data saved themselves both time and pain in the long run. A familiar anecdote was how hard it can be for a final year PhD student to write up their thesis using data collected in their first year.

As well as learning about tools to help with these problems, delegates heard examples from Hufton and Heber of how to optimise data presentation and archiving to ease the process of publishing work in journals in the Nature Publishing Group portfolio.

For early career researchers, embedding good practice in data organisation and presentation at the start of a research project can have benefits in terms of efficiency of getting published and greater reuse of their work.

A powerful example from Heber was that open data is playing an increasingly important role in the understanding of global public health emergencies as they emerge, rather than waiting for publication of results. As important as data can be for a researcher’s career and publications, we mustn’t forget that open, properly archived data might have the potential to benefit science well beyond a generation.

Iain Hrynaszkiewicz is head of data and HSS Publishing in the Open Research Group at Springer Nature; Amy Bourke-Waite is senior corporate communications manager at Springer Nature.