Publishing better science through better data

Iain Hrynaszkiewicz, head of data and HSS publishing in the Open Research team, and Amy Bourke, corporate communications manager at Nature Publishing Group, report on the first research data conference for early career researchers hosted by Scientific Data and Nature Publishing Group.

Data are the foundation of science and research. So if you’re a researcher, you are – and always have been – interested in data.

Researchers generate, collect, analyse and write papers about data. In the last decade there has been progressively more top-down interest in data – from institutions, research funders and publishers – leading to new policies and mandates that affect researchers. But what do these policies – for data archiving, sharing or data management plans – mean, practically, for researchers in the lab, field or clinic?

All too often, early career researchers (ECRs) receive insufficient guidance about how research data management, whether good or bad, can affect their ability to publish and to win funding – two of the things that matter most to a researcher’s career progression. On 14 November the Open Research team at Nature Publishing Group and Scientific Data (a new open-access publication for descriptions of scientifically valuable datasets) held their first conference to look at these policies from the bottom up.

The half-day event, ‘Publishing Better Science through Better Data’, held at our London Campus, aimed to offer practical, educational information for ECRs on issues related to data. Videos and slides from all of the presentations are freely available online. Of the 80 attendees, around a third were postdocs and two thirds were PhD students. Around half were life scientists, about a quarter physical scientists and about 10 per cent social scientists (the conference also aimed to facilitate interdisciplinary knowledge sharing).

Philip Campbell, editor-in-chief of Nature, opened the conference by calling on the audience to recognise that access to understandable and reusable research data is part of ensuring credibility and reproducibility in science. This is why Nature-branded journals ask for the data associated with published papers to be made available. He went so far as to say that if we want to combat fraud and stop ‘sloppy science’, open data should be part of the solution. Campbell announced at the conference that Nature journals will be collaborating more closely with Scientific Data and enhancing their data access policies to increase reproducibility in published research (see the editorial in Nature).

David Carr of the Wellcome Trust took the audience through research funders’ expectations, noting the growing consensus that data underlying published findings should be open wherever possible. The Wellcome Trust policy asks recipients of its grants to provide a data management and sharing plan, and he offered seven points researchers should address when drafting one. Carr stressed that funders are committed to working with the community to maximise the value of research data, so the requirements funders place on researchers to demonstrate good practice are likely to increase in number and to be more strictly policed in the future. He concluded that meeting them may not be easy, but it is certainly worth it.

Librarians’ and institutions’ perspectives were provided by Sally Rumsey of the Bodleian Libraries at the University of Oxford. Rumsey explained that in many cases institutions’ data policies are driven by funders, especially the EPSRC (Engineering and Physical Sciences Research Council), whose policy ‘has made universities sit up and take note’. Significantly, that policy puts the onus on the institution rather than the researcher, and the University of Oxford is creating its own data archiving solution as a result.

Rumsey offered 10 tips on research data management, including: back up your data; identify the resources and people who can provide support; and know your institution’s and funder’s policies. The audience were encouraged to ask their ‘friendly subject librarian’ for help with research data management, funding applications, writing data management plans and archiving. Veerle Van den Eynden later offered a perspective from the quantitative social sciences, reminding the audience that yes, even sensitive or confidential data can be published, if it is done in the right way.

Mark Hahnel founded his start-up, Figshare, after experiencing immense frustration at not being able to find a good way to store his data: videos produced while carrying out stem cell research. He described the increasing focus on data by governments, funders and institutions as a tidal wave that ECRs can no longer afford to ignore. Hahnel was emphatic that mandated data policies are coming and that compliance with them will be strictly monitored.

The managing editor of Scientific Data, Andrew Hufton, described the rise of the data journal as a way for researchers to get credit for their data. He explained how the data descriptor, Scientific Data’s primary article type, is designed to make data discoverable, accessible, intelligible, reusable and citable, and he advised the audience on how to choose the right repository for their data. Susanna-Assunta Sansone, honorary academic editor of Scientific Data and associate director of the Oxford e-Research Centre, then talked about the importance of the arguably undervalued skill of data curation, which promotes reusability, discoverability and future collaboration.

Finally, Monica Contestabile offered some fascinating real-world case studies of what not to do when submitting a paper to Nature Climate Change, showing that peer reviewers may advise editors to reject papers with substandard data sampling and collection procedures. The rejected papers fell broadly into three categories: low data quality, lack of transparency and lack of accessibility.

The final message for researchers was to check which policies – of their funders, their institutions and their preferred journals – might apply to their research. Data should be an asset to researchers seeking to be published or funded, not a hindrance.

This was the first conference of its kind held at Nature Publishing Group, and it packed many topics and speakers into a relatively short event. This was partly to test which topics were most relevant and important to ECRs, and we surveyed delegates after the conference. Eighty per cent of respondents would recommend the event to a colleague, all respondents thought the talks were relevant, and 70 per cent rated the quality of presentations as good or very good.

Given the popularity of the event (tickets sold out in less than a day) and the feedback we received, we will organise similar events in the future. The survey also pointed to ways we can improve them: allowing more time for discussion and ice-breaking, including more funders as speakers, and including some presentations from researchers. Our major goal was to engage real researchers and trainees with this conference, and we’d welcome proposals from researchers who would like to speak at – as well as attend – future events.

All the presentations from the conference, along with videos of the speakers, are stored in Figshare.