Dare to share?
From clarity to confusion, Rebecca Pool looks at the highs and lows of data sharing in scholarly publishing
In September Springer Nature announced that all research papers accepted for publication in Nature and its additional titles must include a data-availability statement, detailing whether and how underlying data is accessible.
At a minimum, authors should confirm relevant data is available on request, but can also provide detail on publicly archived datasets, and even cite datasets assigned a digital object identifier.
The move follows a trial across five Nature journals: Cell Biology, Communications, Geoscience, Neuroscience and Physics. And as Andrea Taroni, chief editor of Nature Physics, puts it: ‘You might say it’s actually long overdue’.
‘It’s now clear that research is increasingly relying on very, very large datasets, and in Nature Physics we’re seeing a disconnect between the narrative contained in an article and the sheer amount of data, analysis and interpretation required to understand the paper,’ he says. ‘So I see this as a small step to encouraging scientists to present data, as part of an overall package of scientific information that also includes the paper.’
The Nature titles are hardly alone in this pursuit of data sharing; Springer Nature has delivered four different data sharing policies that its vast fleet of journals can pick and choose from. An initial light-touch policy encourages data deposition in a repository, while the fourth, more demanding policy requires data deposition for every paper, as well as peer reviewer access to data.
In the wider world of publishing, much change has also been afoot. For example, Wiley ensures data relevant to a paper from its journals can be made accessible to readers via archiving in a Figshare repository. Meanwhile, Elsevier’s authors can store and share research data as part of the publisher’s online article submission system, as well as upload data to Mendeley Data and link this to an article on ScienceDirect.
Iain Hrynaszkiewicz, head of data publishing, open research at Springer Nature, is spearheading the organisation’s data sharing developments. As he highlights, in the few months that have followed the recent policies, more than 500 journals have adopted a standard policy, including the Nature and BioMed Central titles.
But with more than 2,000 journals, as well as other publications in tow, it’s clear data sharing at the company is a long-term project. ‘We realise we can’t try and solve every issue for every type of journal and data at once,’ says Hrynaszkiewicz. ‘For example, society-owned journals may need different communication strategies and timescales to discuss policy changes, and we will be adapting our approaches for books and proceedings in due course.’
But so far, the rollout of data sharing at Springer Nature is working. Hrynaszkiewicz reckons authors appreciate what he describes as ‘our pragmatic and practical approach’. And as he adds: ‘We don’t assume one size will fit all in a particular discipline. Where editors wish to introduce stronger policies, we provide tools and support for this.’
Likewise, Taroni notes how his own Nature Physics authors are adopting the policies with relative ease. ‘I think [researchers] started off thinking “OK, [the journal] is paying lip-service to data availability, let’s do the bare minimum and include a statement saying we’ll make the data available if somebody asks”,’ he says.
‘But now we are receiving papers with data availability statements already written,’ he adds. ‘It’s as if authors are saying “I want to make sure I write a good statement, as I don’t want to look like I’m hiding something”.’
Certainly, researchers’ propensity for sharing data is reflected in Figshare’s latest survey. Working with parent company Digital Science, the organisation recently released the results of a global survey of 2,000 researchers on data sharing, alongside a report, The State of Open Data.
Analysis revealed that a hefty three quarters of respondents had already made their research data openly available at some point. At the same time, a similar number were aware of datasets that are open to access, reuse, repurpose and redistribute.
Like Hrynaszkiewicz and Taroni, Daniel Hook, chief executive of Digital Science, is confident that data sharing is gathering momentum. As he puts it: ‘Our report is very heartening, as such large numbers of people are engaging and actively sharing at least some of their data.’
According to the survey, awareness of open data transcends age and career progression. But one of the report’s more eye-opening results indicates that nearly 70 per cent of researchers value a data citation as much as an article citation, with a further 10 per cent actually valuing the data citation more. For Hook, the fact that researchers are now accepting citations between data, datasets, and databases is a ‘big deal’, with huge ramifications for scholarly publishing.
‘We spend millions and millions of dollars globally on research, and [a part of] this is just reproducing someone else’s negative results that they hadn’t shared because they had nowhere to publish it,’ he says. ‘The idea that we don’t have to completely reproduce every negative result that other researchers have thrown away offers massive potential for research to be more efficient.’
What’s more, he is certain that the rise of data sharing also provides researchers with a chance to be seen as ‘trail-blazers’ in the up-and-coming field of open data.
‘The next step should be for a university to award a professorship to a researcher who has produced brilliant work with data,’ he says. ‘In the next couple of years, someone will receive a professorship, not because they had an idea, wrote a paper or performed an experiment, but because they collected and shared a large amount of data in a way that had an impact on research.’
‘It really is only a matter of time before having a highly-cited dataset is as important in some fields as a paper in Nature, Science or Cell,’ he adds.
Professor Brian Nosek, from the University of Virginia and executive director of the Center for Open Science, is a driving force behind data sharing and open science. He agrees with Hook and believes it’s high time researchers received recognition for their data.
‘There is this ongoing frustration for researchers that they only get credit for their final paper,’ he says. ‘But really their great work may have been in the data they generated, the methodology used, or the code that was written to use that data.’
‘Changes we are seeing right now are going to move the credit system towards how researchers formulate science, which is where the real scholarship can be,’ he adds.
Reproducibility crisis
While Nosek is passionate about data sharing, he also believes science faces a key challenge right now. Scientific success for any grant-seeking researcher depends on being published rather than – as he puts it – ‘being right’. ‘Given this incentive structure and publication being the currency of science, I am confronted with choices,’ he says. ‘Do I make my research more publishable and less accurate, or do I make it less publishable but more accurate? Now that is a conflict of interests.’
Realising this problem required ‘broad intervention’, he set up the Center for Open Science in January 2013 with colleague Jeffrey Spies. The Center first looked at issues around every researcher’s bugbear – reproducibility.
In August 2015, a team of 270 researchers, led by Nosek, unveiled the results of a study to replicate the findings of 100 psychological experiments published in respected journals in 2008. In Science, the researchers revealed how they had found that only 36 out of 100 of the replications showed statistically significant results, compared to 97 out of the 100 original experiments.
Controversy, criticism, and even what has been termed ‘a crisis around replication’ ensued. But for Nosek, the research was a success: it increased awareness of replication challenges and illustrated the key role data sharing plays in reproducibility.
‘If I can access data from the original study, then I understand more easily how the researchers made their decisions, how they got to their inferences and at least reproduce the evidence in those original studies, prior to even trying to recreate [that study],’ he says.
But the organisation isn’t just about reproducibility research. It has launched ‘Open Practice badges’ to acknowledge open practices and produced TOP – Transparency and Openness Promotion – standards, embraced by more than 500 journals, including Science.
Meanwhile, it also provides Registered Reports, in which peer review for participating journals is conducted prior to data collection and analysis, to encourage transparency across the research lifecycle and remove bias against negative results. And a preregistration challenge offers 1,000 researchers the opportunity to win $1,000 each for publishing results of preregistered research.
But perhaps most pertinent to data sharing, the organisation has established an open source software project, the Open Science Framework, to promote open collaboration by connecting data repositories.
Participants include Mendeley, Figshare, GitHub and more. And as Nosek says: ‘Our real hope is that in the next year or two, many more repositories will connect to our framework to help researchers be more efficient in data sharing practices.’
Indeed, beyond the brave new opportunities that data sharing brings, myriad practical issues must be tackled first, and efficient practices are only the beginning. According to Digital Science’s Hook, many researchers store what he calls ‘small data’ on, say, Dropbox and pen-drives, never to see the light of day. Some researchers still struggle to know where to share data, and of those that do share, many are unsure about licensing conditions and the extent to which the data can be accessed or re-used. ‘Our survey shows researchers are using Google Drive, Dropbox, Figshare and GitHub,’ says Hook. ‘But we are also seeing issues around, say, understanding licences, industrial contracts and national laws.’
Taroni from Nature Physics agrees, and has seen many of his authors grapple with where to deposit data. ‘There isn’t an agreed-upon standard so that everyone in a community can say, “Oh that’s where I submit my data”,’ he adds.
Initiatives such as the Open Science Framework are driving change, but Taroni also believes a definite role exists for publishers. ‘We need to give researchers a clear framework so they can satisfy our policies without feeling like they’re jumping through more hoops,’ he says.
At the same time, Hook believes many misunderstandings can be cleared up by educating researchers on how to share data within a research group and publicly. ‘Right now, this is really an issue of education, which I believe is very aligned with the skillset of librarians and information professionals.’
But education aside, the sharing of sensitive data is still a massive issue. For example, data may be covered by the Data Protection Act, obtained under a non-disclosure agreement or relate to a patent.
Solutions may include depositing this information in a repository under an embargo or in a secure data repository, but for academics in disciplines such as psychology and clinical research, deeper privacy concerns emerge. As Nosek points out: ‘We have to be very careful when working with human data, and privacy considerations will always prevent data sharing from being entirely open.’
‘Good data sharing standards must incorporate these concerns, rather than being used as a blunt instrument to share data,’ he adds.
Still, the future looks bright for data sharing, and Hook, for one, is optimistic. ‘In 2012, people were talking about open data in somewhat concerned terms; it was relatively new and people weren’t engaging with it,’ he says. ‘Four years later we’re in a world where 75 per cent of people on our survey have shared data – now that’s a sea-change.’
Clearly not all research is the same and, as such, academics from different disciplines are reacting to the rise of data sharing differently. According to Iain Hrynaszkiewicz, many life science titles already have some form of data policy, so these Springer Nature journals have been quick to adopt the latest policies.
‘That said, journals across physical sciences, engineering, humanities and social sciences have adopted the policies too,’ he says. ‘In economics, for example, there are communities that have strong cultures of data-sharing policies in their journals.’
Andrea Taroni from Nature Physics concurs: ‘We’ve recognised cultural differences between disciplines; climate scientists may spend years collecting samples of data and are reluctant to freely give it away without any credit. But in other fields, data is cheaper to collect, and these disciplines are already more advanced in data sharing.’
Taroni also says that researchers using large-scale experimental facilities, such as synchrotrons, have been among the first to volunteer data availability statements. He now hopes more of the same will follow in other fields of physics in the coming year.
‘I’m confident we will see this thrive as a practice in physics,’ he says. ‘And as scientists start to think about how they want to structure and present their data, we as publishers need to provide a clear framework to help them to achieve this and satisfy our policy.’