Good interaction between published papers and raw data can speed scientific discovery, write IJsbrand Jan Aalbersberg and Judson Dunham, who are involved in creating such links at Elsevier.
The movement to bring more research data out into the open continues to gain momentum. This is being driven by funding bodies and international organisations, and enabled by modern storage capabilities and network technologies.
The availability of raw data can save a researcher from having to reinvent the wheel. However, available data is often out of context, in a difficult-to-manage format, or completely divorced from the formal research record. Much of this data is stored in repositories independent of the research and analysis derived from it. Scientists want to reach the underlying data from a published article, and to reach the corresponding published research from a raw data set. Too often, missing “links” between the data and the article waste valuable research time.
The next era in research articles will take content beyond what is provided by the author, linking to relevant data and other information from external sources to provide even greater added value to researchers.
Different disciplines might focus on different content types, such as telescopic data for astronomers or molecular images for biologists, but across the scientific community authors are increasingly adding supporting content that can bring further depth and context to an article. But, for this to happen, the right dots must be connected – giving researchers the content they need and helping them to find the proper context for the content.
Bridging the gap between the formal research publication and other available data cannot be done with technology alone. Partnerships between research organisations, scientific publishers, information solution providers, funding bodies and other groups committed to scientific discovery present the best opportunity to encourage linking between research, data sets and other relevant content. Cooperative partnerships can effectively marry that additional information with the relevant research article. Publishers are working with the scientific community to encourage the preservation and uniformity of large research data sets and to create links between research articles and related content.
One example is Elsevier’s partnership with PANGAEA (Publishing Network for Geoscientific & Environmental Data). This is paving a new way to open up data from earth system research by reciprocal linking between data sets deposited at PANGAEA and corresponding articles in Elsevier journals on ScienceDirect.
PANGAEA’s data library already links primary data to related articles in earth and environmental science journals from several publishers. However, rather than searching for data in a repository and then looking for the corresponding article, a researcher is much more likely to first find an article in a literature search, and then realise the need to access the corresponding data set. Thanks to the collaboration between Elsevier and PANGAEA, each article on ScienceDirect with a corresponding data set in PANGAEA now displays a prominent link directly to that data set. This puts the data into context with the primary literature, increasing its discoverability by embedding it at the right point in the knowledge discovery process.
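The idea of reciprocal linking can be illustrated with a minimal sketch. This is not how ScienceDirect or PANGAEA implement their links; the `LinkIndex` class and the identifiers below are hypothetical, chosen only to show that a single recorded link can be followed from either the article or the data set.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical two-way index between article IDs and data set IDs."""

    def __init__(self):
        # One mapping per direction, so a lookup never has to scan the other side.
        self._datasets_by_article = defaultdict(set)
        self._articles_by_dataset = defaultdict(set)

    def link(self, article_id, dataset_id):
        # Record the link in both directions in a single step.
        self._datasets_by_article[article_id].add(dataset_id)
        self._articles_by_dataset[dataset_id].add(article_id)

    def datasets_for(self, article_id):
        # Data sets to display alongside an article (empty list if none).
        return sorted(self._datasets_by_article.get(article_id, ()))

    def articles_for(self, dataset_id):
        # Articles to display alongside a data set (empty list if none).
        return sorted(self._articles_by_dataset.get(dataset_id, ()))


# Made-up identifiers, for illustration only.
index = LinkIndex()
index.link("article:example-1", "dataset:example-A")
index.link("article:example-1", "dataset:example-B")
index.link("article:example-2", "dataset:example-A")
```

A reader who starts from the literature would call `datasets_for` on the article, while a reader who starts in the repository would call `articles_for` on the data set; both rely on the same stored links.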
Another example is NextBio. Most scientific information on genes, pathways, diseases, tissues and compounds is available online, but it currently resides in various disconnected locations. Elsevier and NextBio partnered to integrate NextBio’s ontology-based semantic tools with ScienceDirect content. This enables life sciences, health sciences and chemistry researchers to analyse peer-reviewed literature together with publicly available research data from PubMed, clinical trials, experimental data and news articles. NextBio indexes an article on ScienceDirect, surfaces the key concepts relevant to the content of the article, and provides summaries of information related to those concepts, drawing from the article itself as well as NextBio’s correlated content.
Opening up data
Integration of content and tools from sources like PANGAEA and NextBio will be a boon to researchers in the subject areas they serve. However, the universe of available information, and the vast range of tools and services that can be used to help accelerate science, go well beyond a few specific implementations.
With this in mind, ScienceDirect has built the capability to quickly plug in applications like NextBio as they are developed, so that external data or relevant analysis tools can be integrated directly into the online article.
Figure: Relevant articles in ScienceDirect have links to the raw data in PANGAEA.
The applications currently under development use a variety of information sources to determine the context of the article and the researcher. These include the text of the published article, the search run by the user to find the article, and even additional machine-readable data submitted by the article’s author to support a specific application.
With that context, these applications can draw in information from any available source and provide functionality to the researcher directly within the article – without the need to download additional software, view the content on a different platform, or search in multiple databases.
Widespread linking of data and content has the potential to streamline search and discovery within the research process. As more partnerships are created to link relevant scientific information, researchers will be able to get more relevant data and content with fewer clicks and searches. And, as links between knowledge sources enable researchers to extract new insights from existing content, researchers will be able to build on each other’s work to achieve new scientific discoveries more quickly and efficiently.
IJsbrand Jan Aalbersberg is vice president of content innovation at Elsevier Science & Technology; Judson Dunham is product manager at Elsevier