Data is at the heart of new science ecosystem

15 February 2010

Issue: February/March 2010

Open data and open APIs offer huge opportunities for research and innovation, writes Elsevier's Rafael Sidi

More than a century ago, the English philosopher Herbert Spencer said that ‘science is organised knowledge.’ The concept resonated at the time, but little did Spencer know that what lay ahead for researchers was an ongoing struggle against information overload. With added layers of scientific insight building an increasingly complex research environment over the years, the modern equivalent of “organised knowledge” requires a whole new set of tools, practices and policies that are as much about data integration and interoperability as journals and archives.

Interestingly, Spencer is also credited with coining the phrase “survival of the fittest”. As we enter a new decade, making data openly accessible, pliable and interconnected will be both one of the greatest obstacles and one of the greatest opportunities for the scientific community. The ability to adapt and thrive will require researchers, librarians, publishers and platform providers to play an integral role in creating a new “scientific knowledge ecosystem” focused on delivering enriched, “intelligent” content that accelerates the search and discovery process.

Fuelling the need for this change is a shift in the very nature of research. Research has become an industrial-scale operation where specialised teams distributed globally work together toward a common goal. Collaboration across fields and borders is now the norm. Once-insurmountable geographic boundaries have been erased and the rigid lines between disciplines are blurring. This is multiplying the quantity of information that researchers must regularly digest. At the same time, new technologies have resulted in a tremendous expansion of datasets and information resources. Unfortunately, these assets today are largely disconnected, making search and discovery ever more time-consuming and inefficient. This siphons off valuable research hours and forces scientists to “reinvent the wheel” instead of building on existing knowledge, ultimately slowing the pace of important discoveries.

Available but not fully realised

A recently published book, ‘The Fourth Paradigm: Data-Intensive Scientific Discovery’, edited by Tony Hey, Stewart Tansley and Kristin Tolle, addresses the current and potential opportunities for advanced computing to help researchers ‘manipulate and explore massive datasets’. Published by Microsoft Research, the book is a collection of essays and asserts that ‘The speed at which any given scientific discipline advances will depend on how well its researchers collaborate with one another, and with technologists, in areas of eScience such as databases, workflow management, visualisation, and cloud-computing technologies.’

Scientific search and discovery can be greatly accelerated when we exploit the best computing practices of commercial industry. A key hurdle will be achieving interoperability between systems and extracting intelligent information from the data as it becomes more interconnected.

Even as technological advances are beginning to deliver content agility and connectivity, the true power and potential of content will remain limited as long as raw research data is not linked and shared openly. Once this can be achieved, it will lead users to re-use, remix, annotate and enrich the content semantically.

Open data drives innovation

As research becomes more multidisciplinary and collaborative, access to the raw data and the relationship between the data will emerge as critical components for fuelling scientific discovery. Easy access to linked data will allow researchers to build upon the work of their peers around the globe, enabling them to reuse and remix content to generate further breakthroughs.

To date, the open data movement has been led by government institutions in the UK, USA and Australia. Each country has set up an initiative designed to make non-sensitive government information available to the public by offering access to useful datasets with the potential to benefit society.

The open data concept is gaining momentum in the scientific community at large. Jean-Claude Bradley at Drexel University, for example, started the Open NoteBook Science project, which encourages researchers to make the primary record of their projects publicly available as it is recorded. Galaxy Zoo, an online astronomy project inviting members of the public to assist in classifying over a million galaxies, is an example of community-driven scientific knowledge creation made possible by shared data.

The Human Genome Project opened its databases to the public in 1990. By 2003, it had succeeded in sequencing all the base pairs in the human genome and it did this under budget and more than two years ahead of schedule. But, while the benefits of open data are clear and easily illustrated, questions remain about how best to create meaningful links between the data and how to overcome researchers' hesitancy to share their hard-won findings.

Collaborating in the science ecosystem

With acceptance in the government and business worlds, it is just a matter of time before the “open data” trend fully crosses over to the scientific community. This will create a significant opportunity to enrich content and speed innovation.

This opportunity can only be fully realised if researchers, universities and content platform providers, including scientific publishers, are willing to offer access to their raw data. The likelihood of this increases as each party begins to recognise the potential benefits that can be gained in return.

Once raw data is made available, accessing it through application programming interfaces (APIs) will be crucial. Publishing APIs creates an “openness” that makes content and data available across the web and between applications. Consumers already benefit from the release of APIs in a small way almost every day, whether developing mashups or using one of the tens of thousands of iPhone applications available. Imagine the power and benefit of generating tailored applications for scientific researchers focused on improving the search and discovery process.

Open APIs will allow the scientific community to experiment and build innovative applications for solving the specific pain points of researchers. APIs will eventually turn into powerful platforms where researchers will develop applications that can be used to build more tailored applications. The creativity of the scientific community will result in applications that could not have been thought of by the content owners.

As applications proliferate and deliver “intelligent information”, content consumption will be fundamentally changed. Content will be filtered and enriched based on the interest and background of the searcher. Researchers will be able to weave together the data, essentially developing their own “personalised views” of the information that is most meaningful to them.

We can also expect to see micro-communities designed around information and applications in which users help each other curate and connect. As they evolve, these communities will transform into trusted networks which researchers will use as a reliable source for filtering and viewing information.

With the enormous potential to enhance the search and discovery experience, universities as well as commercial and government institutions must encourage their researchers to develop new applications using open APIs. This will require them to acknowledge and reward those who invest their time and energy in building applications as well as those who test and validate them.

Within these institutions, it is the librarians who perhaps have the greatest opportunity to champion this new cause. Not only do they serve as the knowledge managers and information experts for their organisation, but they best understand the needs of researchers across disciplines and career stages with respect to search and discovery. They are also ideally placed to reach out to different departments to develop applications that will solve the specific needs of their customers.

A small but growing number of scientific search and discovery applications already exist but they have been developed on a limited pool of content. They are also primarily used within individual institutions, leaving them insufficiently exposed to the global scientific community.

Opportunities for publishers

By offering their content through open APIs, publishers and platform providers can present researchers with application-building tools based on more comprehensive content. In fact, publishers and platform providers have an opportunity to serve as the host of the new scientific knowledge ecosystem that is evolving. This can create a channel where researchers can buy, sell and collaborate in developing new applications.

By opening their APIs, publishers will have an opportunity to co-create with their customers and innovate faster. They will also need to encourage application developers to join this new ecosystem. Developers will be important players in building applications that increase the productivity of researchers, helping them speed scientific discovery.

As with application platforms like iPhone and Salesforce.com, a revenue formula will be necessary to encourage all players to participate in the ecosystem, including universities, government organisations and corporations. In fact, given the challenging economic conditions surrounding research funding across the globe, application development and licensing may create the potential for a novel revenue stream. A new recognition metric will also be needed to acknowledge and encourage the involvement of the researchers themselves.

Building a strong foundation

As the new scientific knowledge ecosystem flourishes, repetition will be removed from the research equation. Eventually building blocks will be created that capture existing knowledge on any given subject. These structures will then be used as the foundation for new discoveries. One scientist may devise a way to purify water, while another may use the same data to develop a new power source.

During the last decade of research discovery, computing developments have begun to have a significant impact on the breakthroughs that enhance our society as we know it. The faster this new ecosystem takes hold and scientific application channels begin to flourish, the greater the potential to accelerate science in the next decade.

Rafael Sidi is vice president of product development for Elsevier’s ScienceDirect

Further information

The Fourth Paradigm: Data-Intensive Scientific Discovery: research.microsoft.com/en-us/collaboration/fourthparadigm
Linked data: linkeddata.org
Open NoteBook Science: usefulchem.wikispaces.com/All+Reactions
Galaxy Zoo: www.galaxyzoo.org
