Solving for scale
Tracy Capaldi-Drewett looks at the latest technology in research communications
When was the last time the word ‘open’ came up in your conversations about research communication?
In scholarly publishing, it’s hard to go a day without saying ‘open’. Some use the word to describe the research itself or its data; others apply it to business models or to access to content.
Easier dissemination of, and access to, research publications is a long-standing need. A coordinated, community effort to solve the challenges of disseminating interdisciplinary research began with ERIC in 1959. Microfiche was state-of-the-art technology when work began on feasibility studies. By 1966, 50 years ago, the first 12 clearinghouses for document reproduction and dissemination had been built and were in operation.
Today, thankfully, technology has transformed research and research communications. Yet technological advances have brought a new scale of complexity. Demands for speed to publication and for support of global collaboration create new challenges at scale. Between seven and nine million researchers contribute to STM research today.
According to the recent UNESCO Science Report: Towards 2030, global gross domestic expenditure on research and development (GERD) reached PPP$1.48 trillion in 2013, and by 2014 the number of internet users worldwide had risen 64 per cent since 2008. The report highlights the need to ‘avoid the uncontrolled explosion of big data.’ The advancement of science depends on the reproducibility and transparency of research findings. Imagine the added complexity of incorporating data collected by citizen scientists around the world. Without protocols for data sharing and data governance, an overwhelming amount of information may be unusable.
The computing power required to use and store the data is just one aspect of scale. Linking data sets to research outputs, including multiple images and journal articles, also requires innovation. Fortunately, Google has solved the problem of search at scale, serving 3.5 billion searches a day. User behaviour data shows that Google and Google Scholar are the starting points for most searches for, and access to, research content.
In the technology sector, ‘open’ was popularised in the 1980s, mainly in the context of interoperability. Open platforms and open source solutions have supported research communications for many years, and they will be essential to the innovation needed to meet new challenges of scale. At scholarly communications conferences, ‘interoperability’ and ‘research communications infrastructure’ are increasingly frequent conversation starters.
What is an open platform?
An open platform is a system, developed with open standards, that makes its data available both to users and to external systems.
From telecommunications to research communications, platform sponsors choose the degree of openness they want to support. A goal of openness is to enable integration and interoperability, which are made possible by application programming interfaces (APIs) that let others build on the system without changing it. The degree of openness signals the sponsor’s invitation to collaborate across its ecosystem of users, developers, and complementary and even competing organisations.
As an example, the HighWire Open Platform supports online content management, publication, and access. HighWire Press also offers a journal manuscript submission system, BenchPress, but publishers disseminating content on the HighWire Open Platform may use any manuscript submission system. Because the platform is built for interoperability, ‘content ingestion’ is immediate and reliable regardless of the originating system. Article metadata, essential to the future discoverability of the published work, flows with the manuscript.
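To make ‘metadata flows with the manuscript’ concrete, here is a minimal Python sketch of the receiving side. It assumes, purely for illustration, that the manuscript system exports JATS XML, a common format for journal articles; the sample record, its DOI and the `extract_metadata` helper are made up, and this is not HighWire’s ingestion code.

```python
import xml.etree.ElementTree as ET

# Made-up JATS-style fragment standing in for a manuscript system export.
# Real JATS records carry far more metadata than this.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.5678</article-id>
      <title-group>
        <article-title>An example article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1825-0097</contrib-id>
          <name><surname>Carberry</surname><given-names>Josiah</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def extract_metadata(jats_xml):
    """Pull the DOI, title and authors out of a JATS <article-meta> block."""
    root = ET.fromstring(jats_xml)
    meta = root.find("./front/article-meta")
    if meta is None:
        raise ValueError("no <article-meta> element found")

    doi_el = meta.find("article-id[@pub-id-type='doi']")
    title_el = meta.find("title-group/article-title")

    authors = []
    for contrib in meta.findall("contrib-group/contrib[@contrib-type='author']"):
        given = contrib.findtext("name/given-names", default="")
        surname = contrib.findtext("name/surname", default="")
        # The ORCID iD travels with the author record, if the export includes one.
        orcid = contrib.findtext("contrib-id[@contrib-id-type='orcid']")
        authors.append({"name": f"{given} {surname}".strip(), "orcid": orcid})

    return {
        "doi": doi_el.text.strip() if doi_el is not None and doi_el.text else None,
        # itertext() keeps any inline markup (italics etc.) out of the title string.
        "title": "".join(title_el.itertext()).strip() if title_el is not None else None,
        "authors": authors,
    }
```

Because the receiving platform reads only standard elements, it does not need to know which submission system produced the file, which is the point of building for interoperability.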
Amazon Web Services (AWS), another open platform, recently announced reduced costs for academic researchers. The combination of a powerful platform with transparent, predictable economics will speed data analysis and data sharing, shortening research timeframes. HighWire uses AWS for products such as Impact Vizor so that they can scale smoothly in both speed and cost.
What is open source?
Open source software is created with source code that is freely available for developers to modify or enhance. Their changes are also made freely available, under license, so that the whole user community benefits.
The results are standards-driven, community efforts based on transparency that provide stable and reliable solutions. People with the motivation and skill to solve the same problem can all contribute to a better solution, whatever their professional affiliation or location. Platform sponsors must constantly decide where to add resources and how to prioritise a backlog of development ideas. A company working with an open platform can prioritise its own resources while also drawing on a wider community for support.
Drupal is open source software. With over one million participants, the Drupal community is among the largest developer communities in the world. Drupal’s content management solution is used by well-known publishers, universities and news media outlets, including The Economist and BBC Worldwide, and by government sites such as data.gov.uk and whitehouse.gov. The Drupal community is built on principles of collaboration, accountability, globalism, and innovation. HighWire selected Drupal in 2009 as the technology for the presentation layer, or web display, of its platform.
Because the Drupal developer community is so large, vast libraries of ‘modules’ have been proven and tested, which lets developers respond faster to market demands. For example, in research communications, publishers want search engines to index new content quickly to improve discoverability. XML Sitemaps help Google do this faster. Instead of writing new code to pass XML Sitemap data to Google, developers at HighWire configured an existing Drupal module for ‘robots.txt’, the file that can tell crawlers where a site’s XML Sitemap is located. This quick solution can be implemented repeatedly: it is used across HighWire’s platform, and other publishers using the same Drupal module benefit too.
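HighWire’s actual solution is configuration of a Drupal module, so the following is only an illustration, in Python, of the two files such a setup ends up publishing: an XML Sitemap following the sitemaps.org protocol, and a robots.txt whose `Sitemap:` line points crawlers to it. The function names and example.org URLs are hypothetical.

```python
from xml.sax.saxutils import escape

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Render (url, lastmod) pairs as an XML Sitemap document.

    lastmod is an optional W3C date string such as "2016-05-01";
    pass None to omit it for a given URL.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{SITEMAP_NS}">']
    for url, lastmod in entries:
        lines.append("  <url>")
        # Characters such as & must be entity-escaped inside <loc>.
        lines.append(f"    <loc>{escape(url)}</loc>")
        if lastmod:
            lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def build_robots_txt(sitemap_url):
    """A permissive robots.txt whose Sitemap line advertises the sitemap location."""
    # An empty Disallow value means nothing is blocked for any crawler.
    return f"User-agent: *\nDisallow:\nSitemap: {sitemap_url}\n"
```

For instance, `build_sitemap([("https://journal.example.org/content/12/3/45", "2016-05-01")])` produces a one-entry sitemap, and `build_robots_txt("https://journal.example.org/sitemap.xml")` produces the file crawlers check first. Updating `lastmod` when an article changes is what signals that a page is worth recrawling.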
DOIs and APIs
The Digital Object Identifier (DOI), a standard persistent identifier for journal articles, has revolutionised the way computer systems exchange information about a journal article. From pre-publication to post-publication, a journal article’s DOI serves as a “key” for APIs. Many systems call APIs with the DOI to enrich the information associated with that article. Innovators can solve problems and take advantage of new opportunities using open systems, APIs and persistent identifiers.
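As a minimal sketch of the DOI acting as an API key, the Python below normalises a DOI and builds a request to Crossref’s public REST API (`https://api.crossref.org/works/{DOI}`), whose JSON response wraps the work’s metadata in a `message` object. Crossref is used only as one familiar DOI-keyed API; the helper names are hypothetical, and the network call is kept in its own function so the rest runs offline.

```python
import json
import urllib.request
from urllib.parse import quote

# Common ways a DOI is written; all reduce to the bare "10.xxxx/..." form.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(raw):
    """Strip resolver prefixes and lower-case the DOI (DOIs are case-insensitive)."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi.lower()


def crossref_work_url(doi):
    """Crossref works endpoint for a DOI; the "/" inside the DOI is left as-is."""
    return "https://api.crossref.org/works/" + quote(normalise_doi(doi), safe="/")


def fetch_work(doi, timeout=10):
    """Fetch the work record over the network (not needed for the offline helpers)."""
    with urllib.request.urlopen(crossref_work_url(doi), timeout=timeout) as resp:
        return json.load(resp)["message"]


def title_of(work):
    """Crossref returns titles as a list; take the first, if any."""
    titles = work.get("title") or []
    return titles[0] if titles else ""
```

Any system that holds only a DOI can use the same pattern to pull in titles, authors or funding data, which is why a stable identifier matters more than the particular API behind it.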
DataCite offers an example for research data: it provides DOIs and develops and supports standards for persistent identifiers for data, so that systems can recognize a DOI as referring to a data set and include appropriate metadata. This enables researchers to reference data sets and cite data. Other systems can link DOIs for data to the DOIs of associated publications.
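DataCite metadata records how a data set relates to other works through `relatedIdentifiers`, each carrying a `relationType` such as `IsSupplementTo` or `IsReferencedBy`. The sketch below assumes a record shaped like the JSON that DataCite’s REST API returns for `https://api.datacite.org/dois/{DOI}`; it checks that the DOI describes a data set and lists the article DOIs it links to. The sample record and its DOIs are made up, the helper names are hypothetical, and which relation types count as ‘links to publications’ is our own assumption.

```python
# Made-up record, trimmed to the fields used below; real DataCite
# responses carry many more attributes.
SAMPLE_RECORD = {
    "data": {
        "id": "10.1234/dataset.42",
        "type": "dois",
        "attributes": {
            "doi": "10.1234/dataset.42",
            "types": {"resourceTypeGeneral": "Dataset"},
            "relatedIdentifiers": [
                {"relatedIdentifier": "10.1234/example.5678",
                 "relatedIdentifierType": "DOI",
                 "relationType": "IsSupplementTo"},
                {"relatedIdentifier": "https://example.org/protocol",
                 "relatedIdentifierType": "URL",
                 "relationType": "IsDocumentedBy"},
            ],
        },
    }
}

# Assumption: relation types treated here as links from data to publications.
PUBLICATION_RELATIONS = {"IsSupplementTo", "IsCitedBy", "IsReferencedBy"}


def is_dataset(record):
    """True if the record's general resource type is Dataset."""
    attrs = record["data"]["attributes"]
    return attrs.get("types", {}).get("resourceTypeGeneral") == "Dataset"


def linked_publication_dois(record):
    """DOIs this data set points to through publication-type relations."""
    attrs = record["data"]["attributes"]
    return [
        rel["relatedIdentifier"].lower()
        for rel in attrs.get("relatedIdentifiers", [])
        if rel.get("relatedIdentifierType") == "DOI"
        and rel.get("relationType") in PUBLICATION_RELATIONS
    ]
```

A platform displaying the article could run the same lookup in reverse, showing readers the data sets that declare themselves supplements to it.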
Innovation and interoperability
Today, innovation is accelerating to solve problems across the entire research communication workflow. Bianca Kramer and Jeroen Bosman, librarians at Utrecht University, began a project to chart innovation in scholarly information and communication flows from evolutionary and network perspectives. Their work on ‘101 Innovations in Scholarly Communication’ has uncovered 600 software solutions of varying scale. In the past three years, they have identified 250 technology solutions.
At that pace of innovation, interoperability becomes increasingly valuable to the wider ecosystem. Platform providers can integrate the innovations the community adopts, helping these new approaches scale more quickly. Several publishers have recently mandated the use of ORCID iDs, persistent identifiers for individual researchers that connect authors with their published research regardless of changes in name or affiliation. ORCID runs its registry of unique identifiers on an open platform, provides open source code and APIs to the registry, and invites participation in its technical community. Today more than 2.2 million individuals have an ORCID iD.
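ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, which ORCID documents publicly, so a system ingesting author metadata can catch mistyped iDs before querying the registry. A minimal Python sketch follows; the function names are our own.

```python
import re

# Four hyphen-separated blocks of four; only the final character may be "X".
ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")
ORCID_URL_PREFIX = "https://orcid.org/"


def orcid_check_character(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as "X".
    return "X" if result == 10 else str(result)


def is_valid_orcid(value):
    """True if value is a well-formed ORCID iD (bare or URL form) with a correct check character."""
    orcid = value.strip()
    if orcid.lower().startswith(ORCID_URL_PREFIX):
        orcid = orcid[len(ORCID_URL_PREFIX):]
    orcid = orcid.upper()
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` is `True`, while changing the last digit to 8 makes it `False`. A valid check character only shows that an iD is well formed; confirming that it belongs to a real researcher still requires a lookup against the ORCID registry.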
Peer review offers another current example. Open source solutions enable new approaches to the discussion of research. Some publishers want to increase the transparency of peer review; others want to enable ongoing discussion and commentary on research. Open source solutions make it easier for developers to support these goals. Disqus, a discussion platform, can host ongoing conversations centred on newly published articles. Simply by configuring the platforms to work together, developers can support innovative approaches to post-publication peer review.
The Annotating All Knowledge coalition of scholarly publishers, libraries, and technology organizations, including HighWire, is advancing standards for integrating annotation and conversation on the web. Open tools such as hypothes.is can help integrate commentary into peer review workflows. As scholars begin to use this technology, new needs and applications will arise, and an open approach will let systems evolve more rapidly to meet them.
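One open standard in this space is the W3C Web Annotation Data Model, which represents an annotation as JSON-LD with a `body` (the comment) and a `target` (the passage being discussed). The sketch below builds such an object in Python. It assumes a plain-text comment anchored to quoted text with a `TextQuoteSelector`; the helper names and URL are hypothetical, and this is not presented as the coalition’s or hypothes.is’s own storage format.

```python
import json

W3C_ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_annotation(page_url, exact, comment, prefix="", suffix="", creator=None):
    """Build a W3C Web Annotation commenting on a quoted passage of a page.

    The TextQuoteSelector anchors the comment to the exact text; optional
    prefix/suffix text disambiguates phrases that occur more than once.
    """
    selector = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix
    annotation = {
        "@context": W3C_ANNO_CONTEXT,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": {"source": page_url, "selector": selector},
    }
    if creator:
        annotation["creator"] = creator  # e.g. an ORCID iD URL for the reviewer
    return annotation


def to_json(annotation):
    """Serialise the annotation as JSON-LD text for storage or exchange."""
    return json.dumps(annotation, indent=2)
```

Anchoring to quoted text rather than to a page position means the comment can still be placed if the article is re-rendered on a different platform, which is the kind of cross-system behaviour a shared standard makes possible.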
Attracting talented developers
Talented software developers are in demand across all industry sectors. To scale efficiently for growth, organisations in the scholarly communications sector need access to a large pool of developer talent.
HighWire recognized the need to plan for rapid growth when developing new website solutions in 2008. When investing in any platform, tapping into a skilled talent pool is more efficient than training new hires. Drupal’s open approach and focus on community were compelling reasons to choose it for website development. More than that, the number of talented developers actively contributing code to, and maintaining, Drupal modules was an important consideration.
To speed the delivery of innovative solutions, HighWire is expanding its global development team, adding dozens of technology positions in Belfast, UK. Belfast has a passionate community of Drupal engineers, and more people there would like to build careers as developers in that community. Competition for talented developers is strong, so solving for scale and attracting the best developers over the long term requires creativity. HighWire has made a multi-year commitment to run a training program, the HighWire Academy, with local partners SERC and the Northern Ireland Department of Employment and Learning.
This challenging training course focuses on Drupal, PHP, open source content management systems and Linux. The first five-week session was fully subscribed, with 18 high-caliber developers selected from more than 80 applications. Investing in the talent pool in this way helps sustain a dynamic developer community, secures highly trained developers ready to contribute new solutions, and establishes a center of excellence.
The need for easier dissemination of, and access to, research publications will continue to drive changes in workflow and technology. New needs will arise in supporting global collaboration with the next generation of researchers. These “digital natives,” who can’t imagine working with microfiche, are used to finding solutions to their problems online and even coding their own. As they begin their research programs, their approaches to workflows and technology will reveal new ways of working, new expectations, and new needs. Open, interoperable technology solutions will help build a more responsive ecosystem, one that supports innovation in scholarly communication sooner and helps it reach a new scale.
Tracy Capaldi-Drewett is vice president for EMEA sales and global marketing at HighWire Press