Discovery: finding the way forward

28 June 2017

For many researchers searching for scholarly content, databases such as Web of Science, PubMed, Scopus and Google Scholar are the springboard. Each database is practical to use, offers numerous search facilities and, in the case of PubMed and Google Scholar, is free.

Crucially, each houses a staggeringly vast array of different types of content, which Jessica Turner, head of scientific research business at Web of Science provider Clarivate Analytics, believes will always be key to discovery.

As she points out, a hefty 96 million records over 32,000 journals are contained within the Web of Science, with some dating back to 1900.
‘This includes a core collection of globally significant literature but also regional citation indexes such as the Chinese Science Citation Database and SciELO for Latin America,’ she says. ‘We also have specialist databases in many subject areas as well as books, proceedings and data-sets to cover the full scholarly communication life-cycle.’

Citation indexes are a key feature of Web of Science, with citation analysis – tracking the number of times an article has been cited by other articles – offering invaluable insight into article impact. Indeed, Turner reckons such a citation network lies at the foundation of discovery, allowing researchers to do so much more with their list of search results.

‘This provides a much broader discovery experience, so researchers can look at relevant work across disciplines and regions,’ she says. ‘We like to think this supports serendipitous discovery, enabling researchers to look at less obvious relationships and expand scientific discovery.’
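To make the idea of citation analysis concrete, here is a minimal Python sketch of counting citations and walking a citation network backwards. It is an illustration only, not how Web of Science works internally: the article IDs, the `references` mapping and both function names are hypothetical.

```python
from collections import Counter


def citation_counts(references):
    """Count how many articles cite each article.

    `references` maps an article ID to the list of article IDs it cites.
    """
    counts = Counter()
    for cited_list in references.values():
        # set() ensures a reference repeated within one article counts once.
        for cited in set(cited_list):
            counts[cited] += 1
    return counts


def citing_articles(references, target):
    """Return the IDs of articles that cite `target`, in sorted order."""
    return sorted(a for a, cited in references.items() if target in cited)


# Hypothetical toy network: A cites B and C, B cites C, D cites C and B.
references = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B"],
}

counts = citation_counts(references)
for article, n in counts.most_common():
    print(article, "is cited by", n, "article(s):", citing_articles(references, article))
```

Following the citing articles of a result, rather than only its reference list, is what lets a reader move from one paper to later work in other disciplines or regions.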

But according to Turner, the quality of the metadata collected by the Web of Science is also instrumental to discovery, with every journal in the database being indexed ‘cover to cover’. ‘We collect information accurately about every institution, which allows our analytics to be as accurate as possible,’ she adds.

Mike Roberts, content discovery manager at Emerald Publishing, UK, couldn’t agree more. Having recently taken part in the company’s strategic content discovery program, intended to increase the visibility of research output, he believes the quality and timeliness of metadata records significantly influence discoverability.

‘Our MARC and KBART records were always quite good, but now they’re very good,’ he says. ‘One thing we had fallen down on though, was we weren’t releasing these to librarians quickly enough, so we now have workflows to optimise delivery of all metadata as soon as possible after publication.’

‘It’s also extremely important to ensure that metadata held at Crossref is as complete and up-to-date as possible,’ he adds. ‘There are so many services built on, and using, Crossref data that it’s absolutely fundamental to content discovery.’

As part of the strategy program, Roberts and colleagues have also been working on ways to publicise the company’s discovery activities, using product-specific checklists as well as the Open Discovery Initiative from the National Information Standards Organisation, NISO.
‘We’re also establishing close working relationships outside of Emerald, with, for example, Google Scholar, ProQuest, EBSCO and WorldCat, to make sure that our metadata is passing successfully through their systems and they have an accurate reflection of our content,’ he adds.

But metadata aside, Roberts believes that recognising discovery as a problem in its own right is the most important step to success. As he points out, so many pathways exist to enhance discoverability that any organisation has to be prepared to make discovery a recognised role within the company.

‘Find someone to help and take advice on approaches and tools, whether it be for SEO, metadata quality, user interfaces, or even the ever-changing foibles of commercial library discovery services,’ he says. ‘We have appointed someone managing content discovery, as well as a cross-functional taskforce with staff from across the business to ensure discovery efforts are co-ordinated across the business,’ he adds. ‘This means that we get the technical aspects right, but also ensure that we’re getting feedback from our customers and users on their experiences.’

Open for research

Open access journal eLife was launched in 2012 by biomedical funding heavyweights the Howard Hughes Medical Institute, the Wellcome Trust and the Max Planck Society to rival the big three traditional biology publishers, Nature, Science and Cell. Five years on, investment is flowing – the journal received $26 million from its funders just last year – and the number of publications tops 1800.

Clearly, optimising discovery is critical to the publisher. From the word go, and in line with the likes of Clarivate Analytics and Emerald Publishing, it has served up a wide range of content designed to appeal to as broad an audience as possible. For example, plain-language summaries – ‘eLife digests’ – written in collaboration with authors promote accessibility and help users to ‘join the dots’ between scientific disciplines.

In a similar vein, expert commentaries – Insights – provide context, podcasts target a wider audience, and impact statements from the author and reviewing editor enhance comprehensibility for all.

Metadata quality is also considered key. Interestingly, Jennifer McLennan, head of external relations at eLife, reckons being relatively new to the world of scholarly publishing has had key advantages here. ‘We’ve been able to capitalise on decades of experiences in publishing, and deploy the latest standards and best practices in metadata as well as information organisation,’ she asserts.

‘Our head of production operations is chair of [common XML standard group] JATS4R, we publish our XML to a GitHub repository and deliver content to many downstream sites such as PubMed Central,’ she says. ‘We have also aimed to push as much metadata to Crossref as possible and ensure each reference is open via its application programming interface, API.’
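As a rough illustration of what open references via the Crossref API can look like to a downstream service, the Python sketch below builds a lookup URL for a DOI and pulls the cited DOIs out of a response. It assumes the publicly documented Crossref REST API `works` endpoint and its JSON layout (a `message` object with a `reference` list). The DOIs and the sample response are made up, the function names are hypothetical, and none of this is eLife's own code.

```python
from urllib.parse import quote

# Assumed base URL of the public Crossref REST API "works" endpoint.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def work_url(doi):
    """Build the metadata lookup URL for a DOI, escaping unsafe characters."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def cited_dois(response):
    """Extract the DOIs of references deposited with a work.

    `response` is the decoded JSON body of a works lookup. References
    deposited without a DOI are skipped.
    """
    message = response.get("message", {})
    return [ref["DOI"] for ref in message.get("reference", []) if "DOI" in ref]


# Hypothetical, heavily trimmed response for illustration only.
sample_response = {
    "status": "ok",
    "message": {
        "DOI": "10.1234/example.1",
        "reference": [
            {"key": "ref1", "DOI": "10.1234/example.2"},
            {"key": "ref2", "unstructured": "A reference deposited without a DOI"},
        ],
    },
}

print(work_url("10.1234/example.1"))
print(cited_dois(sample_response))
```

In practice the response would be fetched over HTTP and decoded from JSON first. The parsing is kept separate here so that it can be run without network access.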

Metadata aside, as part of ongoing development, and in a bid to streamline navigation and enable rapid content discovery on both desktop and mobile devices, the publisher is set to release the latest iteration of the eLife journal website, eLife 2.0. Throughout design, user behaviour has been under scrutiny, via analytics and direct interviews, providing valuable insight.

As eLife’s head of product, Giuliano Maciocci, highlights, the majority of eLife users reach the website from a search engine, hitting the site at the article level. Given this, the publisher has worked hard to improve article navigation in myriad ways.

For example, priority is given to an article’s title and body-text over other metadata, and ‘distracting site furniture’ has been removed from the article page. What’s more, the traditional multiple tabs and table of contents navigation has been collapsed into a single menu, while mobile navigation has been optimised to cater for the platform’s reduced screen real estate.

In 2013, the publisher released its open source tool, eLife Lens view, which allowed readers to explore figures, references and more, without losing their place in the article text. In eLife 2.0, this will be fully integrated as a ‘side by side’ view alongside the article, rather than ‘relegated’ to opening a new tab, as Maciocci puts it.

The changes don’t stop here. A further reading section, following the main text, will suggest related content based on the currently displayed article while source data associated with figures will be visible in the context of the figure itself. And a so-called Magazine portal will host non-research content – including podcasts and blogs – to draw in generalists.

However, as Maciocci is keen to emphasise, not all users reach eLife from a search engine. ‘The users that hit our front page directly tend to be specialists, and they will benefit from a Major Subject Areas list, which provides a jumping-off point into their specialty area,’ he says. ‘They won’t be needlessly exposed to the general list of recent content outside their areas, and this was well received in our direct user testing.’

Change has also featured high on the agenda for Silicon Valley-based Atypon. Recently acquired by publishing giant Wiley, the software developer has made significant enhancements to its online publishing platform, Literatum. And discovery has been at the heart of its changes. As part of this, the company’s universal content type technology – Digital Objects – now assigns a Digital Object Identifier, DOI, to any type of content or media, from blogs and news articles to videos and interactive visualisations. The company reckons each Digital Object can be tagged, indexed, packaged, targeted, promoted, bundled and sold as easily as traditional content, such as a journal article. And crucially, such a move should attract readers to new types of non-peer-reviewed content.

Marty Picco, Atypon’s vice president of product management, strongly believes that researchers need to find the information they are looking for as quickly as possible. And beyond promoting ‘universal content’, he asserts several other factors are needed to optimise discovery.

For example, a very good search engine with the latest ranking algorithms is imperative while recommended content is key. According to the vice president, Literatum predicts and recommends content that is likely to be of most interest to each visitor. Recommendations are drawn from the publishers’ entire corpus and the company also analyses user behaviour to identify the most relevant content.

But beyond finding research as quickly as possible, Picco also believes researchers must be able to access it with as little fuss as possible. And this, he says, is still a challenge.

‘The worst-case scenario in the world is somebody goes to Google Scholar, gets a result, starts clicking through and finds that each result is from a different publisher site and may also need ID authentication,’ he says.

‘You’ve got to jump through hoops to get to the point where you have access and this is tremendously frustrating.’

To this end, Atypon is part of Resource Access for the 21st Century (RA21), a joint STM-NISO initiative aimed at optimising protocols across key stakeholder groups to promote seamless access from site to site. The company has developed a technology stack for the initiative, which is currently under evaluation, and Picco is confident that RA21 will see clear results come next year. ‘It won’t solve our authentication issues but it will certainly streamline the problems and relieve user frustration,’ he says.

Fit for the future

So what does the future hold for the multi-faceted world of discovery? eLife, for one, is intent on becoming a central resource for discoverability, but with open source content and tools forming the backbone of operations. For example, the organisation is currently collaborating with US-based open annotation platform developer, Hypothes.is.

‘This annotation tool will facilitate discussion amongst readers of research in an online, live environment and will be available to any publisher to plug into their website,’ points out McLennan. ‘We want to be at the centre of using tools to accelerate discovery, and serve as a testing ground before making [the software] open for others to use and take forward.’

According to Clarivate Analytics’ Turner, more and more analytics, and better visualisations, will be used in the Web of Science.
‘We are also always looking at integration with the wider research community and in making sure that Web of Science data and metrics are accessible in multiple different platforms,’ she says.

Clarivate Analytics also intends to continue working with scientometrics research groups around the world to track trends in science, and wants to better understand the broader social and commercial impact of its science: ‘We already have patent data linked into our research literature, allowing us to perform a lot of those analytics and really understand what’s going on,’ says Turner.

Like Turner, Atypon’s Picco believes smart visualisations, figures and content have great potential in scholarly publishing.

‘We are always working on bringing the data alive and really want to connect the science and the scientists,’ he says. ‘Providing more dynamic content and tools helps us to connect [researchers] more directly to the underlying science… and this is where the whole of discovery is going.’