Scholarly searching

1 June 2015

Neil Block, vice president of discovery innovation, academic libraries, EBSCO Information Services:

Discovery refers to the unearthing of scholarly information in support of research outcomes. In other words, a discovery service aims to make published scholarly work readily discoverable and accessible to anyone conducting research in any given field or discipline.

Discoverability of content is fundamental to successful research outcomes. Moreover, discoverability is critically important to the success of an academic or research library. Libraries, after all, serve as gateways to scholarly content and, as such, require a discovery service that provides the best possible access to content and the best possible experience for researchers.

We are currently facing several challenges. First, consider the sheer volume of content that must be made discoverable: a discovery service must surface relevant research across billions of records. Moreover, researchers fall into different groups (undergraduate, graduate, etc.) across a myriad of disciplines. The discovery service must therefore cater to a multitude of needs and support a wide array of institutional and user requirements.

Making a discovery service more effective and efficient requires looking beyond metadata alone. There is much to consider: the inclusion of all content types, the technological approach to search and relevance ranking, the interoperability of the solution with third-party applications (including the ILS/LSP) and – importantly – its ability to streamline workflows from content selection to discoverability and fulfilment.

A clear opportunity exists in ensuring an intuitive search experience. For example, the discovery service may further evolve to help users find answers by mapping subject headings from available indexes (allowing for 'use for' and 'see also' references).
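The subject-heading mapping described above can be sketched as simple query expansion. The Python below is a minimal illustration, assuming a tiny hypothetical thesaurus: a ‘use for’ reference redirects a non-preferred term to its preferred heading, and ‘see also’ references list related headings worth suggesting. The headings and the `expand_query` function are illustrative assumptions, not a description of any vendor’s implementation.

```python
# Hypothetical thesaurus fragment; a real service would draw these
# cross-references from the vocabularies of its available indexes.
USE_FOR = {
    # non-preferred term -> preferred subject heading
    "heart attack": "Myocardial infarction",
    "cardiac arrest": "Heart arrest",
}
SEE_ALSO = {
    # preferred heading -> related headings to suggest
    "Myocardial infarction": ["Coronary disease", "Heart arrest"],
}


def expand_query(query: str) -> dict:
    """Map a user's term to a preferred heading and collect related headings."""
    term = query.strip().lower()
    preferred = USE_FOR.get(term)
    if preferred is None:
        # The term may already be a preferred heading itself.
        preferred = next((h for h in SEE_ALSO if h.lower() == term), None)
    related = SEE_ALSO.get(preferred, []) if preferred else []
    return {"query": query, "preferred": preferred, "see_also": related}


print(expand_query("Heart attack"))
```

A searcher typing a lay term is thus steered to the controlled heading and offered related headings, rather than receiving only literal keyword matches.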

John Sack, founding director, HighWire Press:

The term ‘discovery’ in a reader’s online world generally means finding desired materials via the search engines they use. For most researchers, ‘discovery’ works well, since their searches on Google or Google Scholar return hundreds of results. But content providers in the scholarly ecosystem, such as librarians and publishers, want to be confident that their users have found the most appropriate and best materials, not just ‘lots’ of results.

Discovery has expanded to include finding information through social media, such as Facebook and Twitter feeds, as well as through email tables of contents (TOCs) and article alerts that notify readers when new articles are published. These alerts also let information providers recommend content that is relevant at that particular point in time. Search results, news feeds, alerts and email are now the main ways most readers see new content. In fact, many researchers rarely ‘read’ entire journals anymore; even the concept of ‘read a journal’ tends to mean ‘read the email TOC’!

While search engines return hundreds of results, optimised by publishers and librarians, the world of ‘discoverability’ is challenged by the need to retain serendipity. The robust power of discovery has impoverished our ‘lean back’ reading. One researcher said to us: ‘Because of keyword search, I only find what I’m looking for.’

Most researchers come to journal sites with a ‘grab and go’ mentality (grab a PDF and get out of the site). I refer to this as MSP: male shopping pattern. The focus is on buying one item: men go into a store, select what they are looking for, buy it and get out. If the store doesn’t have exactly what we expect, we generally don’t waste time browsing another aisle. So, as an information provider, how do you break that pattern of behaviour as readers are seeking research articles? How do you get readers to turn the page of the encyclopaedia, so to speak?

The challenge for content providers is to drive awareness of relevant and interesting content, and to do so at a time when users are open to finding new things. In an era when the reader’s/researcher’s entire workflow is online – not just the ‘read the literature’ part – we have to find new opportunities to provide discovery throughout the process, particularly by restoring the serendipity of discovery through social media, email and other such channels.

Charlie Rapple, co-founder of Kudos:

‘Discovery’ means people finding and applying research. Discoverability is critical to ensuring that we make the most of investment in research. It is becoming more so thanks to budget cuts, and the transition of publishing from print to digital, which have respectively driven and enabled an increased focus on article-level impact; this in turn drives increased pressure to be discoverable.

There are related drivers such as the open access movement, which relates to discovery in multiple ways – for example, the expectations of a paying author in terms of her research being discovered by a wide audience. Just as we are being encouraged to think more broadly about what ‘impact’ means (beyond academic, to economic or societal impacts), we must also think more broadly about where ‘discovery’ begins and what it entails. It is no longer enough to make sure that content is discoverable through academic channels (such as abstracting and indexing services or library platforms); we also need to consider the implications of discovery through traditional and social media platforms.

Much effort has been expended on making our content more technically discoverable, from widespread dissemination of metadata and search engine optimisation, to initiatives such as the STM principles that support and encourage sharing by authors. Attention is now turning towards making our content more conceptually discoverable – ensuring that it can be understood by people outside the field, in support of diverse goals ranging from driving interdisciplinary collaboration to giving taxpayers a means of understanding the research they pay for.

There have been several calls – most recently at the Royal Society’s meeting on scholarly communication – for ‘lay summaries’ to be created for every publication. These would help not only to increase ‘conceptual discoverability’ but also to optimise discovery by those within the field, by making it quicker and easier for them to digest more of the literature as it grows around them. Our challenge is to motivate researchers to provide these plain language descriptions of their work by connecting the dots and showing them what a dramatic effect they can have on discoverability, and thereby on usage and impact.

There is much opportunity to make discovery more effective and efficient by deduplicating effort and helping researchers manage their discoverability in one place. This ties in to another recent STM initiative, its tech trends for 2015, one of which is the ‘article as a hub’ for discovery of related research assets (methods, code, data, etc). This is part of the value that we hope Kudos adds: we aim to increase the technical and conceptual discoverability of research and related assets, and enable researchers to do this more efficiently by providing them with one platform across all their publishers, all their communication channels, and all the metrics that matter to them.

Christine Stohn, senior product manager for discovery and delivery, Ex Libris:

Library users have a variety of needs and goals when they search for information. They may be looking for a specific item, such as a book, an article, an e-book, or a multimedia recording. Also, the information they seek may well depend on their level of expertise; for example, undergraduates may need materials for course assignments, whereas professors are more likely to be searching for the latest publications in their field or seeking data for a research project.

Libraries should cater to these diverse needs, providing users with an intuitive way to find the materials they desire out of the sea of physical, electronic, and digital resources and enabling users to access those materials in the easiest way possible. The challenging goal for discovery systems, then, is to enable libraries to bring together users, their intent and needs, and the wealth of available information.

Search log analysis conducted by Ex Libris on a regular basis indicates that searches for specific items, or ‘known-item’ searches, remain a major discovery method for users, who expect to rapidly obtain the most relevant results.

However, this method, often called search and find, is far from being the only discovery practice that is important to the users of today.

A recent user study conducted by Ex Libris clearly shows that library users, particularly students, have broader needs and greater expectations from the discovery process. To understand these needs, we can look at several building blocks of discovery:

Exploration: Users benefit from serendipitous discovery by browsing and by following recommendations and navigation trails. A typical exploration begins with a user’s selection of an item – for example, a subject, an author, or a title. The discovery system presents another group of items, from which the user makes another selection. Then the user continues to navigate onward to additional sets of related items, narrowing or expanding his scope.

Learning: The discovery process provides opportunities for learning. When working on papers and projects, undergraduate and postgraduate students often need to acquire new vocabulary, obtain new knowledge, and understand a general research area before narrowing it down.

Personalisation: Users expect discovery systems to provide the results most relevant to their needs and their search context. For example, a scholar who submits the query ‘game theory’ might be seeking material in economics, whereas another scholar might submit that query to find material in mathematics. The same list of results does not work for everyone.
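A minimal sketch of the personalisation idea, assuming each result carries a hypothetical `discipline` field, the searcher’s profile records a discipline, and the search engine has already supplied a base relevance score. Results in the searcher’s own discipline receive an illustrative boost; the titles, scores and boost factor are invented for demonstration and do not reflect how Ex Libris actually ranks results.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    discipline: str      # hypothetical field tagging the result's discipline
    base_score: float    # relevance score assumed to come from the search engine


def personalise(results, user_discipline, boost=1.5):
    """Re-rank results, boosting those in the searcher's own discipline."""
    def score(r):
        return r.base_score * (boost if r.discipline == user_discipline else 1.0)
    return sorted(results, key=score, reverse=True)


hits = [
    Result("Nash equilibria in oligopoly markets", "economics", 0.80),
    Result("Combinatorial game theory and surreal numbers", "mathematics", 0.75),
]
print([r.title for r in personalise(hits, "economics")])
print([r.title for r in personalise(hits, "mathematics")])
```

Under these assumptions, the same ‘game theory’ result set is ordered differently for an economist and for a mathematician.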

The Ex Libris user study included analyses of search logs, workshops with librarians at multiple institutions, qualitative surveys, and user interviews. A report on the research is available on the Ex Libris website.

Laird Barrett, senior digital product manager for Taylor & Francis:

The discoverability of scholarly journal articles online has been and continues to be paramount for scholarly journal publishers and for the authors whose work we publish. Discoverability can help to drive usage, and usage can help to drive citations. Citations are confirmation for authors that their work is being engaged with and built upon.

Taylor & Francis does many of the right things to make the work we publish on behalf of authors discoverable online:

We provide title lists to link resolvers and support OpenURL to help library users discover journal articles in library discovery systems and link through to those articles on our journals platform, Taylor & Francis Online.

We feed article metadata out to abstracting and indexing databases around the world, including Thomson Reuters, Scopus, Primo Central, PubMed Central, CNKI Scholar, etc., from where those articles can be discovered.

We allow search engines to crawl and index Taylor & Francis Online and provide Dublin Core metadata on our webpages to help ensure the accurate and clear indexing of articles. As Google Search and Google Scholar have grown as a means for discovering journal articles, we’ve formed a close relationship with the Google Scholar team and now send them information so that Google Scholar surfaces direct links to article PDFs on our platform, both for subscribers, according to their access, and for all open access articles.

Our marketing team works hard to feed out articles that we publish online via a variety of social media channels and we make social sharing buttons prominently available from journal and article pages on Taylor & Francis Online to encourage readers to share articles across those channels.
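OpenURL (ANSI/NISO Z39.88-2004) passes an article’s citation to a library’s link resolver as key/value pairs appended to the resolver’s address. A minimal Python sketch of building such a link in the key/encoded-value (KEV) journal format follows; the resolver address and all citation values are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base: str, citation: dict) -> str:
    """Build an OpenURL 1.0 (KEV) link for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",                       # OpenURL 1.0 version tag
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # journal metadata format
        "rft.genre": "article",
    }
    # Each citation field becomes an 'rft.' key in the journal KEV format.
    params.update({f"rft.{key}": value for key, value in citation.items()})
    return f"{resolver_base}?{urlencode(params)}"


link = build_openurl(
    "https://resolver.example.edu/openurl",  # hypothetical link resolver
    {"atitle": "An example article", "jtitle": "Journal of Examples",
     "issn": "1234-5678", "volume": "12", "spage": "34", "date": "2015"},
)
print(link)
```

The resolver can then match the citation against the library’s holdings and send the user to an appropriate copy, such as the article page on the publisher’s platform.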
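Dublin Core metadata is commonly embedded in a page’s HTML head as `meta` elements whose names carry a `DC.` prefix, declared with a `schema.DC` link. The sketch below renders such tags in Python; the record, including its DOI, is a hypothetical example, and it makes no assumption about which exact tags a given platform or search engine relies on.

```python
from html import escape

DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dublin_core_tags(record: dict) -> str:
    """Render Dublin Core elements as HTML <meta> tags for a page's <head>."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for element, values in record.items():
        # Elements such as 'creator' may repeat, once per author.
        values = values if isinstance(values, list) else [values]
        for value in values:
            # escape() also encodes double quotes so attribute values stay valid.
            lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)


print(dublin_core_tags({
    "title": "An example article",
    "creator": ["Smith, Jane", "Doe, John"],
    "date": "2015-06-01",
    "identifier": "doi:10.1234/example",  # hypothetical DOI
    "publisher": "Example Publisher",
}))
```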

What the above illustrates is the size and range of the job to ensure the discoverability of articles online. It’s a job that only grows and evolves as new channels for potential online discovery emerge and the requirements for maintaining the existing channels change. And that’s the principal challenge for publishers when it comes to online discovery: keeping up with and ahead of it all.

The Taylor & Francis Online roadmap this year includes a development to bring our title lists into line with the new KBART II standard, a partnership with – and article metadata feed out to – Baidu Scholar in China, a refinement of page titles and the use of the ‘noindex’ directive across Taylor & Francis Online, and the display of altmetrics on the platform, which will inform authors and readers about the online discussion around articles and, we hope, encourage further online sharing and discussion. These developments will help to keep us up to date as online discovery evolves, but there is, of course, always more that we can and will do.

Online discovery is multi-faceted and ever-changing. Because of its importance to usage and citations, publishers simply must manage it well.

James Phimister, vice president, strategy, at ProQuest:

Discovery is the cognitive leap that an individual takes as they probe and step outside their sphere of knowledge. It is also a process as individuals encounter new information, make sense of it, internalise it, and store it to support future action. The discovery process is innate, yet it can be supported and fostered. Librarians and information professionals devote their careers to aiding discovery, thinking through all its aspects and seeking new ways to improve discovery in support of others.

Discovery is core to how we support our customers and end users. When we digitise content for inclusion in collections, we give thought to how it will be discovered, so we replicate and render with accuracy, and tag with rich metadata to ensure discoverability. When we host content, we work across our organisation and with partners to ensure valuable and essential content that libraries subscribe to can be found and utilised to its fullest potential. And, when we develop discovery tools, where users repeatedly return to seek information, we design for the end user, so they are served relevant, comprehensive and unbiased information from which to navigate and hone their search.

For historic content, digitisation is the first step towards online discovery. At ProQuest’s facilities in Ypsilanti, Michigan, and with digitisation partners, we digitise and tag literally millions of pages each year for inclusion in our products and digital repositories. For example, for the past several years, we have partnered with Europe’s eminent libraries to create six digital archive collections of early European books. These collections, containing the digital files of carefully preserved documents, enable researchers to discover new aspects of history that might otherwise go unrecognised.

Once content is created, it must be discoverable, and maximising content discoverability is essential to delivering ROI to library customers and publishing partners. The challenge is not trivial. For example, ProQuest hosts an unprecedented corpus of four billion documents from vast collections of journals, books, dissertations, newspapers and other sources, ranging from news published today to historical works from 200 AD. To ensure library patrons can discover this valuable content, ProQuest works with multiple discovery services, including its own service Summon, OCLC’s WorldCat and Ex Libris’ Primo, so ProQuest-hosted content is easy to find whichever discovery service a library uses.

In fostering discovery, it is essential that tools are built with the user in mind. ProQuest’s web-scale discovery service, Summon, serves this need. Summon pioneered single-search-box access to the breadth of library collections, meeting user expectations that library content be simple to search. At the same time, it is designed to power research. A user’s search is run against document records whose metadata is normalised in a unified index. Results are ranked by a continuously honed relevancy algorithm that factors in a host of content metadata, delivering neutral, relevance-ranked results that appropriately showcase popular and obscure content sources together. In doing so, it ensures relevancy without biasing discovery towards content that merely has general appeal.
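To illustrate what normalising metadata into a unified index and ranking against it involves, here is a deliberately simple Python sketch: records from two hypothetical sources with different field names are mapped onto one schema, then scored by weighted term matches. The field mappings and weights are assumptions for illustration only; this is not Summon’s relevancy algorithm, which the text describes only in general terms.

```python
# Hypothetical field mappings from two content sources to one unified schema.
FIELD_MAPS = {
    "source_a": {"ArticleTitle": "title", "Keywords": "subjects"},
    "source_b": {"dc_title": "title", "dc_subject": "subjects"},
}
# Assumed weights: a match in the title counts more than one in the subjects.
WEIGHTS = {"title": 2.0, "subjects": 1.0}


def normalise(source: str, record: dict) -> dict:
    """Rename a source's fields to the unified schema and lower-case the text."""
    mapping = FIELD_MAPS[source]
    return {mapping[k]: str(v).lower() for k, v in record.items() if k in mapping}


def score(doc: dict, query: str) -> float:
    """Add the field weight for every query term found as a word in that field."""
    terms = query.lower().split()
    return sum(WEIGHTS[field] for field, text in doc.items()
               for term in terms if term in text.split())


index = [
    normalise("source_a", {"ArticleTitle": "Early modern printing",
                           "Keywords": "books history"}),
    normalise("source_b", {"dc_title": "Medieval manuscripts",
                           "dc_subject": "printing books"}),
]
ranked = sorted(index, key=lambda d: score(d, "printing books"), reverse=True)
print([d["title"] for d in ranked])
```

Because both records share one schema after normalisation, a single scoring function can rank content from either source side by side.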

Fostering discovery requires considerable investment. Digitising content, ensuring it is discoverable and developing search environments that illuminate the unknown require time, energy and devotion. That said, the investment is worth it, since at the moment of discovery our collective knowledge of the world advances.

Alexander van Boetzelaer, managing director, Elsevier R&D Solutions:

Discoverability is increasingly important in today’s digital age – influenced by contemporary search tools and smartphone applications – as researchers must quickly locate the salient data they need and act on it. Scientific output continues to increase exponentially, with more than a million scientific articles published every year and companies generating masses of proprietary data, so discoverability is a crucial issue to address. There is a digital transformation underway; the challenge is how to find the most relevant data and draw insights.

Technology integration is a core factor in discovery; solutions must offer a high level of search sophistication and relevancy, while keeping the user experience and display intuitive. With Google defining search behaviour, scientific discovery tools must reflect the ease of use that is familiar to many. At Elsevier R&D Solutions, through a combination of technology solutions, APIs that link products, extensive and refined taxonomies and expert text-mining capabilities, we help our customers improve R&D outcomes – to yield sharper insights and drive innovation. Our solutions can normalise both in-house and third-party data, integrating it into a single, secure database, making it accessible for retrieval, analysis and other uses.

We work with a diverse range of organisations across a variety of industries including pharma, oil & gas, and chemicals. For example, Roche uses Elsevier technology to improve the discoverability of its proprietary chemistry data, and MD Anderson Cancer Center uses our products for research into new drugs and diagnostic tests. Researchers often ask: ‘Am I confident that I have identified the most current and relevant (rather than popular) research?’ It is only with robust discovery solutions that researchers can analyse their experimental data in the context of published results from the literature.

The opportunity to make discovery more effective lies in taking the authoritative information people desire and seamlessly integrating it into the systems they use on a regular basis, in a format that is immediately actionable. After all, there is a significant difference between ‘searching’ and ‘finding’. Information discovery is not only about finding information, it is about finding the right piece of information.

Elsevier’s R&D Solutions act like data magnets, bringing the data needed to the forefront. We aim to provide solutions that arm researchers with the access, capabilities and tools to efficiently aggregate, mine and extract information that is most relevant and useful to their workflows, at the time that they need it.