The changing face of databases
As researchers are presented with many more ways to find scholarly papers online, Siân Harris looks at the role of the traditional A&I database and at its potential disruptors.
Abstracting and indexing (A&I) databases have played an important role in finding scholarly information for many years. Originally these indexes were bound print volumes, but their information lent itself particularly well to an early move online.
As the body of online content grew, and especially as DOIs became more prevalent, such databases also became jumping-off points to reach the articles directly.
A few industry heavyweights play a strong role in providing trusted, subscription-based sources of bibliographic information, links and, more recently, citation information.
‘The role of scholarly databases is essentially as a catalogue of the world’s important research,’ said Chris Burghardt, VP for product and market strategy for Web of Knowledge at Thomson Reuters.
‘It starts with the researcher; the catalogue helps them understand the research landscape so that they are able to advance more quickly with their research,’ he said.
‘Web of Science is a very unique database in the market. It does not cover every journal in the world – only around 12 per cent of journals are accepted,’ he continued, adding that even among the 12,000 journals in the database, the top 300 account for 50 per cent of the top research. Within these journals, he said, Web of Science provides depth of coverage, going back to the 1900s. ‘Where we see our benefit is helping researchers find novel research and ensuring that we provide really trusted results.’
However, as Burghardt observed, databases like Web of Science are not the only tools available. Indeed, Web of Knowledge includes 14 other databases, forming what Burghardt described as a ‘citation universe’ that gives more in-depth coverage of specific regions or subject areas. Web of Knowledge offers a regional database, the Chinese Science Citation Database, and Thomson Reuters recently signed a deal in Brazil to add the Scientific Electronic Library Online (SciELO) to Web of Knowledge.
Thomson Reuters also licenses other specialist databases as part of Web of Knowledge. One of these is Inspec from the Institution of Engineering and Technology (IET).
‘The role of Inspec is as a comprehensive subject and taxonomy database across engineering,’ explained Daniel Smith, head of academic publishing at the IET. ‘We are just about to hit our 13 millionth record and we see a 5-10 per cent increase per year. We are a specialist service and are therefore not trying to compete with general databases.’
Over many years, such databases have become widely used and respected. However, the growing number of online scholarly services, together with changes in the wider internet, has brought some potential disruptors to the traditional A&I business model.
An obvious potential disruptor to the A&I database’s traditional role of helping users search for and find links to scholarly content is the internet search engine. Although search engines such as Google do not restrict themselves to scholarly content, they have become sophisticated at delivering many relevant resources to researchers. Google also has its own scholarly search engine, Google Scholar, although this appears to have fallen somewhat from favour within the company: Google’s home page no longer links directly to Google Scholar. Meanwhile, some researchers admit to simply using Google.
‘Nobody is going to try to set themselves up to go head to head with Google,’ said the IET’s Smith.
‘It is a very useful tool to just find out what stuff is out there and we are not trying to compete. We have what engineers select for engineers to use.’
He noted that the drawback of Google is that it is not focused on any particular activity. In addition, it covers only material that is online. This risks missing what Smith described as ‘sleeping beauties’: ideas dating from the pre-online era. For example, he said, ideas for concentrating solar power, early lasers and designs for steel ships were all published before the internet.
When it comes to validating quality and relevance, long-established subscription databases potentially have the upper hand. ‘Nobody really knows what relevance ranks Google uses,’ observed Smith. ‘And Google has no idea what you want to do with the information it suggests.’
Crowd sourcing and networking
Another potential source of disruption is the combination of two hot topics in the industry, crowd sourcing and social media, both of which are central to the approach of Mendeley, the free reference management and social media service.
When Mendeley announced in the summer that API calls to its database had passed 10 million per month and that its shared database contained 65 million unique documents, there was speculation on blogs that Mendeley had become the largest A&I database.
This is something of a side story for Victor Henning, co-founder and CEO of Mendeley: ‘When we started out we didn’t necessarily intend to be a replacement to existing databases although we were aware that the information would be very valuable,’ he said.
Mendeley’s initial aims were to enable researchers to organise their documents and share them with colleagues using social networking tools. However, several studies have now reached similar conclusions about the impressive size of Mendeley’s database, estimating that it covers around 97 per cent of the academic literature.
‘People can discover research and find related research. It is certainly useful for people who don’t have access to Scopus and Web of Science,’ Henning conceded.
So, what do traditional database companies think about such potential disruptors? ‘Clearly we monitor all developments and Google Scholar and Mendeley are very important developments,’ said Burghardt of Thomson Reuters. ‘Google is an impressive company and is often a good place to start when you don’t know where to look – or at the other extreme if you know exactly what you want because you have, for example, the DOI of an article.’
On the topic of Mendeley, Burghardt noted that it has a different approach from traditional databases. ‘The Mendeley product is embedded more into the research workflow, especially with young researchers, and is very reliant on the crowd for sources. The question really comes down to what’s the longer-term sustainable plan,’ he observed.
There is another issue too: that of print. Mendeley’s content is ingested as PDFs, but many journals are still available only in print. Indeed, the IET’s Smith noted: ‘We are still indexing print content – and print is around 50 per cent of the material that we currently receive and review.’ The reasons, he said, are partly geographical. The Russian Federation, for example, still places an emphasis on print.
There is also the issue of record quality. To enrich the information in its database, Mendeley is working with a number of publishers to add information to existing articles on the platform. These include Springer, IEEE and de Gruyter, according to Henning. Although the process is time-consuming, it is useful because the content on Mendeley is crowd sourced, and people tend to add only the details that are relevant to them, which often excludes abstracts and DOIs.
Disambiguation is also an important challenge for Mendeley and for all platforms that take content from multiple sources. Database providers are optimistic that quality and relevance in this area will improve once ORCID, an industry project working on author name disambiguation, makes its recommendations later this year.
‘A main challenge of globalisation of research is that people often have very similar first names or last names,’ said Burghardt of Thomson Reuters, which is heavily involved in the project and has licensed technology to it. ‘There’s a limit to how much can happen with automation and I applaud the industry for coming together with ORCID.’
There is a range of approaches to finding bibliographic information online, and the tools for doing so are changing all the time. For example, the IET is working on a deal to combine the information in Inspec with patent search tools to help identify prior art, while Thomson Reuters is planning to launch a Data Citation Index later this year.
For the moment at least, the different tools appear complementary and have their own roles in the research process. What should guide them all, however, are some words from the IET’s Smith: ‘People are not using search tools because they want to search but because they want to find something.’