The power of semantics

In recent years, scholarly publishers have increasingly looked to add value to the content they put online. Sian Harris asks providers of semantic enrichment tools and services how publishers can enhance their content.

For the past couple of decades, scholarly publishers have been populating their websites and content stores with a wealth of valuable research material. Such efforts have put vast amounts of information online, but the quantity of scholarly information is far too great for researchers to browse, find, digest and use everything. For this reason, publishers have become increasingly interested in semantic enrichment: enhancing information with intelligent structure, tagging and vocabularies to improve the discoverability of related and relevant resources.

As Phil Hastings, SVP sales & marketing for Linguamatics, explained: ‘Semantic enrichment makes the publisher’s information more discoverable, providing more value to the user. Semantic enrichment allows searching of the data using concepts rather than just keywords, which means more comprehensive results. It automatically identifies domain specific concepts inside the data and then exposes them for use by end users searching for concepts or keywords.’

He illustrated this with an example from life sciences: ‘If a user searched for “cancer”, they would get back results not only for cancer but also for synonyms such as “carcinoma”, “tumour” and “malignancy”, as well as types like “leukaemia”, “Peutz-Jeghers syndrome”, and “breast cancer”.’
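The idea behind Hastings' example can be illustrated with a minimal sketch in Python. It assumes a tiny hand-built thesaurus; the names `THESAURUS`, `expand_query` and `concept_search`, and the sample documents, are hypothetical and are not taken from any vendor's product. A real enrichment engine would rely on full ontologies and natural language processing rather than simple substring matching.

```python
# Toy thesaurus: the terms come from the example in the article,
# but the data structure and all names here are illustrative assumptions.
THESAURUS = {
    "cancer": {
        "synonyms": ["carcinoma", "tumour", "malignancy"],
        "narrower": ["leukaemia", "Peutz-Jeghers syndrome", "breast cancer"],
    }
}


def expand_query(term):
    """Return the search term plus its synonyms and narrower terms, lowercased."""
    key = term.lower()
    entry = THESAURUS.get(key)
    if entry is None:
        return [key]
    related = entry["synonyms"] + entry["narrower"]
    return [key] + [t.lower() for t in related]


def concept_search(term, documents):
    """Return the documents that mention the term or any expanded variant."""
    variants = expand_query(term)
    return [doc for doc in documents if any(v in doc.lower() for v in variants)]


# Hypothetical document titles, used only to exercise the sketch.
DOCS = [
    "A new therapy for childhood leukaemia.",
    "Managing Peutz-Jeghers syndrome in adults.",
    "Advances in solar cell efficiency.",
]

keyword_hits = [doc for doc in DOCS if "cancer" in doc.lower()]
concept_hits = concept_search("cancer", DOCS)
print(len(keyword_hits), len(concept_hits))
```

In this sketch, a plain keyword search for 'cancer' finds none of the sample titles, while the concept search returns the two that mention a type of cancer.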

Enriching content in this way opens up several potential opportunities, both in enhancing the accuracy of existing resources and in building new, derivative resources that are tailored to specific needs or groups of people.

Silverchair’s Jake Zarnegar noted that, ‘Publishers use semantics to enhance the core functions of their existing products (such as search, navigation/collections, and related content suggestions); to assemble programmatically targeted new content products that are relevant to specific audiences; to integrate granularly with third-party information sources; and to gain better business intelligence about the particular information needs of our audiences to guide sales,

Daniel Mayer, VP product and marketing at TEMIS, gave some examples of the ways this approach is being used: ‘Semantic enrichment is helping publishers make their content more compelling, drive audience engagement and content usage by providing metadata-based discoverability features such as search-engine optimisation, improved search, taxonomy/faceted navigation, links to structured information about topics mentioned in content, “related content”, and personalisation.’

Derivative content

Mayer said semantic enrichment can also help publishers ‘build new, highly differentiated products and services that exploit highly granular metadata that automated semantic enrichment delivers.’ Examples of these include topic pages or collections, which are thematic publications on a given topic, and knowledge bases, databases of known facts about particular objects or topics. Other examples of application areas for semantic enrichment are analytics; API-driven content delivery; and content-enabled workflow applications. He said semantic enrichment helps ensure publishers’ products are cost-effective and have a competitive time-to-market. This is because automation as a result of enrichment and its outputs can make publishing workflows more efficient.

Picking a partner

For publishers considering enriching their content, partnering with a specialist technology company could be a good option. So how should publishers go about picking a partner?

‘First of all you need a solution that has the necessary precision and recall when processing the documents,’ suggested Hastings of Linguamatics. ‘The solution should be scalable to large amounts of text, flexible enough to deal with different data sources and formats and able to plug in any domain knowledge. The solution should also have an API that conforms to industry standards so that it can be easily integrated into the publisher’s workflows.’ He went on to say that partners should also have the necessary expertise and experience in processing and analysing text data, such as the use of natural language processing (NLP) and understanding the benefits and challenges of using thesauri and ontologies. ‘Publishers should ask “Does my partner have an established customer base that has already benefitted from their technology, and have they worked with publishers before?”.’
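Precision and recall, the two measures Hastings mentions, can be computed by comparing a tool's output with a set of tags agreed by human experts. Precision is the share of the tool's tags that are correct; recall is the share of the correct tags that the tool found. The sketch below is a minimal illustration in Python, and the function name and both sets of tags are invented for the purpose of the example.

```python
def precision_recall(predicted, expected):
    """Compare predicted tags with expert-agreed tags.

    Returns a (precision, recall) pair; each is 0.0 when its
    denominator would be zero.
    """
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical data: tags an expert assigned to a document,
# and tags an enrichment tool produced for the same document.
gold_tags = {"COPD", "ABG", "asthma", "spirometry"}
tool_tags = {"COPD", "ABG", "pneumonia"}

precision, recall = precision_recall(tool_tags, gold_tags)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the tool proposes three tags, two of which are correct, and finds two of the four expert tags. A publisher evaluating partners might run this kind of comparison over a representative sample of its own content.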

Mayer agreed with the need to look for partners with experience in helping publishers, ideally, he added, publishers in the same field (for example biomedicine or engineering). He also suggested publishers should look for partners who can ‘put them in touch with counterparts using their platform and can share best practices and insights into this key workflow’ and ‘offer a robust/scalable yet deeply customisable product, as well as quality measurement tools, to be able to adapt the platform to their specific use cases and tune the way it behaves to meet acceptance criteria.’

In addition, he said, semantic enrichment partners should be able to provide support to publishers’ customers using the platform and be able to recommend a range of implementation partners, who can offer complementary consulting and/or integration services. Finally, he said, partners should ‘have off-the-shelf integrations and repeat deployments with other industry-relevant solutions (such as content repositories or workflow tools).’

Silverchair’s Zarnegar advised that publishers should ask to see practical outcomes of semantic projects when evaluating partners. ‘While many semantic technologies and approaches have the potential to be effective for many uses, publishers should look for a partner that can take semantic enrichment projects from initial concept all the way through to specific business outcomes. The best partners will help the publishers achieve those business goals with their enriched content rather than just enriching the content and moving on,’ he explained.

Future potential

There are many potential opportunities for semantic enrichment to enhance content further in the future too.

Hastings predicted a range of potential applications, including enterprise search and linked data, where data from disparate text sources is connected by applying semantic identifiers. He also expects to see more semantic stores – generating ‘triples’ of information (entity-relationship-entity) from text and storing it in a semantically meaningful way so that the data can be connected together in networks – and geospatial search (searching for concepts within maps).
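The entity-relationship-entity ‘triples’ described above can be pictured as simple three-part records. The following Python sketch is an illustration only: the `TripleStore` class and the sample facts are assumptions made for this example, and production semantic stores typically use RDF databases and dedicated query languages rather than an in-memory set.

```python
class TripleStore:
    """A minimal in-memory store of (entity, relationship, entity) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        """Record one fact as a triple; duplicates are ignored."""
        self.triples.add((subject, relation, obj))

    def query(self, subject=None, relation=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)
        )


# Illustrative facts, as might be extracted from different text sources.
store = TripleStore()
store.add("aspirin", "inhibits", "COX-1")
store.add("aspirin", "treats", "headache")
store.add("ibuprofen", "inhibits", "COX-1")

about_aspirin = store.query(subject="aspirin")
cox1_inhibitors = [s for s, _, _ in store.query(relation="inhibits", obj="COX-1")]
print(about_aspirin)
print(cox1_inhibitors)
```

Because every fact has the same shape, facts drawn from separate documents can be joined on a shared entity, which is what allows the data to be connected into networks.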

Another opportunity is in linking chemistry to biology, for example, the ability to link entity names with identifiers and property information such as SMILES, structures, HELM notation and sequence data. Hastings also sees greater potential in healthcare applications, for example better annotation and search of electronic health records.

Healthcare information is a particular focus for Silverchair, and Zarnegar described how semantic enrichment has helped one publisher in this area. ‘We recently worked with McGraw-Hill for more than two years to enrich their medical content and develop ClinicalAccess (www.clinicalaccess.com), a clinical decision support system designed to get medical professionals quickly to a very short snippet of content that answers their questions.

‘The deep semantic tagging of their content, combined with our medical thesaurus filled with hospital jargon, synonyms, acronyms, and abbreviations, allows us to handle complex queries and bring back snippets that contain the answer.

‘For example, for a query like “what is the utility of ABG in COPD?” (which was an actual query from user testing) we return a four-sentence snippet that says that in general the ABG test is not useful in the evaluation of COPD unless certain conditions are met (and then the snippet lists the conditions). Four sentences! Nothing more, nothing less. This is a huge advance from previous systems that would return a 20-page chapter on COPD and require a user to search within it to find their answer (what we call the “secondary search”). I see a future where semantic systems in medicine continue to get closer and closer to actually answering queries rather than just returning search results or large blocks of text.’

More generally, Zarnegar said he is seeing ‘two big trends with semantic applications that might seem to be contradictory on first blush, but are actually complementary. That is, semantic applications are tackling both expansive and reductive information problems.’

He explained: ‘On one hand, semantic enrichment allows large-scale content crawlers to understand more about the content they are crawling (mostly through the RDF framework) and use that knowledge to find significant connections that may have never been noticed by a human before. Some of the most exciting science taking place today is in translational areas that combine the heretofore separated findings of different disciplines into new knowledge. If discipline content is semantically tagged, computers have a much better chance at finding those potential connections and surfacing them for researchers to explore. I consider this expansive because it would be surfacing relevant content to the right people that they may never have found or considered without semantic connections.’

Mayer also summed up his predictions: ‘Today semantic enrichment is often used to provide a more compelling experience accessing content. In the future, we believe it will play a fundamental role in helping publishers organise knowledge, restructure the way it is packaged and consumed, placing the end-user and their preoccupations at the centre of the equation, and positioning themselves more as service providers rather than content providers.

‘Topic Pages are an example of how semantic enrichment helps build focused, thematic products that address particular needs of the end-user at a particular point in their workflow, for example helping them save time by accessing specific information when they need it, and actually helping them avoid “having to read” all of the corresponding literature.

‘In the future, items of knowledge currently “hidden in plain view” in literature can be extracted, aggregated and delivered as the core of publisher products, and the content itself can continue to play an essential, but not the only, role, in their offerings,’ he concluded. 

Some semantic enrichment options

Silverchair offers semantic enrichment project services and tools. These include strategic semantic planning services and a taxonomy/ontology manager (Totem). The company also provides taxonomy development services, an automated semantic enrichment engine that tags content down to the most granular level (Tagmaster) and a web delivery platform that uses semantically enriched content to drive advanced features and new product creation (SCM6). It also provides an analytics platform that combines user activity with semantic tagging to create detailed business intelligence about audiences and their information preferences (Silvermine).

Linguamatics offers a natural language processing (NLP)-based text analytics platform called I2E, which identifies and captures semantically relevant information buried in unstructured text. This platform allows you to plug in domain knowledge – ontologies, terminologies and thesauri – when processing and tagging the text. Phil Hastings of Linguamatics said: ‘This means that rather than just searching for keywords the user can search for concepts and classes of entity and the relationships between them.’

I2E operates in two ways. The first option is to run NLP-based queries directly and extract information in a structured form for further analysis and review by an end user, or for export to a knowledge base. I2E can also pass tagged and enriched documents to a search engine to improve the end user experience and make the relevant concepts more discoverable.

TEMIS provides its Luxid Content Enrichment Platform, which extracts structured information from unstructured content by recognising the key topics, entities and relations mentioned in text that can then enrich document metadata.

The new version 7 of Luxid includes Webstudio, an ontology management web application that enables users to create, edit and maintain their ontology collaboratively, while governing the way ontological objects are recognised by the Luxid semantic enrichment pipeline. It uses the platform’s NLP layer to preview in real time the results of the semantic enrichment process when applied to users’ corpus of documents. It is also able to suggest relevant objects mentioned in the user’s corpus that are not yet included in the ontology, helping users to improve their existing ontology or build one from scratch.