Tech focus: Enhancing and augmenting information assets

10 August 2022

Semantic enrichment is the enhancement of content with information about its meaning. It augments the amount of information carried by specific words or combinations of words, increasing the content's value by making it easier to discover and to relate to other data sets or information assets.

In research communication, semantic enrichment can mean the design or packaging of content to increase human or machine comprehension. It can also mean augmenting content with, associating it with, or embedding in it additional material in a format other than text – such as an infographic, video explanation, or other form of data visualisation. Semantic enrichment strategically brings focus to the main message of the content, makes specific content stand out above the rest of the narrative, and enhances its discoverability – through ‘human’ or ‘machine’ information processing.

Recent key developments

Advances in natural language processing (NLP) through machine learning (ML) are the key development that has had, and will continue to have, a significant impact on both the production and consumption of semantically enriched articles.

For example, in the life sciences, text mining has become an important tool for researchers. Its most fundamental task is the recognition of biomedical named entities, such as proteins, species, chemicals, genes and diseases. The ability to automatically develop effective word embeddings for biomedical literature has substantially enhanced text mining in that area.
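As a rough illustration of what entity recognition produces, the sketch below tags entity mentions by looking them up in a small dictionary. It is deliberately simplistic: the lexicon, the `annotate` function and the example sentence are all invented for this illustration, and real biomedical text-mining systems typically rely on trained ML models and curated vocabularies rather than a hand-written lookup table.

```python
import re

# Hypothetical toy lexicon mapping a surface form to an entity type.
# These entries are illustrative only, not drawn from any real vocabulary.
LEXICON = {
    "insulin": "protein",
    "aspirin": "chemical",
    "brca1": "gene",
    "diabetes": "disease",
    "homo sapiens": "species",
}


def annotate(text):
    """Return (start, end, matched_text, entity_type) tuples, sorted by position."""
    annotations = []
    for term, entity_type in LEXICON.items():
        # Match whole words only, ignoring case.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                (match.start(), match.end(), match.group(0), entity_type)
            )
    return sorted(annotations)


sentence = "Aspirin use was studied in patients with diabetes carrying BRCA1 variants."
for start, end, surface, entity_type in annotate(sentence):
    print(f"{start}-{end}: {surface} ({entity_type})")
```

The output of such a step, character offsets paired with entity types, is the kind of machine-readable layer that can be attached to an article as semantic enrichment.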

Word embeddings capture semantic similarities between words that are not visible from their written form; for instance, the words ‘enables’ and ‘allows’ are spelled very differently, yet their meanings are closely related.
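One common way to compare embeddings is cosine similarity, which measures how closely two vectors point in the same direction. The sketch below uses made-up four-dimensional vectors purely for illustration; real embeddings are learned from large text corpora and usually have hundreds of dimensions, and the values here are assumptions, not output from any actual model.

```python
import math

# Invented vectors for illustration only; not taken from a trained model.
EMBEDDINGS = {
    "enables": [0.81, 0.10, 0.55, 0.05],
    "allows": [0.78, 0.15, 0.60, 0.02],
    "protein": [0.05, 0.92, 0.10, 0.40],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


related = cosine_similarity(EMBEDDINGS["enables"], EMBEDDINGS["allows"])
unrelated = cosine_similarity(EMBEDDINGS["enables"], EMBEDDINGS["protein"])
print(f"enables vs allows:  {related:.3f}")
print(f"enables vs protein: {unrelated:.3f}")
```

With these toy values, the ‘enables’/‘allows’ pair scores much higher than the ‘enables’/‘protein’ pair, even though no pair of words shares more than a few letters.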

Challenges of scale, and the responses to them, have been the most important theme of the last decade. On any given topic, there is simply too much information available, in myriad formats and modalities, for a person to digest. ML, algorithmic personalisation, the rise and normalisation of Wikipedia, Google’s ‘featured snippets,’ and Netflix burying search in favour of browsable ‘taste clusters’ are all attempts to mediate discovery at this scale. Semantic enrichment leans on these algorithmic solutions to address the scale problem. In turn, these solutions lean on semantic enrichment to sort content into trusted categories, enabling ML to extrapolate and identify similar material.

The movement toward graphic and video representation of data is becoming more pronounced. Article and page design is being adjusted to accommodate infographics and video in association with, as well as embedded within, the article. ML is likely the next phase, though it will require more sophisticated meta-tagging and semantic word association. For now, the ‘stopping power’ of an article, or of a dataset, is very much the success factor for human comprehension.

Benefits to the scholarly communications industry

These developments are critical to harness the research community’s scholarly output. They are leading towards huge shifts across research domains – from biomedical applications to chemical engineering – making current processes more efficient, but also introducing new workflows and opportunities.

For the publishing community, wide distribution and readership are the goals. For the public, ‘open’ accessibility and discoverability are the needs. For all population segments, semantic enrichment (the visualisation of data through enriched content like infographics and video) addresses the ‘stopping power’ of an article or element of data.

Developments in semantic enrichment allow for faster discovery, easier and more precise searches for literature, and the reuse of published science in novel and unexpected ways. However, for semantically enriched papers to really be useful, publishers should consider providing the right application programming interfaces (APIs) to developers to ensure papers are accessible beyond the publisher’s site – not just by dominant search engines, but by researchers and developers alike.

Copyright Clearance Center

RightFind Navigate and RightFind Enterprise together accelerate research by unifying information and data discovery and supporting copyright-compliant collaboration.

RightFind Navigate with Semantic Search helps researchers identify relevant concepts through an expanded search that incorporates over 20 million synonyms from SciBite’s biomedical vocabularies and semantically enriches search results from indexed and API-based data sources in real time.
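The general idea of synonym-based query expansion can be sketched in a few lines. This is a generic illustration, not a description of how RightFind Navigate or SciBite’s vocabularies are implemented: the synonym table, function names and documents below are all hypothetical, and a production system would use curated vocabularies and a proper search index rather than substring matching.

```python
# Hypothetical synonym table, invented for illustration.
SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "aspirin": ["acetylsalicylic acid"],
}


def expand_query(query):
    """Return the normalised query plus any known synonyms."""
    normalised = query.lower().strip()
    return [normalised] + SYNONYMS.get(normalised, [])


def search(query, documents):
    """Return documents containing the query or any of its synonyms."""
    terms = expand_query(query)
    return [doc for doc in documents if any(term in doc.lower() for term in terms)]


documents = [
    "Outcomes after myocardial infarction in older adults",
    "Acetylsalicylic acid dosing guidelines",
    "Crop yields in 2021",
]

print(search("heart attack", documents))
```

Here a search for ‘heart attack’ retrieves the first document even though the phrase never appears in it, because the query is expanded to include ‘myocardial infarction’ before matching.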

RightFind Navigate enables researchers to find relevant content through contextualised discovery based on machine learning and smart data. Designed to streamline access to research information, RightFind Navigate unifies searching across multiple licensed content sources, publicly available data and internal proprietary content, empowering researchers to reveal connections and drive innovation. The solution provides a flexible, scalable, open ecosystem designed to maximise organisations’ return on their content and data investments.

With expertise in copyright and information management, CCC and its subsidiary RightsDirect design and deliver information solutions that power decision-making by helping people integrate and navigate data sources and content assets.

Find more information: https://bit.ly/3MIiXDk
