The importance of concepts

20 November 2018

Harnessing content management is the next generation of publishing technology, writes Manisha Bolina

You might be a publisher if you’ve asked yourself or your colleagues any of the following questions:

How can we enrich content?
What precisely is in this resource?
How can we generate more awareness of our content?
How can we increase sales?
How can we more deeply engage visitors to our website?
How can we improve our data-feeds to distributors?
How can we do all this quickly while ensuring an unbiased approach?

Discoverability of content, taxonomies and metadata have always been among the most discussed issues in the publishing industry. We now live in an era in which content is exploding. During Open Access Week, a Twitter post pointed out that OA content had increased by 10 per cent in just one year, and scientific output as a whole grows by somewhere between eight and nine per cent a year. As data grows exponentially, finding a way to manage and make use of it becomes increasingly critical.
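To put “exponential” in concrete terms, the short Python sketch below converts an annual growth rate into a doubling time. It assumes the rate stays constant year after year, which real publication growth need not do. At the eight to nine per cent cited above, output doubles roughly every eight to nine years.

```python
import math

def doubling_time(annual_growth):
    """Years for a quantity to double at a constant annual growth rate.

    `annual_growth` is a fraction (0.08 means 8 per cent a year); the function
    solves (1 + r) ** t == 2 for t. Assumes the rate never changes.
    """
    return math.log(2) / math.log(1 + annual_growth)

for rate in (0.08, 0.09):
    # Prints roughly 9.0 years for 8% and 8.0 years for 9%.
    print(f"{rate:.0%} a year: output doubles in about {doubling_time(rate):.1f} years")
```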

In search of solutions, publishers are increasingly considering how artificial intelligence technology can streamline processes and answer some of the questions they are challenged with when it comes to managing and marketing an ever-growing repository of content.

Specifically, natural language processing (NLP) and machine learning have aided content enrichment and taxonomy improvement, helping publishers make their workflows more efficient and save significant personnel costs.

NLP also provides an unbiased way to gain a more granular understanding of a publisher’s content corpus, but these solutions still fall a little short. Put simply, NLP hunts for words, using keywords to generate a structure or taxonomy for publishers. Classical NLP approaches, however, cannot easily disambiguate words that have multiple meanings, especially at the single-word level.
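The single-word problem is easy to see in a toy keyword tagger. The Python sketch below uses a small, hypothetical keyword-to-subject mapping (not any real product’s taxonomy) and tags a sentence by exact keyword lookup. The word “java” maps to three subjects, and the lookup alone has no way to choose between them, even though a neighbouring word (“compiler”) makes the intended meaning obvious to a human reader.

```python
import re

# Hypothetical keyword-to-subject mapping; a real taxonomy would be far larger.
KEYWORD_TAXONOMY = {
    "java": {"Computing", "Geography", "Food & Drink"},
    "jaguar": {"Zoology", "Automotive"},
    "python": {"Computing", "Zoology"},
    "coffee": {"Food & Drink"},
    "compiler": {"Computing"},
}

def keyword_tags(text):
    """Tag a text by exact keyword lookup, one word at a time, ignoring context."""
    tags = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in KEYWORD_TAXONOMY:
            tags[token] = KEYWORD_TAXONOMY[token]
    return tags

for word, subjects in keyword_tags("The Java compiler rejected the program.").items():
    print(word, "->", sorted(subjects))
```

Every subject attached to “java” survives, so a keyword-level index would file this sentence under geography and food as well as computing.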

The next generation of AI technology goes one step further by mimicking how the human mind works and applying those cognitive functions to get a better handle on vast data sets. The human mind is an ideal model because it is used to consuming large amounts of information from various sources: it automatically creates neural pathways that connect new information to things it already knows, and it builds new pathways as we learn more.

Nestled in Silicon Valley, Yewno has been leading this charge. Renowned entrepreneur Ruggero Gramatica, who holds a PhD from King’s College London, and his team of gifted data scientists have developed algorithms that use AI technology to mimic the way the human mind works. Their models combine machine learning, computational linguistics and graph theory to accomplish two very important objectives:

• Identify and extract concepts from both structured and unstructured information; and
• Unearth significant knowledge via an inferential chain of connections between identified concepts.
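The second objective, an inferential chain of connections, can be pictured as a path through a graph of concepts. The Python sketch below is an illustration under stated assumptions, not Yewno’s algorithm: it takes a small hypothetical concept graph, in which an edge simply means two concepts were found to be related, and uses breadth-first search to return the shortest chain linking two concepts that never appear together directly.

```python
from collections import deque

# Hypothetical concept graph: an edge means two concepts were found to be
# related in the corpus. Concepts and links are illustrative only.
CONCEPT_GRAPH = {
    "drought": ["crop yields", "water supply"],
    "crop yields": ["drought", "food prices"],
    "food prices": ["crop yields", "political unrest"],
    "political unrest": ["food prices"],
    "water supply": ["drought"],
}

def inferential_chain(graph, start, goal):
    """Breadth-first search for the shortest chain of concepts from start to goal."""
    previous = {start: None}  # remembers how each concept was reached
    queue = deque([start])
    while queue:
        concept = queue.popleft()
        if concept == goal:
            # Walk back through `previous` to rebuild the chain.
            chain = []
            while concept is not None:
                chain.append(concept)
                concept = previous[concept]
            return chain[::-1]
        for neighbour in graph.get(concept, []):
            if neighbour not in previous:
                previous[neighbour] = concept
                queue.append(neighbour)
    return None  # no chain of connections exists

print(" -> ".join(inferential_chain(CONCEPT_GRAPH, "drought", "political unrest")))
```

Breadth-first search is chosen here because it returns the chain with the fewest links; a production system would more likely weight edges by strength of evidence.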

Yewno’s technology does this by ingesting the full text of a publisher’s content corpus in order to read and understand the text the way a human does. It then extracts the context and “understands” its meaning in the form of a concept, in a totally unbiased way.

The major difference between Yewno’s technology and NLP lies in ‘semantic spaces’. Semantic spaces in the natural language domain aim to create representations of natural language that capture meaning. The original motivation for semantic spaces stems from two core challenges of natural language: vocabulary mismatch and ambiguity.
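One classic way to build a semantic space, shown here as a stand-in rather than as Yewno’s method, is to describe each word by the words that surround it. The Python sketch below counts co-occurrences within a two-word window over a tiny made-up corpus and compares words by cosine similarity. ‘valid’ and ‘legitimate’ never appear in the same sentence, yet they end up close together because they occur in similar contexts, which is how such spaces address vocabulary mismatch.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (illustrative only): 'valid' and 'legitimate' never co-occur,
# but they appear in very similar contexts.
CORPUS = [
    "the licence is valid until june",
    "the licence is legitimate until july",
    "the permit is valid for travel",
    "the permit is legitimate for travel",
    "the coffee is hot and fresh",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vectors = cooccurrence_vectors(CORPUS)
print(f"valid ~ legitimate: {cosine(vectors['valid'], vectors['legitimate']):.2f}")  # 0.90
print(f"valid ~ hot:        {cosine(vectors['valid'], vectors['hot']):.2f}")  # 0.32
```

Raw counts keep the example readable; practical systems usually reweight them or learn dense vectors from far larger corpora.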

Rules-based and model-based approaches like NLP, which operate at the keyword level, cannot take polysemy and synonymy into account. Polysemy is the coexistence of many possible meanings for a word or phrase, such as ‘jaguar’ (the animal or the car) or ‘java’ (the island, the coffee or the programming language), while synonymy refers to different terms with equivalent meanings, such as ‘valid’, ‘authorised’, ‘legitimate’ and ‘licit’. Yewno’s technology applies semantic spaces to overcome these limitations.

While these nuances may seem highly technical, they become particularly important when reading large amounts of text, such as a publisher’s entire collection. To overcome the difficulty of ambiguous terms, Yewno embeds the multitude of concepts into a semantic space in which semantically related concepts are grouped closely together. Yewno hunts for concepts, not keywords, and identifies them as objects that carry a description and a significance.
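Once each sense of an ambiguous word sits in a semantic space as its own concept, the surrounding text can pick the right one. The Python sketch below assumes a hypothetical three-dimensional space with hand-set vectors (real spaces have hundreds of learned dimensions), averages the vectors of the context words, and chooses the sense of ‘jaguar’ whose concept vector lies closest by cosine similarity.

```python
import math

# Hypothetical 3-dimensional semantic space. The axes loosely stand for
# (wildlife, motoring, computing); real systems learn hundreds of dimensions.
WORD_VECTORS = {
    "rainforest": (0.9, 0.0, 0.1),
    "prey": (0.95, 0.05, 0.0),
    "engine": (0.0, 0.8, 0.3),
    "dealership": (0.05, 0.95, 0.0),
}

# Each sense of an ambiguous word is placed as its own concept in the same space.
SENSES = {
    "jaguar": {
        "jaguar (big cat)": (1.0, 0.0, 0.0),
        "Jaguar (car maker)": (0.0, 1.0, 0.0),
    },
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def centroid(vectors):
    """Average a list of vectors component by component."""
    return tuple(sum(parts) / len(vectors) for parts in zip(*vectors))

def disambiguate(word, sentence):
    """Pick the sense of `word` whose concept vector lies closest to its context."""
    senses = SENSES.get(word)
    context = [WORD_VECTORS[t] for t in sentence.lower().split() if t in WORD_VECTORS]
    if not senses or not context:
        return None  # unknown word, or no context words in the space
    context_vector = centroid(context)
    return max(senses, key=lambda sense: cosine(senses[sense], context_vector))

print(disambiguate("jaguar", "the jaguar stalked its prey through the rainforest"))
print(disambiguate("jaguar", "the dealership serviced the jaguar engine"))
```

The same keyword resolves to different concepts depending on its neighbours, which is the practical difference between hunting for concepts and hunting for keywords.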

This powerful artificial intelligence technology led to the creation of Yewno Unearth, a product that empowers data management for publishers. By answering the questions posed at the start of this article, Yewno Unearth gives publishers a more granular understanding of their content at topic, subtopic and concept level. As information expands, publishers are better placed than ever to take control of their data by harnessing the power of next-generation AI technology.

Manisha Bolina is channel partner manager, UK and Europe, for Yewno