A semantic approach

TEMIS has recently launched the latest version of its Luxid product for semantic enrichment. Eric Brégand, chief executive officer of the company, tells us how this can help publishers.

What does TEMIS do?

TEMIS aims to help publishers and large corporations deal with large volumes of information by semantically enriching their textual content. This can make the content more compelling and promote usage through features such as efficient search, faceted navigation, content linking and search engine optimisation. It can also help develop innovative products and formats. Our technology is embedded in many publishing workflows, particularly in the STM, B2B and legal sectors.

Starting with a raw electronic document, our platform processes each sentence to understand its structure and meaning, with the goal of extracting all the relevant concepts mentioned. The concepts we recognise are typically the topics or entities mentioned in the document, but we can also detect the relationships between them. Side-effects are one example of such a relationship: we recognise explicit links between a drug, its administration method, the patient to whom it was administered and the symptom that resulted.

Once entities have been identified, we link them to knowledge databases where structured information about these entities can be found. We are also able to link towards related topics or similar documents.

We also enable content analytics, the exploration of large document sets at scale. This allows users to piece together dispersed items of information and detect hidden facts or relationships. For example, if ‘Eskimos eat a lot of raw fish’ is mentioned in one document and ‘Eskimos have low cholesterol levels’ in another, one could consider establishing a relationship between the two facts.
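As a rough illustration of that idea, and not a description of how Luxid itself works, the sketch below groups extracted statements by the entity they share and proposes a candidate link whenever two statements about the same entity come from different documents. The function name `candidate_links`, the tuple layout and the document identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_links(facts):
    """Pair up statements about the same entity that come from different documents.

    `facts` is an iterable of (entity, statement, doc_id) tuples; this layout
    is an assumption made for the example.
    """
    by_entity = defaultdict(list)
    for entity, statement, doc_id in facts:
        by_entity[entity].append((statement, doc_id))

    links = []
    for entity, items in by_entity.items():
        for (first, doc_a), (second, doc_b) in combinations(items, 2):
            if doc_a != doc_b:  # only link facts dispersed across documents
                links.append((entity, first, second))
    return links


# Hypothetical extracted facts, using the example from the interview.
facts = [
    ("Eskimos", "eat a lot of raw fish", "doc1"),
    ("Eskimos", "have low cholesterol levels", "doc2"),
    ("Aspirin", "reduces fever", "doc1"),
]
links = candidate_links(facts)
```

A link produced this way is only a suggestion for a human analyst to review; co-occurrence of an entity in two documents does not by itself establish a relationship between the facts.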

What are the challenges?

A key challenge is to provide users with domain-specific annotation. When you work for LexisNexis you have to understand legal concepts. For McGraw-Hill you have to deal with aviation or construction, while Nature Publishing Group focuses on areas like medicine and biology. We need to provide relevant annotations across all these domains. A ‘52 year old man’ in an article could be a patient, a suspect or something else depending on the domain of the article and the publisher. With version 6, our Luxid Content Enrichment Platform has become increasingly easy to use across a very broad range of domains, while at the same time requiring less customer expertise in natural language processing itself.

As an example, our platform can use an existing taxonomy of concepts to recognise variants of taxonomical terms in documents. It can grade their relevance based on various factors such as their positions in the text, number of occurrences, or depth in the taxonomy. Going further, it is also capable of applying advanced ‘business rules’ formulated by domain experts to disambiguate among alternative meanings of a given term, or of ‘guessing’ what may be important beyond the existing taxonomy.
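To make the relevance grading concrete, here is a minimal sketch, assuming a toy taxonomy and an invented scoring formula; it is not TEMIS's actual algorithm. It matches term variants in a text and combines the three factors mentioned above: number of occurrences, position of the first mention, and depth in the taxonomy. The `Concept` class, the `score_concepts` function and the weights are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Concept:
    label: str       # preferred term in the taxonomy
    variants: tuple  # surface forms to look for in the text
    depth: int       # depth in the taxonomy (root = 0)


def score_concepts(text, concepts):
    """Return {label: score} for concepts found in `text`, highest score first."""
    lowered = text.lower()
    length = max(len(lowered), 1)
    scores = {}
    for concept in concepts:
        positions = []
        for variant in concept.variants:
            pattern = r"\b" + re.escape(variant.lower()) + r"\b"
            positions.extend(m.start() for m in re.finditer(pattern, lowered))
        if not positions:
            continue
        occurrences = len(positions)
        earliness = 1.0 - min(positions) / length  # earlier mentions score higher
        # Assumed weighting: deeper (more specific) concepts get a small bonus.
        scores[concept.label] = occurrences + earliness + 0.5 * concept.depth
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical taxonomy and text.
taxonomy = [
    Concept("Aspirin", ("aspirin", "acetylsalicylic acid"), depth=2),
    Concept("Drug", ("drug",), depth=1),
    Concept("Surgery", ("surgery",), depth=1),
]
text = "Aspirin reduces fever. Acetylsalicylic acid is a drug. The drug was given orally."
ranked = score_concepts(text, taxonomy)
```

Exact-string matching of variants stands in here for the linguistic analysis a real platform would perform, and the disambiguating ‘business rules’ are left out entirely.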

How is this area changing?

The field is changing in three main ways: there are new sources of content to process, new user expectations and evolving business models.

A major trend is the proliferation of social media. User-generated content – from sources like Twitter and the comments and ratings facility on Amazon – is becoming a genuine alternative to traditional media. Open data and open access are also feeding the frenzy. Thankfully, cloud computing makes it easy to interpret content quickly.

The first challenge for publishers is to cope with this load, to package relevant content for the relevant audience and to provide access to it for end users. While their initial goal was often to boost usage, driving customer retention and growth, the focus has since broadened to providing new types of products and meeting new customer expectations of quicker information access, snackable formats and mobile delivery.

Changing business models are bringing both challenges and opportunities to publishers. In some areas, expectations for free access have become mainstream, while in others transactional models are increasingly replacing subscription models. ‘Author pays’ is turning the value chain upside down in significant pockets of STM and legal publishing.

What does the future hold?

Delivery channels are opening that will enable publishers to extend their reach. Semantic content enrichment enables the combination of publicly-available unstructured information with structured corporate assets. Account managers will benefit from finding the relevant business news associated with their accounts (from press releases for example) directly embedded in their CRM.

We are witnessing mainstream adoption of semantic content enrichment in publishing (in 2011 the Publishing Research Consortium found that 46 per cent of scientific journal publishers semantically enrich their content). We expect our technology to serve many more functional layers, such as document authoring and peer-review processes, offering data curation and validation tools. But we also expect this technology to spread fast to the enterprise in the larger sense, as the missing link between the various information silos and the search-based applications built on top of them.

Interview by Siân Harris