Trekking into the semantic frontier

Image: Rob Lavers LRPS/

Jonathan Bresman explains why the Starship Enterprise and Jurassic Park needed semantic search

A discovery service isn’t really discovering if it doesn’t comprehend what it is looking for. 

If it functions primarily by brute keyword searches, then it isn’t capable of distinguishing between meaning, context and nuance, and can’t truly understand the user’s intent. If it just presents the user with page after page of results in which the keyword simply makes an appearance, then it really isn’t any better than dumping a haystack on the user and leaving it to them to find the needle. Now, is searching for a needle in a haystack better than having to search for it out on the prairie? Sure, but it’s still not good enough.

What is needed instead is the semantic enrichment of the content that is indexed in the discovery service – ‘smart’ subject indexing is an example. Essentially, this means metadata that gives every knowledge item its frame of reference, definition and significance, as well as information about its connections and relationships with other knowledge items.

Once a body of content is infused with such semantic metadata, an ideal discovery service designed for semantic search can understand the meanings of words, comprehend the sentences in which those words are strung together, and draw on a sufficiently expansive knowledge graph to recognise whether a sentence is referencing a concept and whether that concept is, in turn, part of a larger mental model.

This, however, is easier said than done. There is even an episode of Star Trek: The Next Generation built around the ship’s computer having difficulty with it. Essentially, the Enterprise encounters an alien race called the Tamarians, and while the ship’s universal translator can interpret individual Tamarian words, their sentences don’t make any sense. The problem is that the Tamarians speak entirely in cultural references, and since the Enterprise computer is not familiar with Tamarian history and literature, it can’t convey the Tamarians’ intended meaning. (Basically, it is like someone unfamiliar with Shakespeare and the Bible not understanding that ‘Romeo and Juliet’ is shorthand for doomed love, or that ‘Cain and Abel’ is shorthand for murderous brotherly envy.)

Semantic understanding is more than just comprehending context, however. It is also understanding overall systems, their subsets, and which combination of a system’s components might overlap with the user’s needs. For example, in Jurassic Park, Lex, the young heroine, desperately needs to find out how to lock a door to keep out a hungry dinosaur. She wastes precious time having to look through different clusters of files. If the Jurassic Park computer system had been capable of semantic search, Lex could simply have searched for how to secure the specific door she needed.

Another way to think of this is that semantic search ideally allows a discovery service to be capable of making inferences and reading between the lines. Furthermore, it should understand the subjects it indexes well enough to recognise where there may be overlap between them. Metaphorically speaking, it would be able to recognise where the area of overlap is in a Venn diagram. The system should then present these ‘overlap’ areas to subject matter experts, who can recognise the significance of this overlap and tag it with appropriate metadata, thus training the discovery service still further.

For example, a discovery service capable of semantic search understands that ‘prednisone’ is an anti-inflammatory steroid. It also comprehends that ‘acne’ is a skin condition. But beyond that, it is able to identify a set of articles in which ‘prednisone’ and ‘acne’ overlap. Subject matter experts would see these and recognise that the reason for the overlap is that acne is a side effect of prednisone. The subject matter expert would then add this knowledge to the semantic enrichment, and going forward, the discovery service would ‘understand’ both explicit and implicit relationships involving prednisone. The explicit relationship is what it already knew: that prednisone is an anti-inflammatory steroid. But now it also ‘knows’ that one of prednisone’s potential side effects is acne, which is an implicit connection.

A system that instead relies on brute keyword searches would simply have provided list after list of results for prednisone and list after list for acne. It would not have presented the articles where they overlap. The user would have had to go through the raw output of all these countless articles manually, tracking by hand the ones in which both were mentioned in order to figure out which were truly needed. And even if the user or a subject matter expert noted that acne was a side effect of prednisone, there would be no way to ‘teach’ it to the discovery service. Essentially, any future user who needed the same information would have to go through the same tedious process.
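The prednisone and acne example can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular discovery service is built: the article ids, the subject terms and the `Relationships` class are all hypothetical. It shows the two steps described above, which are finding the overlap between two subjects and then storing what an expert concludes from it so that nobody has to rediscover it.

```python
from collections import defaultdict

# Hypothetical toy corpus: article id -> subject terms assigned by indexing.
ARTICLES = {
    "a1": {"prednisone", "asthma"},
    "a2": {"prednisone", "acne"},
    "a3": {"acne", "adolescence"},
    "a4": {"prednisone", "acne", "dosage"},
}


def build_index(articles):
    """Invert the corpus: subject term -> set of article ids."""
    index = defaultdict(set)
    for article_id, terms in articles.items():
        for term in terms:
            index[term].add(article_id)
    return index


def overlap(index, term_a, term_b):
    """Articles indexed under both terms (the middle of the Venn diagram)."""
    return index.get(term_a, set()) & index.get(term_b, set())


class Relationships:
    """Expert-curated links, kept so that later searches can reuse them."""

    def __init__(self):
        self._links = defaultdict(set)

    def add(self, subject, relation, obj):
        self._links[subject].add((relation, obj))

    def about(self, subject):
        return sorted(self._links.get(subject, set()))


index = build_index(ARTICLES)
shared = overlap(index, "prednisone", "acne")  # shown to a subject matter expert

relationships = Relationships()
# Explicit relationship the service already had.
relationships.add("prednisone", "is_a", "anti-inflammatory steroid")
# Implicit relationship added by the expert after reviewing the overlap.
relationships.add("prednisone", "has_side_effect", "acne")
```

Here `shared` holds the two articles that mention both subjects, and `relationships.about("prednisone")` returns both the explicit and the expert-supplied link, which is the ‘teaching’ step that a brute keyword system has no place to record.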

Semantic enrichment can also help a discovery service recognise implications. For example, if it ‘sees’ articles about over-the-air updates of autonomous vehicles and recognises article content about autonomous vehicles being hacked, the discovery service can ‘know’ to bring those articles up in searches for cybersecurity, even if the word ‘cybersecurity’ is not mentioned in them. A resource that relies on brute keyword searches would miss these articles entirely, since it would not see the word ‘cybersecurity’ in them.
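The difference can be made concrete with a small Python sketch. It is a deliberately crude stand-in for real semantic enrichment: the documents and the `CONCEPT_TRIGGERS` table are invented for illustration, and a hand-written phrase list is assumed where a real service would draw on a knowledge graph. What it shows is simply that a search over enriched records can return an article that never uses the query word.

```python
# Hypothetical documents; only d2 contains the literal word "cybersecurity".
DOCS = {
    "d1": "Researchers show how autonomous vehicles can be hacked "
          "through a rogue over-the-air update.",
    "d2": "A review of cybersecurity training for library staff.",
    "d3": "New battery chemistry extends the range of electric cars.",
}

# Assumed enrichment rule: any of these phrases implies the concept.
CONCEPT_TRIGGERS = {
    "cybersecurity": ["hacked", "over-the-air update"],
}


def keyword_search(docs, query):
    """Brute keyword search: the query must literally appear in the text."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in docs.items() if q in text.lower())


def enrich(docs, triggers):
    """Attach concept tags to each document based on the trigger phrases."""
    tags = {doc_id: set() for doc_id in docs}
    for doc_id, text in docs.items():
        lowered = text.lower()
        for concept, phrases in triggers.items():
            if any(phrase in lowered for phrase in phrases):
                tags[doc_id].add(concept)
    return tags


def enriched_search(docs, tags, query):
    """Match on the literal text or on the concept tags added by enrichment."""
    q = query.lower()
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if q in text.lower() or q in tags[doc_id]
    )


TAGS = enrich(DOCS, CONCEPT_TRIGGERS)
plain_hits = keyword_search(DOCS, "cybersecurity")
enriched_hits = enriched_search(DOCS, TAGS, "cybersecurity")
```

With this toy data, `plain_hits` contains only the library training review, while `enriched_hits` also includes the article on hacked autonomous vehicles.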

Now, while we talk about computers ‘knowing’ and ‘understanding’, there is a reason we put those words in quotes. While computers are good at calculations and tracking patterns, it is important that the ideal discovery service also has an active human team monitoring and refining it, continuously ‘teaching’ it which connections are significant, which aren’t, and so on.

For example, a discovery service capable of semantic search might conclude, because reviews of Home Alone 2 mention ‘New York’, that there is some significant connection between Home Alone 2 and New York State. Essentially, the discovery service might not realise that when people say ‘New York’ they often mean ‘New York City’, and a human would have to correct this, making it clear that the ‘New York’ mentioned in these movie reviews is indeed ‘New York City’, and that the relationship between Home Alone 2 and New York is that Home Alone 2 takes place in New York City.
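One simple way to picture this human correction is as an override table that takes priority over the system’s default guess. The sketch below is a hypothetical illustration in Python, with invented names (`DEFAULT_SENSE`, `OVERRIDES`, `resolve`), and it assumes the naive default of reading ‘New York’ as the state.

```python
# Assumed default guess the system makes for an ambiguous place name.
DEFAULT_SENSE = {"New York": "New York State"}

# Human-curated corrections, keyed by (work, mention). Empty to begin with.
OVERRIDES = {}


def resolve(mention, work):
    """Prefer a curated correction for this work, else fall back to the default."""
    return OVERRIDES.get((work, mention), DEFAULT_SENSE.get(mention, mention))


# What the system assumes before anyone has reviewed it.
before = resolve("New York", "Home Alone 2")

# A human reviewer records the correct sense for this film.
OVERRIDES[("Home Alone 2", "New York")] = "New York City"

# What the system returns afterwards.
after = resolve("New York", "Home Alone 2")
```

Keying the correction by the work as well as the mention means the fix applies to Home Alone 2 without changing how ‘New York’ is read elsewhere.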

While discovery services capable of perfect semantic search aren’t here yet, features like the EBSCO Discovery Service (EDS) Concept Maps are a great step forward. Concept maps are the visual representation of knowledge graphs, which leverage subject indexing to deliver precise results from across all the records of a discovery service. In such systems, subject indexes are mapped across vocabularies and natural language terms, constituting an extensive semantic network of related terms and topics. Since concept maps essentially show users knowledge graphs, they make it easier for users to make connections across related topics.

Most significantly, users can find hidden relationships between and among concepts, and discover links across fields of study, which makes things like interdisciplinary research easier. Since, as mentioned above, concept maps add a semantic layer to subject queries, they also facilitate the use of more natural language in users’ searches. This means that users can potentially learn and discover more, as they are not restricted to a rigid, inflexible ‘library speak’ that they might not be proficient in and can instead search using their own words.
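The idea of mapping natural language terms onto subject indexes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for the example: the term mappings, subject headings, record ids and related-topic links are invented, and it does not describe how EDS itself is implemented.

```python
# Hypothetical mapping from users' own words to controlled subject headings.
TERM_TO_HEADING = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
}

# Hypothetical records indexed under each subject heading.
RECORDS_BY_HEADING = {
    "Myocardial Infarction": ["r1", "r2"],
    "Hypertension": ["r3"],
}

# Hypothetical related-topic links, as a concept map might display them.
RELATED = {
    "Myocardial Infarction": ["Hypertension"],
    "Hypertension": ["Myocardial Infarction"],
}


def search(user_query):
    """Translate the user's wording to a heading, then fetch records and links."""
    heading = TERM_TO_HEADING.get(user_query.strip().lower())
    if heading is None:
        return {"heading": None, "records": [], "related": []}
    return {
        "heading": heading,
        "records": RECORDS_BY_HEADING.get(heading, []),
        "related": RELATED.get(heading, []),
    }


result = search("Heart attack")
```

A user who types ‘heart attack’ reaches the same records as one who knows the formal heading, and also sees a related topic to explore next.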

While the science fiction computers of the Starship Enterprise and Jurassic Park had limitations when it came to semantic enrichment and semantic search, EDS is already trekking into the semantic frontier. Granted, it isn’t confronting potential interstellar conflict or dinosaur attacks, but it is making significant changes and advances to how research is done. 

The pace at which it is developing may not be quite at warp speed, but the leaps forward are coming faster and faster, and the benefit is that with each new breakthrough in semantic search, researchers can go ever more boldly into their research in ways they have never gone before.

Jonathan Bresman is innovation editor and market outreach lead at EBSCO Information Services