Use Semantic Language Tools to Better Understand User Intentions

My previous article showed you how to make your applications more sophisticated by using ontologies to tap into conceptual information about a domain. This article shows how to supplement applications with a rudimentary understanding of English vocabulary.

Software applications have many uses for recognizing vocabulary, ranging from spell-checking to suggesting alternative search criteria, which is the scenario explored in this article. You will see how to create a storefront application that provides free-form search access into the store’s inventory. Because the goal of any store is to make money, this store uses a lexical understanding of the user’s search to return as many relevant results as possible. For example, the storefront tries to return relevant results for the intended search even when users mistype their search criteria. Figure 1 shows a simple example.

Figure 1. Storefront Responding to a Misspelled Word: Even though the user misspelled “pants” in the search box, the storefront still presents meaningful results.

One way to create this type of behavior is to use a lexicon such as WordNet to search and navigate words and their meanings. WordNet is an English-language lexicon (you can think of it as a dictionary) developed at Princeton University and funded largely by government grants. Words in WordNet that share a common meaning (synonyms) are organized into groups called synsets. WordNet also defines relationships that capture how meanings relate: some hold between synsets, and others, such as antonymy (words with opposite meanings), hold between individual words. Later in this article you will see other types of relationships that WordNet supports, along with a couple of techniques for accessing WordNet’s relationship information programmatically.

Figure 2. OWL Representation of WordNet: This figure illustrates the concepts in WordNet and their relationships to each other (Source: WordNet in RDFS and OWL)

In a way, you can think of WordNet as a lexical ontology: a conceptualization of the entities and relationships in the domain of language. In fact, the W3C has published a representation of WordNet in the Web Ontology Language (OWL). The ontological perspective on WordNet is useful for many reasons, not least because it describes the structure of WordNet and makes its concepts and their relationships easier to visualize. Figure 2 illustrates these ideas, taken from the W3C document that describes the OWL representation of WordNet.

As Figure 2 shows, the same concept is likely to be expressed by different words in different languages. You can think of the lexical form of a word as the sequence of letters that spells the word in a particular language. A word can have several meanings; for example, “pants” might mean an article of clothing, or it could refer to the heavy breathing of a dog. The WordSense concept captures each such meaning. Finally, related WordSenses are grouped into synsets, as defined earlier.
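To make the relationship between lexical forms, word senses, and synsets concrete, here is a toy, in-memory sketch. It is not how JWNL or the W3C schema model WordNet: the ToyLexicon and Synset classes and the glosses are hypothetical, made up for illustration. It shows the key point that one lexical form such as “pants” can index several senses, each belonging to a synset of synonymous forms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of WordNet's structure (hypothetical classes, not a real API):
// a lexical form indexes one or more senses, and each sense is a synset
// that groups synonymous lexical forms.
class ToyLexicon {
    static final class Synset {
        final String gloss;              // short definition of the shared meaning
        final Set<String> lexicalForms;  // synonymous words with that meaning

        Synset(String gloss, Set<String> lexicalForms) {
            this.gloss = gloss;
            this.lexicalForms = lexicalForms;
        }
    }

    // Index from a lexical form to every synset (sense) it belongs to
    private final Map<String, List<Synset>> index = new HashMap<>();

    void add(Synset synset) {
        for (String form : synset.lexicalForms) {
            index.computeIfAbsent(form, k -> new ArrayList<>()).add(synset);
        }
    }

    // All senses of a lexical form; an empty list if the form is unknown
    List<Synset> senses(String lexicalForm) {
        return index.getOrDefault(lexicalForm, List.of());
    }

    public static void main(String[] args) {
        ToyLexicon lexicon = new ToyLexicon();
        lexicon.add(new Synset("a garment covering the legs",
                Set.of("pants", "trousers")));
        lexicon.add(new Synset("heavy breathing, as of a dog",
                Set.of("pants")));
        for (Synset sense : lexicon.senses("pants")) {
            System.out.println(sense.gloss + " " + sense.lexicalForms);
        }
    }
}
```

Looking up “pants” returns two senses, while “trousers” returns only the clothing sense; that asymmetry is exactly what the WordSense layer exists to capture.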

Searching for Synonyms
Enough background; let’s see how you might put WordNet to use. Assume that the storefront’s inventory includes “pants.” If a user searches for a synonym such as “trousers,” the “pants” item should show up in the results. A unit test can express this requirement:

   // src/test/java/com/devx/storefront/StorefrontTest.java
   ...
   @Test
   public void testSearch_ExactNotFound_SynonymnFound()
   {
      store.addItem(new Item("pants"));

      Set<Item> matchingItems = store.search("trouser");
      assertEquals(1, matchingItems.size());
      assertTrue(matchingItems.contains(new Item("pants")));
   }
   ...

The preceding test verifies that the storefront includes synonymous words in the results presented back to the user.

There are several Java APIs for WordNet that you could use to implement this unit test; this article will explore two: Java WordNet Library (JWNL) and Jawbone. Here’s a JWNL example in which the storefront delegates the synonym search to a JWNL dictionary class:

   // src/main/java/com/devx/storefront/Storefront.java
   ...
   public Set<Item> search(String name)
   {
      Set<Item> matchingItems = new HashSet<Item>();
      ...
      // Then add other similar words in inventory
      Set<String> candidateWords = new HashSet<String>();
      ...
            // Collect every synonym WordNet knows for the search term
            candidateWords.addAll(jwnlDictionary.lookupSynonyms(name));
      ...
      return matchWordsToInventory(candidateWords);
   }
   ...

The interesting logic regarding WordNet really occurs in the JwnlDictionary class:

   // src/main/java/com/devx/storefront/JwnlDictionary.java
   ...
   public Set<String> lookupSynonyms(String lexicalForm)
   {
      Set<String> synonyms = new HashSet<String>();

      // Look up the noun entry; null means WordNet does not know the word
      IndexWord indexWord = dictionary.getIndexWord(
         POS.NOUN, lexicalForm);
      if (indexWord == null)
         return synonyms;
      // Each sense (meaning) of the word is a synset; collect all its words
      Synset[] synSets = indexWord.getSenses();
      for (Synset synset : synSets)
      {
         Word[] words = synset.getWords();
         for (Word word : words)
         {
            synonyms.add(word.getLemma());
         }
      }
      return synonyms;
   }
   ...
Figure 3. Searching by synonyms: As shown in this figure, results for searches for “trousers” include “pants” as a synonym of “trousers.”

The preceding code snippet collects every lexical form of every meaning of the word passed in as a method parameter. The Storefront object then matches these forms against its inventory. To step back for a moment: the user’s search generates a set of related word forms, those forms are compared with the store’s inventory, and any matches are returned. Thus a search for “trousers” returns the inventory item “pants,” as shown in Figure 3.
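The article does not show matchWordsToInventory, so the following is only a guess at what that matching step might look like. The Item class and the method body are hypothetical; the one assumption borrowed from the article is that items compare equal by name, which the unit tests’ use of contains(new Item("pants")) implies.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the matching step; the article does not show
// matchWordsToInventory, so this body and the Item class are assumptions.
class InventoryMatcher {
    // Items compare equal by name, as the unit tests imply
    static final class Item {
        private final String name;

        Item(String name) { this.name = name; }

        String getName() { return name; }

        @Override public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).name.equals(name);
        }

        @Override public int hashCode() { return name.hashCode(); }

        @Override public String toString() { return name; }
    }

    private final Set<Item> inventory = new HashSet<>();

    void addItem(Item item) { inventory.add(item); }

    // Return every inventory item whose name (ignoring case) is a candidate word
    Set<Item> matchWordsToInventory(Set<String> candidateWords) {
        Set<String> normalized = new HashSet<>();
        for (String word : candidateWords) {
            normalized.add(word.toLowerCase(Locale.ROOT));
        }
        Set<Item> matches = new HashSet<>();
        for (Item item : inventory) {
            if (normalized.contains(item.getName().toLowerCase(Locale.ROOT))) {
                matches.add(item);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        InventoryMatcher store = new InventoryMatcher();
        store.addItem(new Item("pants"));
        store.addItem(new Item("shirt"));
        // Candidate words roughly as a synonym lookup for "trousers" might produce them
        System.out.println(store.matchWordsToInventory(Set.of("trousers", "pants", "slacks")));
    }
}
```

Keeping matching separate from word generation means every lookup strategy in this article (synonyms, hypernyms, misspellings) can feed the same candidate set.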

Searching for Hypernyms and Hyponyms
Including synonyms in the search results is nice, but other kinds of searches can surface additional relevant items. Two possibilities are hypernyms and hyponyms: a hypernym is a more general term, and a hyponym is a more specific one. For example, “garment” is more general than “pants” and is therefore a hypernym of “pants”; “jeans” is more specific and is therefore a hyponym. The two unit tests below express this behavior:

   // src/test/java/com/devx/storefront/StorefrontTest.java
   ...
   @Test
   public void testSearch_ExactNotFound_HyponymFound()
   {
      store.addItem(new Item("pants"));

      Set<Item> matchingItems = store.search("jeans");
      assertEquals(1, matchingItems.size());
      assertTrue(matchingItems.contains(new Item("pants")));
   }

   @Test
   public void testSearch_ExactNotFound_HyponymFound_Levis()
   {
      store.addItem(new Item("jeans"));

      Set<Item> matchingItems = store.search("levis");
      assertEquals(1, matchingItems.size());
      assertTrue(matchingItems.contains(new Item("jeans")));
   }
   ...

Here are the corresponding JWNL implementations:

   // src/main/java/com/devx/storefront/JwnlDictionary.java
   ...
   public Set<String> lookupHypernyms(String lexicalForm)
   {
      return lookupWordsFollowingPointer(
         lexicalForm, PointerType.HYPERNYM);
   }

   public Set<String> lookupHyponyms(String lexicalForm)
   {
      return lookupWordsFollowingPointer(
         lexicalForm, PointerType.HYPONYM);
   }

   // Collects the words of every synset reached by following one pointer
   // of the given type from any noun sense of lexicalForm
   private Set<String> lookupWordsFollowingPointer(
      String lexicalForm, PointerType pointerType)
   {
      Set<String> relatedWords = new HashSet<String>();

      IndexWord indexWord = dictionary.getIndexWord(
         POS.NOUN, lexicalForm);
      if (indexWord == null)
         return relatedWords;
      Synset[] synSets = indexWord.getSenses();
      for (Synset synset : synSets)
      {
         if (hasPointer(synset, pointerType))
         {
            PointerTarget[] targets =
               synset.getTargets(pointerType);
            for (PointerTarget target : targets)
            {
               // Hypernym and hyponym pointers link synsets to synsets
               Word[] words = ((Synset) target).getWords();
               for (Word word : words)
               {
                  relatedWords.add(word.getLemma());
               }
            }
         }
      }
      return relatedWords;
   }
   ...

This code is very similar to the code for synonyms, with one additional step: it follows either a hypernym or a hyponym pointer from each sense and includes the words of the target synsets in the search results. Figure 4 shows an example of a hypernym search, while Figure 5 shows hyponym search results.
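To see the one-level pointer traversal without a WordNet installation, here is a hypothetical toy version over a hand-built word graph. The PointerWalk class and its graph are illustrative only; real WordNet pointers connect synsets rather than bare words, and WordNet stores both hypernym and hyponym pointers explicitly, whereas this sketch derives hyponyms by inverting the hypernym map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy stand-in for WordNet hypernym/hyponym pointers (hypothetical; the
// word graph is illustrative and works on words rather than synsets).
class PointerWalk {
    enum PointerType { HYPERNYM, HYPONYM }

    // word -> its immediate hypernyms (more general terms)
    private final Map<String, Set<String>> hypernyms = new HashMap<>();

    void addHypernym(String word, String moreGeneral) {
        hypernyms.computeIfAbsent(word, k -> new HashSet<>()).add(moreGeneral);
    }

    // Follow exactly one pointer of the given type, like lookupWordsFollowingPointer
    Set<String> follow(String word, PointerType type) {
        if (type == PointerType.HYPERNYM) {
            return new HashSet<>(hypernyms.getOrDefault(word, Set.of()));
        }
        // Hyponyms: every word that lists `word` as an immediate hypernym
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Set<String>> entry : hypernyms.entrySet()) {
            if (entry.getValue().contains(word)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PointerWalk graph = new PointerWalk();
        graph.addHypernym("pants", "garment");
        graph.addHypernym("shirt", "garment");
        graph.addHypernym("jeans", "pants");
        graph.addHypernym("levis", "jeans");
        System.out.println(graph.follow("levis", PointerType.HYPERNYM));  // [jeans]
        System.out.println(graph.follow("garment", PointerType.HYPONYM)); // pants and shirt, order may vary
    }
}
```

Because only one pointer is followed, a search for “levis” reaches “jeans” but not “pants”; reaching further up or down the hierarchy would require repeating the step.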

Figure 4. Searching By Hypernym: Because the word “garment” is a hypernym encompassing various types of clothing, this search returns all the more specific forms of clothing in inventory.
Figure 5. Searching by Hyponym: Because “levis” is a hyponym, or a specific type, of “jeans,” this search result includes “jeans” in the results list.

Finding Lexically and Phonetically Similar Words
So far we have assumed that user searches have been well formed, even if they do not directly match inventoried items. But as with any free-form input method, user searches in the storefront application might be misspelled. A misspelling could be due to an accidental transposing of characters or could result from a user trying to phonetically spell a word by sounding it out. Next you will see how to enhance the storefront application to be forgiving in such cases of user error.

To do so, you will need to become familiar with another Java WordNet tool, Jawbone. JWNL is well suited to precise searches and to navigating the resulting data structures. Unfortunately, JWNL is not as good at inexact searches, and that is where Jawbone comes in. Jawbone searches are filter-based, and the filtering algorithm is pluggable and highly extensible. This makes Jawbone good at finding misspelled words, but also less efficient than JWNL. Each tool has its own strengths and weaknesses, as the scenarios in this article show.

To search for lexically similar words in Jawbone, you use the SimilarFilter, which uses the Levenshtein distance between two words to gauge their lexical similarity. Without going into too much detail, the Levenshtein distance is the minimum number of single-character insertions, deletions, or substitutions needed to turn one word into another. It therefore catches added, omitted, and mistyped characters; a transposition of two adjacent characters also registers, at a cost of two edits.
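As a rough illustration of the measure SimilarFilter is described as using, here is the textbook dynamic-programming Levenshtein computation. It is a standalone sketch, not Jawbone’s code; the class name Levenshtein is hypothetical.

```java
// Textbook Levenshtein distance (illustrative sketch, not Jawbone's implementation).
class Levenshtein {
    static int distance(String a, String b) {
        // prev[j]: distance between the first i-1 chars of a and the first j chars of b
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // j insertions turn "" into b's first j chars
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i deletions turn a's first i chars into ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(
                        curr[j - 1] + 1,      // insertion
                        prev[j] + 1),         // deletion
                        prev[j - 1] + cost);  // substitution (or match)
            }
            int[] tmp = prev;  // reuse arrays: this row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("pbnts", "pants")); // 1: one substitution
        System.out.println(distance("pnats", "pants")); // 2: a transposition costs two edits
        System.out.println(distance("pants", "pant"));  // 1: one deletion
    }
}
```

Only two rows of the dynamic-programming table are kept, so memory grows with the length of the second word rather than with the product of both lengths.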

Like lexical-similarity searches, phonetic searches in Jawbone are performed through a filter, in this case the SoundFilter. It assigns each word an index that approximates the sounds of the consonants in the word. The algorithm that computes this index is known as Soundex, and its classic form produces a code made of the word’s first letter followed by three digits.
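To show how such an index is computed, here is a compact implementation of the classic American Soundex rules. It is an illustrative sketch; Jawbone’s SoundFilter may differ in its details, and the class name Soundex is hypothetical.

```java
import java.util.Locale;

// Classic American Soundex: first letter plus three digits for the consonants
// that follow. Illustrative sketch; Jawbone's SoundFilter may differ.
class Soundex {
    // Digit for each consonant group; '0' marks letters that are not coded
    static char code(char c) {
        switch (c) {
            case 'b': case 'f': case 'p': case 'v':
                return '1';
            case 'c': case 'g': case 'j': case 'k':
            case 'q': case 's': case 'x': case 'z':
                return '2';
            case 'd': case 't':
                return '3';
            case 'l':
                return '4';
            case 'm': case 'n':
                return '5';
            case 'r':
                return '6';
            default:
                return '0'; // vowels, y, h, w
        }
    }

    static String encode(String word) {
        String s = word.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        out.append(Character.toUpperCase(s.charAt(0)));
        char lastCode = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char digit = code(c);
            // Skip uncoded letters and repeats of the previous code
            if (digit != '0' && digit != lastCode) {
                out.append(digit);
            }
            // Vowels separate letters with the same code; h and w do not
            if (c != 'h' && c != 'w') {
                lastCode = digit;
            }
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes, e.g. "Lee" becomes L000
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("trouser")); // T626
        System.out.println(encode("trouzer")); // T626
        System.out.println(encode("pants"));   // P532
    }
}
```

Because “trouser” and “trouzer” both encode to T626, a Soundex comparison treats them as a match even though the spellings differ.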

Test cases for defining the functioning of lexically and phonetically similar searches in the storefront application are shown below:

   // src/test/java/com/devx/storefront/StorefrontTest.java
   ...
   @Test
   public void testSearch_ExactNotFound_LexicallySimiliarFound()
   {
      store.addItem(new Item("pants"));

      Set<Item> matchingItems = store.search("pbnts");
      assertEquals(1, matchingItems.size());
      assertTrue(matchingItems.contains(new Item("pants")));
   }

   @Test
   public void testSearch_ExactNotFound_PhoneticallySimiliarFound()
   {
      store.addItem(new Item("trouser"));

      Set<Item> matchingItems = store.search("trouzer");
      assertEquals(1, matchingItems.size());
      assertTrue(matchingItems.contains(new Item("trouser")));
   }
   ...

As in the previous examples, the Storefront class delegates the more interesting lexical operations to a Jawbone-based dictionary. The pertinent methods are shown below:

   // src/main/java/com/devx/storefront/JawboneDictionary.java
   ...
   public Set<String> lookupLexicallySimiliarWords(
      String lexicalForm)
   {
      return searchTermsAndPackageTerms(new
         SimilarFilter(lexicalForm, true, 2), lexicalForm);
   }

   public Set<String> lookupPhoneticallySimiliarWords(
      String lexicalForm)
   {
      return searchTermsAndPackageTerms(new
         SoundFilter(lexicalForm, true), lexicalForm);
   }

   private Set<String> searchTermsAndPackageTerms(
      TermFilter filter, String lexicalForm)
   {
      Set<String> words = new HashSet<String>();
      // Iterate over the index terms that pass the filter
      Iterator<IndexTerm> termIterator =
         dictionary.getIndexTermIterator(100, filter);
      while (termIterator.hasNext())
      {
         IndexTerm term = termIterator.next();

         // Skip the search term itself; only alternatives are wanted
         if (!lexicalForm.equals(term.getLemma()))
         {
            words.add(term.getLemma());
         }
      }
      return words;
   }
   ...

As the preceding code shows, lexical and phonetic filtering are nearly identical operations in Jawbone. In fact, one of Jawbone’s strengths is how easily you can substitute additional filtering strategies. The trade-off is that these filter-based searches are significantly slower than JWNL’s index-based searches.
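To illustrate why swapping filters is easy, and why filter-based lookup costs more than an index lookup, here is a hypothetical sketch that uses plain Java Predicate strategies over an in-memory word list. FilterSearch and its search method are invented for illustration and are not Jawbone’s TermFilter API; the method loosely mirrors searchTermsAndPackageTerms by capping the results and excluding the query itself.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of filter-based lookup with pluggable strategies;
// not Jawbone's API.
class FilterSearch {
    // Scan the whole lexicon, keep terms the filter accepts (at most `limit`),
    // and skip the query term itself.
    static Set<String> search(List<String> lexicon, Predicate<String> filter,
                              String query, int limit) {
        Set<String> results = new LinkedHashSet<>(); // preserves lexicon order
        for (String term : lexicon) {
            if (results.size() >= limit) {
                break;
            }
            if (!term.equals(query) && filter.test(term)) {
                results.add(term);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> lexicon = List.of("pants", "panty", "pant", "parts", "trousers");
        String query = "pants";

        // Strategy 1: shares the query's first three letters
        Predicate<String> samePrefix = term -> term.startsWith(query.substring(0, 3));
        // Strategy 2: same length and first letter as the query
        Predicate<String> sameShape = term ->
                term.length() == query.length() && term.charAt(0) == query.charAt(0);

        // Swapping strategies changes only the argument, not the search loop
        System.out.println(search(lexicon, samePrefix, query, 100)); // [panty, pant]
        System.out.println(search(lexicon, sameShape, query, 100));  // [panty, parts]
    }
}
```

Every call walks the entire lexicon and tests each term, whereas an index lookup such as JWNL’s getIndexWord retrieves a single entry by its key; that difference is the speed trade-off described above.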

Figure 1 earlier in the article demonstrated lexical searches performed in the Storefront application.

Morphology
The last few examples show how to build a more sophisticated search engine that is more forgiving of user input. Most of the searches so far have used plural forms, but shouldn’t a search for “pant” return the inventory item “pants”? Of course it should, and to accomplish this you need to become familiar with the concept of morphology. As the word implies, morphology deals with the forms that words can take: for example, a noun can have a plural form and a verb can have a past-tense form. Morphology processors vary in sophistication. The simple Storefront application uses JWNL’s default morphology processor to convert words into their root form before performing the search.

With JWNL, this requires only a simple extension to the earlier examples, as the following code fragment shows:

   // src/main/java/com/devx/storefront/JwnlDictionary.java
   ...
   public Set<String> lookupMorphologicallySimilarLexicalForms(
      String lexicalForm)
   {
      Set<String> forms = new HashSet<String>();

      // Ask the morphological processor for every possible base (root) form
      List baseForms = dictionary.getMorphologicalProcessor().
         lookupAllBaseForms(POS.NOUN, lexicalForm);
      for (Object baseForm : baseForms)
      {
         forms.add(baseForm.toString());
      }

      return forms;
   }
   ...

Notice the call to getMorphologicalProcessor() to retrieve the dictionary’s morphological processor, and the call to lookupAllBaseForms() to identify every possible root form of the search term.
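For a feel of what a morphology processor does, here is a simplified sketch of rule-based noun morphology modeled on the suffix-detachment rules of WordNet’s morphy function. It omits morphy’s exception lists (such as “mice” to “mouse”) and is not JWNL’s implementation; NounMorphology and its lexicon parameter are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Simplified noun morphology modeled on WordNet morphy's detachment rules.
// Hypothetical sketch: no exception lists, not JWNL's implementation.
class NounMorphology {
    // {suffix, replacement} pairs for nouns
    private static final String[][] NOUN_RULES = {
        {"s", ""}, {"ses", "s"}, {"xes", "x"}, {"zes", "z"},
        {"ches", "ch"}, {"shes", "sh"}, {"men", "man"}, {"ies", "y"}
    };

    // Every base form of `word` that the lexicon contains, including the word
    // itself when it is already a lemma (for example "pants" and "pant")
    static Set<String> baseForms(String word, Set<String> lexicon) {
        String w = word.toLowerCase(Locale.ROOT);
        Set<String> forms = new LinkedHashSet<>();
        if (lexicon.contains(w)) {
            forms.add(w);
        }
        for (String[] rule : NOUN_RULES) {
            String suffix = rule[0];
            if (w.length() > suffix.length() && w.endsWith(suffix)) {
                String candidate = w.substring(0, w.length() - suffix.length()) + rule[1];
                // Discard candidates that are not real words, e.g. "boxe" from "boxes"
                if (lexicon.contains(candidate)) {
                    forms.add(candidate);
                }
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        Set<String> lexicon = Set.of("pants", "pant", "box", "lady", "church");
        System.out.println(baseForms("pants", lexicon));  // [pants, pant]
        System.out.println(baseForms("boxes", lexicon));  // [box]
        System.out.println(baseForms("ladies", lexicon)); // [lady]
    }
}
```

Checking each candidate against the lexicon is what keeps the crude rules safe: an over-eager rule such as stripping the final “s” from “boxes” produces “boxe,” which is simply discarded.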

Throughout this article you have seen several methods and tools for performing lexical functions on free-form user input. I hope it has demonstrated how easy it is to take advantage of WordNet’s conceptualization of the English language and has given you ideas for using these tools in your own applications. A lexical understanding of input is, of course, only the tip of the iceberg of the analysis possible within the broader field of natural language processing, but, as you have seen, a few simple techniques can dramatically improve your applications’ ability to interpret user input.

devx-admin
