Use Semantic Language Tools to Better Understand User Intentions

My previous article showed you how to improve your applications’ sophistication by using the power of ontologies to tap into conceptual information about a domain. This article discusses how to supplement applications by providing them with a rudimentary understanding of English vocabulary.

Software applications have many uses for recognizing vocabulary, ranging from spell-checking to providing alternative suggestions for search criteria, the scenario explored in this article. You will see how to create a storefront application that provides free-form search access into the store’s inventory. Because the goal of any store is to make money, this store leverages a lexical understanding of the user’s search to return the greatest number of relevant results. This storefront will, for example, try to provide relevant results for the intended search even when users inadvertently mistype their search criteria. Figure 1 shows a simple example.

Figure 1. Storefront Responding to a Misspelled Word: Even though the user misspelled “pants” in the search box the storefront was still able to present meaningful results back to the user.

One way to create this type of behavior is to use a lexicon such as WordNet to search and navigate words and their meanings. WordNet is an English-language lexicon (you can think of this as a dictionary) developed at Princeton University and funded largely by government grants. Words in WordNet that share a common meaning (synonyms) are organized into groups called synsets. Additionally, WordNet defines relationships between synsets to capture the semantic relationships between words. An example of a relationship in WordNet is that of antonyms, words that have opposite meanings from one another. Later in this article you will be exposed to other types of relationships supported in WordNet and to a couple of techniques for getting programmatic access to WordNet’s relationship information.

Figure 2. OWL Representation of WordNet: This figure illustrates the concepts in WordNet and their relationships to each other (Source: Wordnet in RDFS and OWL)

In a way, you can think of WordNet as a lexical ontology, a conceptualization of the entities and relationships of things in the domain of language. In fact, the W3C has provided a representation of WordNet in the Web Ontology Language (OWL). The ontological perspective into WordNet is interesting for many reasons, not the least of which is to help describe the structure of WordNet and simplify visualization of its concepts and their relationships. Figure 2 illustrates these ideas from the W3C document that describes the OWL representation of WordNet.

As Figure 2 shows, each lexical expression of a concept will likely map to different words in different languages. You can think of the lexical form of a word as the series of letters that represent the word in a particular language. Each word might have several possible meanings; for example, “pants” might mean an article of clothing, or could refer to the heavy breathing of a dog. The WordSense concept captures each of these meanings separately. Lastly, related WordSenses are grouped into synsets, as defined earlier.
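To make the relationship between lexical forms, senses, and synsets concrete, here is a minimal in-memory sketch. It is not the WordNet data model or any WordNet API; the class `MiniLexicon`, its methods, and the two hand-written synsets are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model: one lexical form can have several senses, each belonging to a synset. */
final class MiniLexicon {

    /** A synset groups the lexical forms that share one meaning. */
    static final class Synset {
        final String gloss;
        final Set<String> lexicalForms = new LinkedHashSet<>();

        Synset(String gloss) {
            this.gloss = gloss;
        }
    }

    // Each entry lists the synsets (one per sense) that a lexical form takes part in.
    private final Map<String, List<Synset>> sensesByForm = new LinkedHashMap<>();

    /** Creates a synset and registers it as one sense of every given lexical form. */
    Synset addSynset(String gloss, String... forms) {
        Synset synset = new Synset(gloss);
        for (String form : forms) {
            synset.lexicalForms.add(form);
            sensesByForm.computeIfAbsent(form, key -> new ArrayList<>()).add(synset);
        }
        return synset;
    }

    /** Returns every sense of the lexical form, or an empty list if it is unknown. */
    List<Synset> sensesOf(String lexicalForm) {
        return sensesByForm.getOrDefault(lexicalForm, Collections.<Synset>emptyList());
    }

    public static void main(String[] args) {
        MiniLexicon lexicon = new MiniLexicon();
        // Hand-written sample data, not taken from the WordNet database.
        lexicon.addSynset("garment covering the legs", "pants", "trousers");
        lexicon.addSynset("short, quick breaths", "pants", "gasps");

        for (Synset sense : lexicon.sensesOf("pants")) {
            System.out.println(sense.gloss + " -> " + sense.lexicalForms);
        }
    }
}
```

Looking up “pants” in this toy lexicon yields two senses, while “trousers” yields only the clothing sense, which is the ambiguity that the WordSense concept is meant to capture.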

Searching for Synonyms
Enough background; let’s see how you might put WordNet to use. Let’s assume that our storefront has an inventory that includes “pants.” If a user searches for synonymous words, such as “trousers,” we want the “pants” item from inventory to show up in the results. This could be expressed in a unit test as follows:

   // src/test/java/com/devx/storefront/StorefrontTest.java
   ...
      @Test
      public void testSearch_ExactNotFound_SynonymnFound()
      {
         // Inventory holds only "pants"
         store.addItem(new Item("pants"));

         // A synonym of the inventoried item should still match it
         Set<Item> matchingItems = store.search("trouser");
         assertEquals(1, matchingItems.size());
         assertTrue(matchingItems.contains(new Item("pants")));
      }
   ...

The preceding test verifies that the storefront includes synonymous words in the results presented back to the user.

There are several Java APIs for WordNet that you could use to implement this unit test; this article will explore two: Java WordNet Library (JWNL) and Jawbone. Here’s a JWNL example in which the storefront delegates the synonym search to a JWNL dictionary class:

   // src/main/java/com/devx/storefront/Storefront.java
   ...
      public Set<Item> search(String name)
      {
         Set<Item> matchingItems = new HashSet<>();
         ...
         // Then add other similar words in inventory
         Set<String> candidateWords = new HashSet<>();
         ...
         candidateWords.addAll(jwnlDictionary.lookupSynonyms(name));
         ...
         return matchWordsToInventory(candidateWords);
      }
   ...

The interesting logic regarding WordNet really occurs in the JwnlDictionary class:

   // src/main/java/com/devx/storefront/JwnlDictionary.java
   ...
      public Set<String> lookupSynonyms(String lexicalForm)
      {
         Set<String> synonyms = new HashSet<>();

         // Look up the word as a noun; give up if there is no entry
         IndexWord indexWord = dictionary.getIndexWord(
            POS.NOUN, lexicalForm);
         if (indexWord == null)
            return synonyms;

         // Collect every lemma from every sense (synset) of the word
         Synset[] synSets = indexWord.getSenses();
         for (Synset synset : synSets)
         {
            Word[] words = synset.getWords();
            for (Word word : words)
            {
               synonyms.add(word.getLemma());
            }
         }
         return synonyms;
      }
   ...
Figure 3. Searching by synonyms: As shown in this figure, results for searches for “trousers” include “pants” as a synonym of “trousers.”

The preceding code snippet extracts all possible lexical representations of all possible meanings of the word passed in as a method parameter. These various lexical forms are matched to items in inventory by the Storefront object. To step back momentarily, the user’s search generates a set of related word forms that are compared to the store’s inventory, and any matching results are returned. Thus a search for “trousers” returns a match for the inventory item “pants” as shown in Figure 3.
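The article does not show how the Storefront compares candidate words with its inventory, so the following is only a sketch of one way that step could work. The class `InventoryMatcher` is hypothetical, it stores item names as plain strings rather than `Item` objects, and the normalization rule (lowercasing and treating underscores as spaces, since WordNet's data files write multi-word entries with underscores) is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper that keeps only the candidate words present in inventory. */
final class InventoryMatcher {
    private final Set<String> inventoryNames = new HashSet<>();

    void addItem(String name) {
        inventoryNames.add(normalize(name));
    }

    /** Returns the normalized candidates that name an inventoried item. */
    Set<String> matchWordsToInventory(Set<String> candidateWords) {
        Set<String> matches = new LinkedHashSet<>();
        for (String candidate : candidateWords) {
            String key = normalize(candidate);
            if (inventoryNames.contains(key)) {
                matches.add(key);
            }
        }
        return matches;
    }

    // Assumption: compare case-insensitively and treat underscores as spaces.
    private static String normalize(String word) {
        return word.trim().toLowerCase(Locale.ROOT).replace('_', ' ');
    }

    public static void main(String[] args) {
        InventoryMatcher matcher = new InventoryMatcher();
        matcher.addItem("pants");

        // Candidates such as these would come from a synonym lookup on "trousers".
        Set<String> candidates =
            new HashSet<>(Arrays.asList("trousers", "pants", "slacks"));
        System.out.println(matcher.matchWordsToInventory(candidates)); // prints [pants]
    }
}
```

Because the inventory is held in a hash set, each candidate costs a constant-time lookup, so generating a generous set of related words does not make the matching step expensive.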

Searching for Hypernyms and Hyponyms
The inclusion of synonymous words in the search results is nice, but there are several other types of searches you could perform that would return other possibly relevant items in the search results. Two other possibilities are hypernyms and hyponyms. Hypernyms include more general terms while hyponyms include more specific terms. For example, “garment” is a more general term than “pants” and thus is a hypernym to “pants.” On the other hand, “jeans” is a more specific term and thus is a hyponym. The two unit tests below express this behavior:

   // src/test/java/com/devx/storefront/StorefrontTest.java
   ...
      @Test
      public void testSearch_ExactNotFound_HyponymFound()
      {
         store.addItem(new Item("pants"));

         // "jeans" is a more specific term than the inventoried "pants"
         Set<Item> matchingItems = store.search("jeans");
         assertEquals(1, matchingItems.size());
         assertTrue(matchingItems.contains(new Item("pants")));
      }

      @Test
      public void testSearch_ExactNotFound_HyponymFound_Levis()
      {
         store.addItem(new Item("jeans"));

         // "levis" is a more specific term than the inventoried "jeans"
         Set<Item> matchingItems = store.search("levis");
         assertEquals(1, matchingItems.size());
         assertTrue(matchingItems.contains(new Item("jeans")));
      }
   ...

Here are the corresponding JWNL implementations:

   // src/main/java/com/devx/storefront/JwnlDictionary.java
   ...
      public Set<String> lookupHypernyms(String lexicalForm)
      {
         return lookupWordsFollowingPointer(
            lexicalForm, PointerType.HYPERNYM);
      }

      public Set<String> lookupHyponyms(String lexicalForm)
      {
         return lookupWordsFollowingPointer(
            lexicalForm, PointerType.HYPONYM);
      }

      // Shared by both lookups: follows one pointer of the given
      // type from each sense of the word and collects the lemmas
      private Set<String> lookupWordsFollowingPointer(
         String lexicalForm, PointerType pointerType)
      {
         Set<String> relatedWords = new HashSet<>();

         IndexWord indexWord = dictionary.getIndexWord(
            POS.NOUN, lexicalForm);
         if (indexWord == null)
            return relatedWords;

         Synset[] synSets = indexWord.getSenses();
         for (Synset synset : synSets)
         {
            if (hasPointer(synset, pointerType))
            {
               PointerTarget[] targets =
                  synset.getTargets(pointerType);
               for (PointerTarget target : targets)
               {
                  Word[] words = ((Synset) target).getWords();
                  for (Word word : words)
                  {
                     relatedWords.add(word.getLemma());
                  }
               }
            }
         }
         return relatedWords;
      }
   ...

This code is very similar to the code for synonyms, with one additional operation: it follows either a hypernym or a hyponym pointer from each synset and collects the words it reaches for inclusion in the search results. Figure 4 shows an example of a hypernym search, while Figure 5 shows hyponym search results.

Figure 4. Searching By Hypernym: Because the word “garment” is a hypernym encompassing various types of clothing, this search returns all more specific forms of clothing contained in inventory.
Figure 5. Searching by Hyponym: Because “levis” is a hyponym, or a specific type, of “jeans,” this search result includes “jeans” in the results list.
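If you want to experiment with hypernym and hyponym navigation without installing WordNet, a small in-memory stand-in is enough. The sketch below is not JWNL code: the class `ToyTaxonomy` and its hand-entered “is a kind of” pairs are hypothetical and simply mirror the examples used in this article. Like the JWNL code above, it follows a single pointer hop rather than walking the whole hierarchy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a lexicon that records "is a kind of" links. */
final class ToyTaxonomy {
    private final Map<String, Set<String>> hypernyms = new HashMap<>();
    private final Map<String, Set<String>> hyponyms = new HashMap<>();

    /** Records that the specific term is a kind of the general term, in both directions. */
    void addIsA(String specific, String general) {
        hypernyms.computeIfAbsent(specific, key -> new LinkedHashSet<>()).add(general);
        hyponyms.computeIfAbsent(general, key -> new LinkedHashSet<>()).add(specific);
    }

    /** More general terms, one hop up. */
    Set<String> lookupHypernyms(String word) {
        return hypernyms.getOrDefault(word, Collections.<String>emptySet());
    }

    /** More specific terms, one hop down. */
    Set<String> lookupHyponyms(String word) {
        return hyponyms.getOrDefault(word, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        ToyTaxonomy taxonomy = new ToyTaxonomy();
        // Hand-entered sample links, not read from WordNet.
        taxonomy.addIsA("pants", "garment");
        taxonomy.addIsA("jeans", "pants");
        taxonomy.addIsA("levis", "jeans");

        System.out.println(taxonomy.lookupHypernyms("jeans"));  // prints [pants]
        System.out.println(taxonomy.lookupHyponyms("garment")); // prints [pants]
    }
}
```

Note that a one-hop lookup from “garment” reaches “pants” but not “jeans”; reaching deeper terms would require repeating the lookup on each result.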

Finding Lexically and Phonetically Similar Words
So far we have assumed that user searches have been well formed, even if they do not directly match inventoried items. But as with any free-form input method, user searches in the storefront application might be misspelled. A misspelling could be due to an accidental transposing of characters or could result from a user trying to phonetically spell a word by sounding it out. Next you will see how to enhance the storefront application to be forgiving in such cases of user error.

To do so you will need to familiarize yourself with another Java WordNet tool, Jawbone. JWNL is well-suited for doing precise types of searches and for navigating the resulting data structures. Unfortunately, JWNL is not so good at considering non-exact searches, and that is where Jawbone comes into the picture. Jawbone searches are filter based, and the algorithm for filtering results is pluggable and highly extensible. This makes Jawbone good at searching for misspelled words, but it’s also less efficient than JWNL. Both tools have their own strengths and weaknesses, as highlighted by the various scenarios portrayed in this article.

To search for lexically similar words in Jawbone you will use the SimilarFilter. The SimilarFilter uses the Levenshtein distance between two words to gauge their lexical similarity. Without going into too much detail, the Levenshtein distance is the minimum number of single-character insertions, deletions, or substitutions required to transform one word into another. It therefore picks up added, omitted, or replaced characters directly, while a transposition of two adjacent characters counts as two edits.
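To make the scoring concrete, here is a minimal, self-contained sketch of the classic dynamic-programming computation of Levenshtein distance. It is not Jawbone's SimilarFilter code; the class name `Levenshtein` and its `distance` method are hypothetical names chosen for illustration.

```java
/** Illustrative Levenshtein distance using two rolling rows of the edit table. */
final class Levenshtein {

    /** Minimum number of single-character insertions, deletions, or substitutions. */
    static int distance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i characters of a into the empty string takes i deletions.
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int substitution = previous[j - 1] + cost;
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            // Reuse the two arrays instead of allocating a new row each time.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("pants", "pbnts")); // prints 1 (one substitution)
        System.out.println(distance("pants", "pnats")); // prints 2 (swapped letters)
        System.out.println(distance("pants", "pant"));  // prints 1 (one deletion)
    }
}
```

A similarity filter can then accept any dictionary term whose distance from the search term falls under a chosen threshold.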

Like searches for lexically similar words, phonetic searches in Jawbone are performed via a filter. The filter for phonetic searches is the SoundFilter, which assigns each word an index that approximates the sounds produced by the consonants in the word. Two words with the same index are treated as sounding alike. The algorithm for computing this index is known as the Soundex algorithm.
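For readers unfamiliar with Soundex, the following sketch implements the commonly described American Soundex rules: keep the first letter, map the remaining consonants to digits, skip vowels, collapse adjacent letters with the same digit, and pad or truncate to four characters. It is an illustration only and is not taken from Jawbone's SoundFilter, whose exact variant may differ; the class name `Soundex` and method `encode` are hypothetical.

```java
import java.util.Locale;

/** Illustrative American Soundex encoder (first letter plus three digits). */
final class Soundex {
    // Digit for each letter a..z; '0' marks letters that carry no code (vowels, h, w, y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        // Keep ASCII letters only so indexing into CODES is always valid.
        String letters = word.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        if (letters.isEmpty()) {
            return "";
        }

        StringBuilder code = new StringBuilder();
        code.append(Character.toUpperCase(letters.charAt(0)));
        char lastDigit = CODES.charAt(letters.charAt(0) - 'a');

        for (int i = 1; i < letters.length() && code.length() < 4; i++) {
            char letter = letters.charAt(i);
            if (letter == 'h' || letter == 'w') {
                continue; // h and w do not separate letters that share a digit
            }
            char digit = CODES.charAt(letter - 'a');
            if (digit != '0' && digit != lastDigit) {
                code.append(digit);
            }
            // A vowel sets lastDigit to '0', so the same digit may repeat after it.
            lastDigit = digit;
        }

        while (code.length() < 4) {
            code.append('0'); // pad short codes with zeros
        }
        return code.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("trouser")); // prints T626
        System.out.println(encode("trouzer")); // prints T626
        System.out.println(encode("Robert"));  // prints R163
    }
}
```

Because “s” and “z” map to the same digit, “trouser” and “trouzer” receive the same code, which is why a phonetic filter can pair them.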

The test cases below define how lexically and phonetically similar searches should behave in the storefront application:

   // src/test/java/com/devx/storefront/StorefrontTest.java
   ...
      @Test
      public void testSearch_ExactNotFound_LexicallySimiliarFound()
      {
         store.addItem(new Item("pants"));

         // One mistyped character should still find "pants"
         Set<Item> matchingItems = store.search("pbnts");
         assertEquals(1, matchingItems.size());
         assertTrue(matchingItems.contains(new Item("pants")));
      }

      @Test
      public void
         testSearch_ExactNotFound_PhoneticallySimiliarFound()
      {
         store.addItem(new Item("trouser"));

         // A sounded-out spelling should still find "trouser"
         Set<Item> matchingItems = store.search("trouzer");
         assertEquals(1, matchingItems.size());
         assertTrue(matchingItems.contains(new Item("trouser")));
      }
   ...

As in the previous examples, the Storefront class delegates the more interesting lexical operations to a Jawbone-based dictionary. The pertinent methods are shown below:

   // src/main/java/com/devx/storefront/JawboneDictionary.java
   ...
      public Set<String> lookupLexicallySimiliarWords(
         String lexicalForm)
      {
         return searchTermsAndPackageTerms(new
            SimilarFilter(lexicalForm, true, 2), lexicalForm);
      }

      public Set<String> lookupPhoneticallySimiliarWords(
         String lexicalForm)
      {
         return searchTermsAndPackageTerms(new
            SoundFilter(lexicalForm, true), lexicalForm);
      }

      // Runs the filter over the dictionary's index terms and
      // collects every accepted lemma except the search term itself
      private Set<String> searchTermsAndPackageTerms(
         TermFilter filter, String lexicalForm)
      {
         Set<String> words = new HashSet<>();
         Iterator<IndexTerm> termIterator =
            dictionary.getIndexTermIterator(100, filter);
         while (termIterator.hasNext())
         {
            IndexTerm term = termIterator.next();

            if (!lexicalForm.equals(term.getLemma()))
            {
               words.add(term.getLemma());
            }
         }
         return words;
      }
   ...

As you can see in the preceding code, lexical and phonetic filtering are similar operations in Jawbone: only the filter object changes. In fact, one of the strengths of Jawbone is the ease with which you can substitute additional filtering strategies. But the trade-off is that these filter-based searches are significantly slower than the index-based searches of JWNL.
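The trade-off is easier to see in a stripped-down form. The sketch below does not use Jawbone's classes; `FilteredScan` and `substitutionFilter` are hypothetical, and a plain `java.util.function.Predicate` stands in for a term filter. It shows why such a design is easy to extend (any predicate can be plugged in) and why it tends to be slower than an index lookup (every term has to be examined).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical filter-based search: a linear scan with a pluggable test. */
final class FilteredScan {

    /** Visits terms in order and keeps those the filter accepts, up to maxResults. */
    static List<String> scan(List<String> terms, Predicate<String> filter, int maxResults) {
        List<String> accepted = new ArrayList<>();
        for (String term : terms) {
            if (accepted.size() >= maxResults) {
                break;
            }
            if (filter.test(term)) {
                accepted.add(term);
            }
        }
        return accepted;
    }

    /** Accepts terms of equal length that differ from target in at most maxMismatches positions. */
    static Predicate<String> substitutionFilter(String target, int maxMismatches) {
        return term -> {
            if (term.length() != target.length()) {
                return false;
            }
            int mismatches = 0;
            for (int i = 0; i < term.length(); i++) {
                if (term.charAt(i) != target.charAt(i)) {
                    mismatches++;
                }
            }
            return mismatches <= maxMismatches;
        };
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("pants", "pints", "paint", "trousers");

        // Swap in a different predicate to change the search strategy.
        System.out.println(scan(terms, substitutionFilter("pbnts", 1), 100)); // prints [pants, pints]
        System.out.println(scan(terms, term -> term.startsWith("tr"), 100));  // prints [trousers]
    }
}
```

The substitution-only filter here is deliberately simpler than an edit-distance filter; it is used only to keep the example short.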

Figure 1 earlier in the article demonstrated lexical searches performed in the Storefront application.

Morphology
The last few examples illustrate how to create a much more sophisticated search engine that’s more forgiving in response to user searches. All the searches so far have been in the plural form, but shouldn’t a search for “pant” return the inventoried item “pants”? Of course, and to accomplish this you will need to familiarize yourself with the concept of morphology. As the word implies, morphology has to do with the forms that words can take. For example, a noun can have a plural form and a verb can have a past-tense form. Morphology processors differ in their level of sophistication. The simple Storefront application uses JWNL’s default morphology processor to convert words into their root form before performing the search.

With JWNL, this requires only a simple extension to the earlier examples, as the following code fragment illustrates:

   // src/main/java/com/devx/storefront/JwnlDictionary.java
   ...
      public Set<String> lookupMorphologicallySimilarLexicalForms(
         String lexicalForm)
      {
         Set<String> forms = new HashSet<>();

         // Ask JWNL for every root form of the word as a noun
         List baseForms = dictionary.getMorphologicalProcessor().
            lookupAllBaseForms(POS.NOUN, lexicalForm);
         for (Object baseForm : baseForms)
         {
            forms.add(baseForm.toString());
         }

         return forms;
      }
   ...

Notice the call to getMorphologicalProcessor to retrieve the morphological processor from the dictionary, and the call to lookupAllBaseForms to identify all possible root forms of the search term.
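To get a feel for what a simple morphology processor does, consider the sketch below. It is not JWNL's implementation: the class `NounStemmer` is hypothetical, the suffix rules are loosely modeled on the noun detachment rules documented for WordNet's morphy routine, and a small hand-built word set stands in for the dictionary. Each rule strips a suffix, optionally adds a replacement, and keeps the result only if it is a known word.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical rule-based base-form lookup for nouns. */
final class NounStemmer {
    // Each pair is {suffix to detach, replacement}; longer suffixes are tried first.
    private static final String[][] RULES = {
        {"ches", "ch"}, {"shes", "sh"}, {"ses", "s"}, {"xes", "x"},
        {"zes", "z"}, {"ies", "y"}, {"men", "man"}, {"s", ""}
    };

    /** Returns the word itself (if known) plus every known word produced by a rule. */
    static Set<String> lookupAllBaseForms(String word, Set<String> knownWords) {
        Set<String> baseForms = new LinkedHashSet<>();
        if (knownWords.contains(word)) {
            baseForms.add(word);
        }
        for (String[] rule : RULES) {
            String suffix = rule[0];
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                String stem = word.substring(0, word.length() - suffix.length());
                String candidate = stem + rule[1];
                // Only keep candidates that are real entries in the word list.
                if (knownWords.contains(candidate)) {
                    baseForms.add(candidate);
                }
            }
        }
        return baseForms;
    }

    public static void main(String[] args) {
        // Hand-built sample vocabulary standing in for a dictionary.
        Set<String> known = new HashSet<>(
            Arrays.asList("pant", "pants", "dress", "box", "pony"));

        System.out.println(lookupAllBaseForms("pants", known));   // prints [pants, pant]
        System.out.println(lookupAllBaseForms("dresses", known)); // prints [dress]
        System.out.println(lookupAllBaseForms("ponies", known));  // prints [pony]
    }
}
```

Checking each candidate against the vocabulary is what keeps the rules from producing non-words such as “dresse.” Irregular plurals need an exception list, which this sketch omits.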

Throughout this article you have been exposed to several methods and tools for performing lexical functions on free-form user input. I hope that this has demonstrated how easy it is to take advantage of the conceptualization of the English language provided by WordNet and has given you some ideas for implementing these tools in your own applications. A lexical understanding of input is, of course, only the tip of the iceberg in terms of the types of analysis that can be performed within the broader category of natural language processing. But as you have seen, just a few simple techniques can dramatically improve the ability of your applications to interpret user input.
