A few months ago I wrote the first piece in a series of articles looking at the Semantic Web in the business world. Since then my company has launched a product and a service that make very heavy use of Semantic Web technology. In this article, I will share the lessons learned so far in both of these Semantic ventures.
E-Commerce Powered by Semantic Technology
Our consumer product Milderwilder.com was a new user experience in e-commerce. We initially focused on semantic matching and search, but that was too hasty. To do online commerce, in most cases one needs an inventory. While this is an obvious statement, it is important to focus on inventory because classification and categorization of data is a big piece of the Semantic puzzle, in many cases bigger than the search itself.
Taxonomies and ontologies are great tools for classification within and across various domains of knowledge. An item in a taxonomy has a context given by the arrangement of the set of terms, and is thereby essentially classified. In my original article I emphasized building the taxonomy and the ontology with the business case in mind. Practice has confirmed this entirely, because the context created by the arrangement of the taxonomic terms must be in line with the business views and objectives. Let's quickly go over how to make a taxonomy classify an inventory.
Ontology-building Crash Course
Even if you have not started on a taxonomy, the following are the steps to make one. If you are new to this, count on making mistakes along the way. But don’t worry, you can continuously curate what you did in version one, and make improvements moving forward.
First, consider your domain vocabulary. That is the terminology used in the scope of your business that you are trying to capture. If you are doing general e-commerce, some terms in the domain vocabulary might be: car, vehicle, motorcycle, scooter, computer electronics, computer, printer, inkjet printer, brand, laptop, desktop, etc.
Once you have a decent sample of terms which you know will be important to your business moving forward, just reshuffle them into a hierarchy. Here is an example with the tiny vocabulary above:
(Indentation indicates a sub-type relationship)
Vehicle (type of Thing)
    Car (type of Vehicle)
    Motorcycle (type of Vehicle)
        Scooter (type of Motorcycle)
ConsumerElectronics (type of Thing)
    Computer (type of ConsumerElectronics)
        Laptop (type of Computer)
        Desktop (type of Computer)
    Printer (type of ConsumerElectronics)
        InkJet (type of Printer)
        OtherPrinters (type of Printer)
And we just went from a corporate vocabulary to a (very simple) taxonomy. The taxonomy provides quite a bit of value. By categorizing items, it gives each of them a context. That context enables further layers of intelligence on top of the structure, because given a context, both people and software can infer further knowledge. Those extra bits of intelligence, which are essentially relationships between known things that yield new things, turn the taxonomy into an ontology.
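To make the idea of inferring knowledge from context concrete, here is a minimal sketch in Python. It assumes the example taxonomy is stored as a simple child-to-parent mapping; the names PARENT, ancestors, and is_a are hypothetical and chosen only for illustration, not taken from any particular product or library.

```python
# Hypothetical encoding of the example taxonomy: each term maps to its parent.
PARENT = {
    "Vehicle": "Thing",
    "Car": "Vehicle",
    "Motorcycle": "Vehicle",
    "Scooter": "Motorcycle",
    "ConsumerElectronics": "Thing",
    "Computer": "ConsumerElectronics",
    "Laptop": "Computer",
    "Desktop": "Computer",
    "Printer": "ConsumerElectronics",
    "InkJet": "Printer",
    "OtherPrinters": "Printer",
}


def ancestors(term):
    """Return the broader terms for `term`, nearest first, ending at the root."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, category):
    """True if `term` is `category` itself or one of its sub-types."""
    return term == category or category in ancestors(term)


print(ancestors("Scooter"))                   # ['Motorcycle', 'Vehicle', 'Thing']
print(is_a("InkJet", "ConsumerElectronics"))  # True
print(is_a("Car", "ConsumerElectronics"))     # False
```

Nothing in the data says outright that a scooter is a vehicle. That fact falls out of the arrangement of the terms, which is the kind of context the taxonomy provides.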
There you go -- a five-minute crash course in ontology-building!
How to Decipher the Knowledge? NLP
Going back to our e-commerce example, now we have a way to classify inventory. Sometimes items are accurately and properly annotated, but often they are not. Many times there are items with a description that reads like the following: “…these necklace shoes are perfect for a dance club or a formal event…”
It is extremely difficult for software to tell from that sentence whether the item in question is a necklace, a shoe, dance club heels or a formal jewelry item. We did encounter exactly this string, and no out-of-the-box NLP solution was even close to adequate in helping us tell what that item was really about.
We had to combine our own custom NLP techniques with what we called “training the data.” Training the data is a seemingly clunky process in which the taxonomy itself specifies that “necklace shoes” is a necklace in the shape of a shoe. This data-training process is an intermediate stop until the quality of the NLP improves. But it might also suffice on its own and be a cheaper alternative to developing complex NLP.
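As an illustration only, the data-training idea can be sketched as a lookup of known phrases that runs before any keyword matching. This is not our actual implementation: the category names, the TRAINED_PHRASES and KEYWORDS tables, and the classify function are all assumptions made up for this example.

```python
import re

# Hypothetical "trained" phrases: multi-word expressions pinned to one category.
TRAINED_PHRASES = {
    "necklace shoes": "Necklace",
}

# Hypothetical single-word keywords used as a naive fallback.
KEYWORDS = {
    "necklace": "Necklace",
    "shoe": "Shoe",
    "shoes": "Shoe",
    "heels": "Shoe",
}


def classify(description, trained=TRAINED_PHRASES):
    """Return candidate categories for an item description."""
    text = description.lower()
    # Trained phrases win; longer phrases are checked first.
    for phrase in sorted(trained, key=len, reverse=True):
        if phrase in text:
            return [trained[phrase]]
    # Fallback: collect every keyword hit, which may be ambiguous.
    found = []
    for word in re.findall(r"[a-z]+", text):
        category = KEYWORDS.get(word)
        if category and category not in found:
            found.append(category)
    return found


description = "these necklace shoes are perfect for a dance club or a formal event"
print(classify(description, trained={}))  # ['Necklace', 'Shoe'] (ambiguous)
print(classify(description))              # ['Necklace']
```

The trade-off is visible even at this size: every awkward phrase needs its own entry, but each entry is cheap to add and its effect is easy to predict.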
But even with all these technological bells and whistles, do users really want “Semantic” applications?
The Semantic user experience has a big confusion factor. Even in human comprehension, when shopping for women’s shoes, a statement like “I want ballet flats” means just that – it implies exactness of desire. Even in person-to-person interaction, there are no semantics implied in that statement. To satisfy it, all it takes is an ability to follow directions. On the other hand, a statement like “I want something like ballet flats” implies inexactness and opportunity to apply human semantics. So semantic reasoning is not always needed. In fact, poorly timed use of semantics may be quite detrimental to the user experience.
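One way to picture the distinction is a search front end that widens a query only when the wording invites it. The sketch below is a toy under stated assumptions: the small shoe taxonomy, the list of cue phrases, and the interpret function are all hypothetical, and real queries would need far more careful handling.

```python
import re

# Hypothetical slice of a shoe taxonomy: term -> parent.
PARENT = {
    "ballet flats": "flats",
    "loafers": "flats",
    "moccasins": "flats",
    "flats": "shoes",
    "pumps": "heels",
    "heels": "shoes",
}

# Hypothetical cue phrases that signal an inexact request.
HEDGE = re.compile(r"\b(something like|similar to|kind of like)\b")


def interpret(query):
    """Return the taxonomy terms to search for, widening only on inexact queries."""
    text = query.lower()
    # Prefer the longest matching term so "ballet flats" beats "flats".
    term = next(
        (t for t in sorted(PARENT, key=len, reverse=True) if t in text), None
    )
    if term is None:
        return []
    if not HEDGE.search(text):
        return [term]  # exact request: just follow directions
    # Inexact request: add terms that share the same parent.
    parent = PARENT[term]
    siblings = sorted(t for t, p in PARENT.items() if p == parent and t != term)
    return [term] + siblings


print(interpret("I want ballet flats"))
# ['ballet flats']
print(interpret("I want something like ballet flats"))
# ['ballet flats', 'loafers', 'moccasins']
```

The exact query gets back only what was asked for, and the sibling categories appear only when the user has signaled that approximate matches are welcome.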
Most users have become accustomed to fifteen years of rigid, exact Internet use. Before the search results are shown, and even better before the search query is entered, it is important to make certain the user understands what to expect, or you will get strange customer feedback.
At the end of the day, Semantic technology is just a tool that enables extra intelligence. So the trick is to make sure the intelligence is actually smart and offers value.
Not just E-Commerce
After opening the alpha release of our e-commerce site, to our surprise we were approached by a few companies in restaurant promotion, recruitment, and a few other spaces, asking for consulting help in building similar solutions for applications in their industries.
There are a few more lessons learned in this early part of our business that can be shared. One particularly tough challenge is that while our data may be in RDF format, other companies’ data is not, and there is a lot of such data. That data will likely not change any time soon, so the scalability of constantly converting unstructured data is a much bigger factor than might originally be expected and should be planned for.
Another big question faced by “Semantic” companies is whether to participate in the Linked Data movement. There is a rift in industry opinion about this. On one hand, data wants to be free and open. On the other hand, traditional business sense suggests that companies should protect their intellectual property. Thus far, there are few cases of successful companies that contribute to the Linked Data movement, and many more successful companies that just make use of it.
Thanks for reading and I’d love to hear thoughts on this and ideas for future articles on Twitter @genadinik.