The most widespread (and likely most reported on) Semantic Web technology is a W3C recommendation called RDF (Resource Description Framework). A language for representing data in knowledge bases, most commonly serialized as XML, RDF is used in nearly all existing online knowledge bases. But while the spotlight is on RDF, other technologies such as NLP, SPARQL, ontologies, and inference all work in concert to enable the Semantic Web stack.
To explain how the Semantic Web can be useful in next-generation web development, this article provides a survey of these core Semantic Web technologies and concepts. Along with outlining the technologies, it examines their advantages and disadvantages when compared with traditional web development tools.
Entity Resolution and Access to Limitless Knowledge
Natural Language Processing (NLP) is a technology that translates human-readable language to machine-readable language and vice versa. It is not a pure Semantic Web technology because it can exist outside the Semantic Web. In fact, it can easily be applied to any traditional application. For example, NLP can determine sentiment in a block of text or important topics in a news article. So whether you are working on a product review application that determines the role of positive or negative reviews or an application that works with news articles, NLP by itself can offer a number of benefits.
In the Semantic Web, however, NLP’s value is enhanced. When dealing with text, a Semantic Web application can leverage NLP to perform something called entity resolution, which is the process of connecting important bits in a text block to the many interconnected, freely available knowledge bases on the web. Because so many of these knowledge bases, called ontologies, are represented in the common RDF standard, an application can consume and integrate knowledge about any entity as soon as NLP resolves it. This process provides unprecedented access to practically limitless knowledge.
Yet sifting through these enormous sets of data is a real challenge. How is one really supposed to work with them? What are their intended uses? Here are a few examples of entity resolution in practice:
- Biotech: An entity resolved to denote some compound can be used to get knowledge about that compound and how it works with other compounds.
- News media: An organization can have its topics of interest resolved as entities in varying ontologies to retrieve additional facts about those entities, including their history and how they are related to other items in the news.
- Industry domain agnostic: An application simply takes text as input to better serve and personalize its responses to the user. After resolving entities in the text, the system can fetch and gather more relevant data about those entities.
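The entity resolution step described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the knowledge base is a hard-coded dictionary, the `example.org` URIs and the facts attached to them are invented, and matching is a plain word lookup rather than real NLP. The function name `resolve_entities` is hypothetical as well.

```python
import re

# Hypothetical mini knowledge base: surface form -> entity record.
# The URIs and facts are made up purely for illustration.
KNOWLEDGE_BASE = {
    "aspirin": {
        "uri": "http://example.org/compound/aspirin",
        "facts": {"type": "Compound", "interacts_with": "warfarin"},
    },
    "warfarin": {
        "uri": "http://example.org/compound/warfarin",
        "facts": {"type": "Compound"},
    },
}


def resolve_entities(text):
    """Return the knowledge-base records for every known term in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    resolved = []
    for token in tokens:
        entry = KNOWLEDGE_BASE.get(token)
        # Keep each entity once, in order of first appearance.
        if entry is not None and entry not in resolved:
            resolved.append(entry)
    return resolved


entities = resolve_entities("Can I take Aspirin together with warfarin?")
for entity in entities:
    print(entity["uri"], entity["facts"])
```

In a real application, the dictionary lookup would be replaced by an NLP service and the records by RDF data fetched from one or more ontologies; the shape of the flow (text in, resolved entities and their facts out) stays the same.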
Ontologies Great and Small: Size Matters
Ontologies are not all created equal. There are a few different types of these knowledge bases, each with its strengths and shortcomings. Ontologies vary from being domain-specific and focused only on the details of a single topic (such as pizza or wine) to being large and all-encompassing. The current business trend is to work with smaller, often custom ontologies that satisfy a business’s particular queries. However, this approach risks overlooking valuable assets in greater ontologies that may be successfully reused.
This section demonstrates the difference in potential intelligence between an application that uses small ontologies and one that uses large ontologies. On the large end, consider the biggest ontology of them all: the Cyc project, a very ambitious attempt to collect all the world’s knowledge in one knowledge base. Cycorp, a company that develops, commercializes, and applies the Cyc technology, claims to have about a quarter of the world’s entire knowledge already organized. For the small ontology, consider one that has to do with pizza and comes packaged with Protégé, a free ontology visualization and editing tool.
In corporate software development, speed and quality are key. Small ontologies that deal only with a domain of interest have a shallower learning curve, require less research, and allow you to get the right data quickly and easily. The ontology also is more manageable because you can browse it easily. With a small, domain-specific ontology, its correct/intended use can be more intuitive as well, and scalability is not a major concern.
On the other hand, the quality of a software product depends on more than just how manageable it is. A forward-thinking development team will build an application based on an architecture that allows that application to grow easily. While using smaller ontologies doesn’t necessarily prevent having forward-looking architecture, one pitfall scenario can go as follows.
Imagine that your application helps customers order pizzas. You can use a small ontology to allow the customer to change toppings quickly, play around with different variations, and finally create the best virtual pizza ever. But does the restaurant have enough ingredients, or do they need more delivered? Will the customer need to pick up the pizza from the restaurant or can it be delivered? Does the oven work? These are issues outside the realm of strictly pizza. They have to do with transportation of ingredients, issues with the restaurant paying its electricity bill, and other order-fulfillment concerns. For these concerns, the application would be better served using a larger ontology like Cyc, which is available in RDF-based OpenCyc.
Reuse is a best practice of development, and Semantic Web development is no different. Having organized 25% of the world’s knowledge, Cyc can be very helpful in that approach, but only to a point. An ontology should be built with use cases in mind, the ultimate goal being to make retrieving certain data easier.
SPARQL and Inference: Flexibility and Machine Smarts
So lots of data live in RDF ontologies. But why is it any better there than in traditional and more mature relational databases, which developers can work with more easily? You already know that ontologies make it easy to share data, but they offer two more benefits that you may not be aware of:
- More flexible querying with SPARQL
- Inference, an automated reasoning process borrowed from artificial intelligence
SPARQL is a W3C-recommended querying language for RDF-based models. It works on matching patterns in the RDF graphs and therefore turns what was a potentially mind-boggling series of awkward joins in a relational database into a single, (mostly) easy-to-write query.
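To make the idea of pattern matching over a graph concrete, here is a minimal sketch in plain Python. It is not SPARQL and not how any particular SPARQL engine works; it only mimics the core idea of binding variables (terms starting with `?`) against a set of triples. The triples, the names such as `hasTopping` and `isA`, and the `match` helper are all assumptions made for this illustration.

```python
# Toy RDF-like graph: a set of (subject, predicate, object) triples.
# All names are hypothetical and exist only for this illustration.
TRIPLES = {
    ("margherita", "hasTopping", "mozzarella"),
    ("margherita", "hasTopping", "tomato"),
    ("mozzarella", "isA", "Cheese"),
    ("tomato", "isA", "Vegetable"),
    ("americana", "hasTopping", "pepperoni"),
    ("pepperoni", "isA", "Meat"),
}


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns at once."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Roughly corresponds to the SPARQL pattern (prefix declaration omitted):
#   SELECT ?pizza WHERE { ?pizza :hasTopping ?t . ?t :isA :Cheese . }
query = [("?pizza", "hasTopping", "?t"), ("?t", "isA", "Cheese")]
pizzas = sorted({b["?pizza"] for b in match(query, TRIPLES)})
print(pizzas)
```

The shared variable `?t` does the work that a join condition would do in a relational query: both patterns must agree on its value, so only pizzas whose topping is a cheese are returned.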
Inference is a process of automated reasoning. Suppose you have the following facts:
- Fact 1: If A is true, then B is true.
- Fact 2: If B is true, then C is true.
Then the machine figures out (or infers) that if A is true, then C is true.
Without inference, you just have two facts (Fact 1 and Fact 2), but with inference your application can process the logical consequence of those two facts and provide one or more new pieces of data. Many such logical constructs exist today, and when the data is structured in a logical way that lends itself to inference, they can produce new data.
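The A, B, C example above can be sketched as a simple forward-chaining loop. This is a minimal illustration, assuming rules are plain "if premise then conclusion" pairs; real inference engines for RDF and OWL support far richer rule forms. The function name `infer` is hypothetical.

```python
def infer(facts, rules):
    """Apply 'if premise then conclusion' rules until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Fact 1: if A then B. Fact 2: if B then C.
rules = [("A", "B"), ("B", "C")]

# Asserting A lets the machine derive B and, through B, C.
derived = infer({"A"}, rules)
print(sorted(derived))  # prints ['A', 'B', 'C']
```

Starting from A alone, the loop ends up knowing C as well, which is the "if A is true, then C is true" consequence that neither rule states on its own.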
The benefits of SPARQL and inference are especially valuable for business intelligence, because having more information about customers, products, and the operating space can help an organization deliver a better product. The trick is to architect the data structures and the ontologies not only for easy retrieval, but also for logical inference.
As promising as SPARQL and inference may be, they are both notoriously slow, especially when working with large data sets. So the application designer must have the use case and the application user in mind when choosing to use these tools.
Semantic Web, a Fine Work in Progress
Technologies related to the Semantic Web are very powerful, but they are still in the early stages of adoption by the business community. Each technology has its shortcomings, and they all carry the overhead of being new and requiring an often steep learning curve. Because the Semantic Web stack is a relatively advanced computer science topic, few developers today have the skills to work with the technologies, and so wide adoption remains a long-term challenge.
All this makes Semantic Web applications more expensive to build and maintain. So when planning to build a Semantic Web application, or even an application that simply uses some of its technologies rather than participating in the greater Semantic Web, the application designer must consider a much more sophisticated and forward-thinking set of use cases than he or she would when designing a traditional application.