
Anyone who has started to look into the semantic web has inevitably seen the Semantic Web Layer Cake diagram somewhere.
This ubiquitous image highlights an overall (and evolving) vision for applying a compatible and related series of
global standards to the problem of machine-processability on the web.
Figure 1 represents the basic form this diagram usually takes.
At the lowest level, the Unicode and Uniform Resource Identifier (URI) specs introduce the ability to encode human
languages into machine-processable character sets and the ability to identify, address, disambiguate, and reference
those documents in Internet-sized global name spaces. At the next level, the XML and namespace specifications provide
the means to create structured, extensible languages and the opportunity to keep apart elements from different
naming schemes. The Resource Description Framework (RDF)
(discussed in the DevX articles,
"What Is the Resource Description Framework?" and
"Creating and Managing RDF Vocabularies") provides the ability to express facts about URI-addressable
content (for example, documents, data, services, and concepts) as a series of one or more triples. Each triple is a named relationship (the predicate) between a subject and an object (its value). The RDFS and Web Ontology Language (OWL) specifications allow the subjects, predicates, and objects to be classified in ways that make machine-processability a powerful reality. The layers on top of OWL are still being defined and will not be considered further here.
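As a rough illustration of what "classified" can mean in practice, the sketch below stores a few triples as plain Python tuples and derives the types a resource has once a subclass hierarchy is taken into account. The `ex:` names are invented for this example, and the two helper functions are hypothetical. This is not a real RDFS reasoner, only a minimal imitation of the subclass rule.

```python
# Hypothetical vocabulary: every "ex:" name below is invented for illustration.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Each triple is (subject, predicate, object).
triples = {
    ("ex:johnsmith", RDF_TYPE, "ex:Employee"),
    ("ex:Employee", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            # The "not in found" check also guards against cyclic hierarchies.
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def inferred_types(resource, triples):
    """Return the asserted types plus those implied by the subclass hierarchy."""
    asserted = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred


print(sorted(inferred_types("ex:johnsmith", triples)))
# prints ['ex:Agent', 'ex:Employee', 'ex:Person']
```

Only one type was stated for the resource. The other two follow from the class hierarchy, which is the kind of conclusion a machine can draw without a human spelling it out.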
Figure 1. Semantic Web Layer Cake: This figure shows the basic form the semantic web layer cake usually takes.
This layered set of specifications allows increasingly rich processing features. For example, RDF provides the ability
to express the relationship:
<http://purl.org/people/johnsmith> <foaf:dateOfBirth> '1970-04-12'^^xsd:date .
Without any other context, however, the only questions a SPARQL query engine could answer are "Show me John Smith's birthday," "Show me anyone whose birthday I know," or "Show me anyone who was born in or after 1970." These are all useful pieces of information, but they represent the kinds of relational queries database systems have been answering for years. The benefit of RDF is the extensibility of the graph model, both in its shape and in the relationships it captures. The Open World Assumption allows the knowledge base to be extended with any new piece of data that is discovered:
<http://purl.org/people/johnsmith> <favorites:color> <http://purl.org/color#blue> .
The queries from above can be modified to include this new piece of information: "Show me the favorite color of anyone born between 1960 and 1980." The example uses a completely arbitrary color vocabulary, but it works. Smart humans can make the leap to incorporate this new relationship into increasingly sophisticated queries. Operating at this level of the Layer Cake provides a powerful and flexible data integration strategy powered by human understanding of domains of interest. Software systems are still a little under-represented in this approach, which combines applied human intelligence with graph pattern matching.
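The questions discussed above can be sketched without any RDF tooling by treating the knowledge base as a list of tuples and matching patterns against it. This is only an approximation of what a SPARQL engine does. The `match` and `favorite_colors_born_between` helpers are hypothetical, and the predicate strings are compact stand-ins for full URIs.

```python
from datetime import date

# Compact stand-ins for the identifiers used in the article's two triples.
JOHN = "http://purl.org/people/johnsmith"
BIRTH = "foaf:dateOfBirth"
FAV_COLOR = "favorites:color"
BLUE = "http://purl.org/color#blue"

# The knowledge base starts with the single birthday fact.
triples = [(JOHN, BIRTH, date(1970, 4, 12))]


def match(triples, s=None, p=None, o=None):
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for ts, tp, to in triples:
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o):
            yield (ts, tp, to)


# "Show me John Smith's birthday."
johns_birthday = [o for _, _, o in match(triples, s=JOHN, p=BIRTH)]

# "Show me anyone whose birthday I know."
known_birthdays = [s for s, _, _ in match(triples, p=BIRTH)]

# "Show me anyone who was born in or after 1970."
born_since_1970 = [s for s, _, o in match(triples, p=BIRTH) if o.year >= 1970]

# A newly discovered fact is simply appended; no schema change is needed.
triples.append((JOHN, FAV_COLOR, BLUE))


def favorite_colors_born_between(triples, first_year, last_year):
    """Join birthday and favorite-color triples on the shared subject."""
    results = []
    for person, _, born in match(triples, p=BIRTH):
        if first_year <= born.year <= last_year:
            for _, _, color in match(triples, s=person, p=FAV_COLOR):
                results.append((person, color))
    return results


print(favorite_colors_born_between(triples, 1960, 1980))
```

The join in the last function exists only because a person decided that the two predicates belong in the same question. The data alone says nothing about how birthdays and colors relate, which is the limitation the paragraph above points to.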