Anyone who has started to look into the semantic web has inevitably seen the Semantic Web Layer Cake diagram
somewhere. This ubiquitous image highlights an overall (and evolving) vision for applying a compatible and related series of
global standards to the problem of machine-processability on the web.
Figure 1 represents the basic form this diagram usually takes.
At the lowest level, the Unicode and Uniform Resource Identifier (URI) specs introduce the ability to encode human
languages into machine-processable character sets and the ability to identify, address, disambiguate, and reference
those documents in Internet-sized global name spaces. At the next level, the XML and namespace specifications provide
the means to create structured, extensible languages and the opportunity to keep separate elements from alternate
naming schemes. The Resource Description Framework (RDF) (discussed in the DevX articles, "What Is the Resource Description Framework?" and "Creating and Managing RDF Vocabularies") provides the ability to express facts about URI-addressable content (for example, documents, data, services, and concepts) as a series of one or more triples. Each triple is a named relationship (the predicate) between a subject and a value (the object). The RDFS and Web Ontology Language (OWL) specifications allow the subjects, predicates, and objects to be classified in ways that make machine-processability a powerful reality. The layers on top of OWL are still being defined and will not be considered further at the moment.
Figure 1. Semantic Web Layer Cake: This figure shows the basic form the semantic web layer cake usually takes.
This layered set of specifications allows increasingly rich processing features. For example, RDF provides the ability
to express the relationship:
<http://purl.org/people/johnsmith> foaf:dateOfBirth '1970-04-12'^^xsd:date .
Without any other context, however, the only questions a SPARQL query engine could answer are, "Show me John Smith's birthday." or "Show me anyone whose birthday I know." or "Show me anyone who was born on or after 1970." These are all useful pieces of information, but represent the kinds of relational queries database systems have been doing for years. The benefit of RDF is the extensibility of the graph model in terms of shape and the relationships captured. The Open World Assumption allows the knowledge base to be extended with any new piece of data that is discovered:
<http://purl.org/people/johnsmith> <favorites:color> <http://purl.org/color#blue> .
The queries from above can be modified to include this new piece of information: "Show me the favorite color of anyone born between the years of 1960 and 1980." The example uses a completely arbitrary color vocabulary, but it works. Smart humans can make the leap to include this new relationship into increasingly sophisticated queries. Operating at this level of the Layer Cake provides a powerful and flexible data integration strategy powered by human understanding of domains of interest. Software systems are still a little under-represented in this applied human intelligence/graph pattern matching approach, however.
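The graph pattern matching described above can be sketched without any RDF tooling. The following is a minimal Python illustration, not a SPARQL engine: triples are plain tuples, the match helper is hypothetical, and the abbreviated names (people:johnsmith, people:janedoe, color:blue) are stand-ins invented for this example.

```python
from datetime import date

# Triples as (subject, predicate, object) tuples; names are abbreviated stand-ins.
triples = [
    ("people:johnsmith", "foaf:dateOfBirth", date(1970, 4, 12)),
    ("people:janedoe", "foaf:dateOfBirth", date(1985, 9, 30)),  # hypothetical second person
]

def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# "Show me anyone who was born on or after 1970."
born_1970_on = [s for s, _, d in match(triples, p="foaf:dateOfBirth") if d.year >= 1970]

# A newly discovered fact is simply appended; no schema has to change first.
triples.append(("people:johnsmith", "favorites:color", "color:blue"))

# "Show me the favorite color of anyone born between the years of 1960 and 1980."
favorite_colors = [
    (s, color)
    for s, _, d in match(triples, p="foaf:dateOfBirth")
    if 1960 <= d.year <= 1980
    for _, _, color in match(triples, s=s, p="favorites:color")
]

print(born_1970_on)
print(favorite_colors)
```

Note that the person writing the second query still has to know that favorites:color exists and what it means, which is the human-driven part of this approach.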
To achieve the full potential for machine-processability, there needs to be additional structure expressible
about the data outside of the context in which it was created. A human could interrogate a knowledge base with the
above types of relationships and make the determination that anyone born before today is alive at the moment.
Software would need such a concept laid out for it. The Web Ontology Language (OWL) provides the ability to define
classes of things based on types of relationships, values, etc. An OWL class could be defined that says "Something is
a member of the AliveThing class if it has a foaf:dateOfBirth predicate associated with it and a value earlier than today." Now a reasoning engine could interrogate the graph and answer the question of who is known to be alive by satisfying the query "Who is an instance of the AliveThing class?"
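To make the idea concrete, here is a rough Python sketch of what such a classification amounts to. It is not OWL and does not use a reasoner; it hand-codes the single rule from the paragraph above. The infer_alive function, the ex:AliveThing name, and the sample data are assumptions made for illustration.

```python
from datetime import date

triples = [
    ("people:johnsmith", "foaf:dateOfBirth", date(1970, 4, 12)),
    ("prod:computer", "dc:title", "Not a person"),  # has no date of birth
]

def infer_alive(graph, today):
    """Return the graph plus an rdf:type ex:AliveThing triple
    for every subject whose date of birth is earlier than `today`."""
    inferred = [
        (s, "rdf:type", "ex:AliveThing")
        for s, p, o in graph
        if p == "foaf:dateOfBirth" and o < today
    ]
    return graph + inferred

# A fixed "today" keeps the example repeatable.
reasoned = infer_alive(triples, today=date(2024, 1, 1))

# "Who is an instance of the AliveThing class?"
alive = [s for s, p, o in reasoned if p == "rdf:type" and o == "ex:AliveThing"]
print(alive)
```

The point of the sketch is that the query no longer mentions birth dates at all; the rule lives alongside the data instead of inside every query.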
The problem with OWL is that it is a more complicated technology than many people can handle, at least initially. New tools and books are emerging to help lower the barrier to OWL modeling, but there remains a need to satisfy the requirements of traditional knowledge management schemes and workers. This article introduces the kinds of data that have been produced so far and shows how the Simple Knowledge Organization System (SKOS) helps support lightweight but still relatively formal concept schemes.
SKOS and Concept Schemes
Librarians and other data stewards have been using a wide variety of well-understood knowledge organizing schemes for years in the form of taxonomies, thesauri, controlled vocabularies, and subject headings. These approaches allow the organization of concepts into concept schemes where it is possible to indicate relationships between terms. A taxonomy introduces the notion of hierarchical relationships. A thesaurus indicates notions of synonymy, antonymy, and the idea of broader or narrower terms. A controlled vocabulary standardizes a concept space around an established set of preferred names for the topics of interest in a domain. By defining these relationships, an information manager can expose user interfaces driven by the kinds of concepts a user is likely to expect. An everyday example of this is the product category navigation menu of just about any e-commerce site. Having a well-understood taxonomy of terms can also facilitate improved search: simple keywords are no longer the only things being matched in a document space, because related words can be considered as well. This helps improve the user experience and theoretically yields better results.
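As a small illustration of how a thesaurus can improve search, the following Python sketch expands a keyword with its synonyms and narrower terms before matching. The thesaurus entries, the documents, and the expand and search helpers are all invented for this example.

```python
# Hypothetical thesaurus: each preferred term maps to related words.
thesaurus = {
    "computer": {"synonyms": ["pc"], "narrower": ["laptop", "desktop"]},
}

# Hypothetical document space.
documents = {
    "doc1": "refurbished laptop with charger",
    "doc2": "garden hose and sprinkler",
    "doc3": "desktop computer bundle",
}

def expand(term):
    """Return the term together with its synonyms and narrower terms."""
    entry = thesaurus.get(term, {})
    return {term, *entry.get("synonyms", []), *entry.get("narrower", [])}

def search(term):
    """Return ids of documents containing any expanded term as a whole word."""
    wanted = expand(term)
    return sorted(
        doc_id for doc_id, text in documents.items()
        if wanted & set(text.split())
    )

# Plain keyword matching finds only documents that contain the literal word.
keyword_only = sorted(d for d, text in documents.items() if "computer" in text.split())
expanded = search("computer")

print(keyword_only)
print(expanded)
```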
These well-understood and established schemes could potentially be mapped into OWL ontologies, but the expertise to do so is in short supply and there are some fairly subtle modeling hobgoblins lurking to undercut any such effort.
SKOS was developed (and is in the process of being adopted by the W3C) to address these issues. It is intended both as a means of exposing existing concept schemes to semantic web-enabled systems and as a simplified technology for expressing important, but not overly complicated, concept relationships.
A basic example expressed in RDF Turtle notation might look something like this:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix prod: <http://purl.org/product/> .
@prefix prod2: <http://purl.org/product2/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
prod:computer rdf:type skos:Concept .
This establishes a computer as a concept of importance in a notional product namespace. It is the general notion of a
computer that is being established here. In order for real users and real systems to interact with this concept, it
needs a name. SKOS supports the idea of preferred labels (skos:prefLabel), alternate labels (skos:altLabel), and hidden labels (skos:hiddenLabel):
prod:computer skos:prefLabel "computer" .
The above code labels the concept in a user interface, documentation, etc. Of course, this is a particularly English-centric view of the world. There are many preferred labels for the concept of computer, based on the language context being used. SKOS supports the idea of language-specific tags that you can use to localize the concept labeling:
prod:computer rdf:type skos:Concept;
skos:prefLabel "computador"@es .
Not only does this provide the ability to output a name for a concept, you can imagine reversing the relationship to find keywords in a document and using the concept scheme to identify what the document might be about. Without the explicitly modeled concept terms, it would be difficult to describe to a software system which words were more important than others in a domain.
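Both directions of that relationship can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the English language tag on "computer", the alternate label "PC", and the label_for and concepts_in helpers are hypothetical and not part of SKOS itself.

```python
# (concept, language) -> preferred label.
pref_labels = {
    ("prod:computer", "en"): "computer",  # assumes the plain label is English
    ("prod:computer", "es"): "computador",
}
# (concept, language) -> alternate labels; "PC" is a hypothetical addition.
alt_labels = {("prod:computer", "en"): ["PC"]}

def label_for(concept, lang, fallback="en"):
    """Pick the preferred label for a language, falling back to English."""
    return pref_labels.get((concept, lang)) or pref_labels.get((concept, fallback))

def concepts_in(text, lang):
    """Reverse lookup: which concepts have a label that appears in the text?"""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    found = set()
    for (concept, label_lang), label in pref_labels.items():
        if label_lang == lang and label.lower() in words:
            found.add(concept)
    for (concept, label_lang), labels in alt_labels.items():
        if label_lang == lang and any(alt.lower() in words for alt in labels):
            found.add(concept)
    return sorted(found)

print(label_for("prod:computer", "es"))
print(concepts_in("My old PC finally died.", "en"))
```

Only single-word labels are handled here; multi-word labels would need phrase matching rather than a word set.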
SKOS provides the ability to express the origin of a concept in a concept scheme:
prod:inventoryTerms rdf:type skos:ConceptScheme;
dc:title "Stuff we sell";
dc:creator <http://purl.org/people/johnsmith> .
prod:computer skos:inScheme prod:inventoryTerms .
This metadata about the concept scheme is useful to find where terms come from and to track down who to ask for clarifications, modifications, extensions, etc.
It is possible for a concept to be used in more than one scheme, which is a bit of a break from traditional IT-based organizational systems. Those systems usually represent closed worlds, but SKOS allows concepts to be reused in different schemes:
prod:computer skos:inScheme prod:inventoryTerms, prod2:someOtherTerms .
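The scheme metadata and multi-scheme membership above lend themselves to a simple provenance lookup. The Python sketch below mirrors the Turtle statements as tuples; the objects and provenance helpers are hypothetical, and prod2:someOtherTerms is deliberately left without a title or creator because the article does not describe it.

```python
triples = [
    ("prod:inventoryTerms", "rdf:type", "skos:ConceptScheme"),
    ("prod:inventoryTerms", "dc:title", "Stuff we sell"),
    ("prod:inventoryTerms", "dc:creator", "http://purl.org/people/johnsmith"),
    ("prod:computer", "skos:inScheme", "prod:inventoryTerms"),
    ("prod:computer", "skos:inScheme", "prod2:someOtherTerms"),
]

def objects(graph, subject, predicate):
    """Return every object stated for a subject/predicate pair."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def provenance(graph, concept):
    """For each scheme a concept is in, return (scheme, title, creator).
    Missing metadata is reported as None rather than guessed."""
    result = []
    for scheme in objects(graph, concept, "skos:inScheme"):
        titles = objects(graph, scheme, "dc:title")
        creators = objects(graph, scheme, "dc:creator")
        result.append((
            scheme,
            titles[0] if titles else None,
            creators[0] if creators else None,
        ))
    return result

for row in provenance(triples, "prod:computer"):
    print(row)
```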
After an individual concept is defined and labeled, it can be useful to put it into a collection of terms that mean something specific. Whether the collection represents a separate concept or not is up to the domain modeler; it can make sense in either case.
prod:electronics rdf:type skos:Collection;
skos:prefLabel "stuff that needs power";
skos:member prod:digitalcamera .
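Collection membership can be queried in both directions, for example to build a facet in a user interface. In this Python sketch the members and collections_of helpers are hypothetical, and prod:computer is added to the collection purely so the example has more than one member.

```python
triples = [
    ("prod:electronics", "rdf:type", "skos:Collection"),
    ("prod:electronics", "skos:prefLabel", "stuff that needs power"),
    ("prod:electronics", "skos:member", "prod:digitalcamera"),
    ("prod:electronics", "skos:member", "prod:computer"),  # hypothetical extra member
]

def members(graph, collection):
    """Return the concepts grouped under a collection."""
    return sorted(o for s, p, o in graph if s == collection and p == "skos:member")

def collections_of(graph, concept):
    """Return the collections that list a concept as a member."""
    return sorted(s for s, p, o in graph if p == "skos:member" and o == concept)

print(members(triples, "prod:electronics"))
print(collections_of(triples, "prod:digitalcamera"))
```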
Finally, you can define hierarchies of concepts by using the skos:narrower relationship between concepts:
prod:computer skos:narrower prod:applecomputer .
By now, it is clear how SKOS can be used to describe reasonably sophisticated systems of concepts and their relationships in a lightweight and manageable way. These concept schemes then become useful to drive a user interface.
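As one possible way to drive a user interface from such a scheme, the Python sketch below walks skos:narrower links to produce an indented navigation menu. The menu function, the prod:laptop and prod:netbook concepts, and all of the labels are assumptions added for illustration.

```python
triples = [
    ("prod:computer", "skos:narrower", "prod:applecomputer"),
    ("prod:computer", "skos:narrower", "prod:laptop"),  # hypothetical
    ("prod:laptop", "skos:narrower", "prod:netbook"),   # hypothetical
]
# Hypothetical preferred labels for display.
labels = {
    "prod:computer": "computer",
    "prod:applecomputer": "Apple computer",
    "prod:laptop": "laptop",
    "prod:netbook": "netbook",
}

def menu(root, depth=0, seen=None):
    """Return indented menu lines for a concept and everything narrower.
    The `seen` set guards against accidental cycles in the data."""
    seen = set() if seen is None else seen
    if root in seen:
        return []
    seen.add(root)
    lines = ["  " * depth + labels.get(root, root)]
    children = sorted(o for s, p, o in triples if s == root and p == "skos:narrower")
    for child in children:
        lines.extend(menu(child, depth + 1, seen))
    return lines

print("\n".join(menu("prod:computer")))
```

Children are sorted by identifier here only to make the output stable; a real menu would more likely sort by label or by an explicit ordering.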