
Applying SKOS Concept Schemes


Anyone who has started to look into the semantic web has inevitably seen the Semantic Web Layer Cake diagram somewhere. This ubiquitous image highlights an overall (and evolving) vision for applying a compatible and related series of global standards to the problem of machine-processability on the web. Figure 1 represents the basic form this diagram usually takes. At the lowest level, the Unicode and Uniform Resource Identifier (URI) specs introduce the ability to encode human languages into machine-processable character sets and the ability to identify, address, disambiguate, and reference resources in Internet-sized global name spaces. At the next level, the XML and namespace specifications provide the means to create structured, extensible languages and the opportunity to keep elements from different naming schemes separate. The Resource Description Framework (RDF) (discussed in the DevX articles “What Is the Resource Description Framework?” and “Creating and Managing RDF Vocabularies”) provides the ability to express facts about URI-addressable content (for example, documents, data, services, and concepts) as a series of one or more triples. Each triple is a named relationship (the predicate) between a subject and a value (the object). The RDF Schema (RDFS) and Web Ontology Language (OWL) specifications allow subjects, predicates, and objects to be classified in ways that make machine-processability a powerful reality. The layers on top of OWL are still being defined and will not be considered further here.

 
Figure 1. Semantic Web Layer Cake: This figure shows the basic form the semantic web layer cake usually takes.

This layered set of specifications allows increasingly rich processing features. For example, RDF provides the ability to express the relationship that John Smith was born on April 12, 1970:

  '1970-04-12'^^xsd:date . 

Without any other context, however, the only questions a SPARQL query engine could answer are “Show me John Smith’s birthday,” “Show me anyone whose birthday I know,” or “Show me anyone who was born on or after 1970.” These are all useful pieces of information, but they represent the kinds of relational queries database systems have been doing for years. The benefit of RDF is the extensibility of the graph model, both in its shape and in the relationships it captures. The Open World Assumption allows the knowledge base to be extended with any new piece of data that is discovered, such as John Smith’s favorite color:

   .

The queries from above can be modified to include this new piece of information: “Show me the favorite color of anyone born between 1960 and 1980.” The example uses a completely arbitrary color vocabulary, but it works. Smart humans can fold this new relationship into increasingly sophisticated queries. Operating at this level of the Layer Cake provides a powerful and flexible data integration strategy powered by human understanding of domains of interest. In this approach, however, the understanding comes from the humans who write the graph pattern queries; the software itself contributes very little.
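To make the pattern-matching idea concrete, here is a minimal Python sketch that stores triples as tuples and answers the favorite-color question by joining two triple patterns on a shared subject, roughly what a SPARQL engine does for you. The identifiers (ex:JohnSmith, ex:JaneDoe, ex:birthday, ex:favoriteColor, color:Blue, color:Green) are hypothetical stand-ins chosen for illustration; a real system would use an RDF store and SPARQL rather than Python lists.

```python
from datetime import date

# Hypothetical triples (subject, predicate, object). All identifiers and
# the color values are illustrative, not taken from a real vocabulary.
triples = [
    ("ex:JohnSmith", "ex:birthday", date(1970, 4, 12)),
    ("ex:JohnSmith", "ex:favoriteColor", "color:Blue"),
    ("ex:JaneDoe", "ex:birthday", date(1985, 1, 30)),
    ("ex:JaneDoe", "ex:favoriteColor", "color:Green"),
]

def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def favorite_colors_born_between(start_year, end_year):
    """Join two patterns sharing a subject, like the SPARQL graph pattern
    ?who ex:birthday ?b . ?who ex:favoriteColor ?c ."""
    results = []
    for who, pred, born in triples:
        if pred == "ex:birthday" and start_year <= born.year <= end_year:
            for color in objects(who, "ex:favoriteColor"):
                results.append((who, color))
    return results

print(favorite_colors_born_between(1960, 1980))  # [('ex:JohnSmith', 'color:Blue')]
```

Adding the color triples required no schema change: the new predicate simply becomes available to the next query, which is the extensibility the Open World Assumption allows.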

To achieve the full potential of machine-processability, additional structure must be expressible about the data outside of the context in which it was created. A human could look at a knowledge base containing the above types of relationships and conclude that anyone with a birthday before today is alive at the moment. Software would need such a concept spelled out for it. The Web Ontology Language (OWL) provides the ability to define classes of things based on types of relationships, values, and so on. An OWL class could be defined that says “Something is a member of the AliveThing class if it has a birthday predicate associated with it and a value of less than today.” A reasoning engine could then interrogate the graph and answer the question of who is known to be alive by satisfying the query “Who is an instance of the AliveThing class?”
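As a rough illustration of the AliveThing rule, the sketch below applies the same test in plain Python. This is not OWL and involves no reasoner; it only shows the classification logic that such a class definition describes. The data and the ex: identifiers are hypothetical, and today is passed in as a parameter so the result is repeatable.

```python
from datetime import date

# Hypothetical data: subject -> value of its birthday predicate.
birthdays = {
    "ex:JohnSmith": date(1970, 4, 12),
    "ex:NotYetBorn": date(2999, 1, 1),
}

def alive_things(birthdays, today):
    """Classify as AliveThing anything whose birthday is before `today`.
    A plain filter standing in for what a reasoner would infer."""
    return sorted(s for s, born in birthdays.items() if born < today)

print(alive_things(birthdays, date(2008, 6, 1)))  # ['ex:JohnSmith']
```

The point of OWL is that this rule lives in the ontology rather than in application code, so any reasoner can apply it to any graph that uses the same predicate.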

The problem with OWL is that it is more complicated than many people can handle, at least initially. New tools and books are emerging to lower the bar to OWL modeling, but there remains a need to satisfy the requirements of traditional knowledge management schemes and workers. This article introduces the kinds of knowledge organization schemes that have been produced so far and shows how the Simple Knowledge Organization System (SKOS) supports lightweight but still relatively formal concept schemes.

SKOS and Concept Schemes
Librarians and other data stewards have been using a wide variety of well-understood knowledge organization schemes for years in the form of taxonomies, thesauri, controlled vocabularies, and subject headings. These approaches allow the organization of concepts into concept schemes in which it is possible to indicate relationships between terms. A taxonomy introduces the notion of hierarchical relationships. A thesaurus indicates notions of synonymy, antonymy, and broader or narrower terms. A controlled vocabulary standardizes a concept space around an established set of preferred names for the topics of interest in a domain. By defining these relationships, an information manager can expose user interfaces driven by the kinds of concepts a user is likely to expect. An everyday example is the product category navigation menu of just about any e-commerce site. A well-understood taxonomy of terms can also improve search: instead of matching only simple keywords in a document space, a search can consider related words as well. This improves the user experience and should yield better results.
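A minimal sketch of thesaurus-driven search, assuming a small hypothetical thesaurus held in a Python dictionary: the query term is expanded with its synonyms and, recursively, its narrower terms before documents are matched. The terms and documents are invented for illustration, and the recursion assumes the hierarchy contains no cycles.

```python
# Hypothetical thesaurus fragment: preferred term -> synonyms and narrower terms.
thesaurus = {
    "computer": {"synonyms": ["pc"], "narrower": ["laptop", "desktop"]},
    "laptop": {"synonyms": ["notebook"], "narrower": []},
}

def expand(term):
    """Return the term plus its synonyms and, recursively, its narrower terms."""
    entry = thesaurus.get(term, {"synonyms": [], "narrower": []})
    terms = {term, *entry["synonyms"]}
    for narrower in entry["narrower"]:
        terms |= expand(narrower)
    return terms

def search(query, documents):
    """Keyword search that also matches related terms from the thesaurus."""
    terms = expand(query)
    return [doc for doc in documents if terms & set(doc.lower().split())]

docs = ["New notebook models announced", "Gardening tips for spring"]
print(search("computer", docs))  # ['New notebook models announced']
```

A plain keyword search for "computer" would miss the first document entirely; the thesaurus supplies the link from computer to laptop to notebook.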

These well-understood and established schemes could potentially be mapped into OWL ontologies, but the expertise to do so is in short supply and there are some fairly subtle modeling hobgoblins lurking to undercut any such effort.

SKOS was developed (and is in the process of being adopted by the W3C) to address these issues. It is intended both as a means of exposing existing concept schemes to semantic web-enabled systems and as a simplified technology for expressing important, but not overly complicated, concept relationships.

A basic example expressed in RDF Turtle notation might look something like this:

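@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix prod:  .
@prefix prod2:  .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .

prod:computer rdf:type skos:Concept .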

This establishes a computer as a concept of importance in a notional product namespace. It is the general notion of a computer that is being established here. In order for real users and real systems to interact with this concept, it needs a name. SKOS supports the idea of preferred labels (skos:prefLabel), alternate labels (skos:altLabel), and hidden labels (skos:hiddenLabel).

prod:computer skos:prefLabel "computer" .

This statement gives the concept a label for use in a user interface, documentation, and so on. Of course, this is a particularly English-centric view of the world. The concept of a computer has many preferred labels, depending on the language context being used. SKOS supports RDF language tags, which you can use to localize the concept labeling:

prod:computer rdf:type skos:Concept;
  skos:prefLabel "computer"@en;
  skos:prefLabel "ordinateur"@fr;
  skos:prefLabel "computador"@es .
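The sketch below shows one way an application might choose among these language-tagged labels. Keying the labels by language tag mirrors the SKOS rule that a resource has no more than one skos:prefLabel per language tag. The fallback to English is a hypothetical application policy, not something SKOS defines.

```python
# The preferred labels from the Turtle example, keyed by language tag.
pref_labels = {
    "prod:computer": {"en": "computer", "fr": "ordinateur", "es": "computador"},
}

def pref_label(concept, lang, default_lang="en"):
    """Pick the preferred label for the requested language, falling back
    to a default language (an application choice) when none exists."""
    labels = pref_labels.get(concept, {})
    return labels.get(lang, labels.get(default_lang))

print(pref_label("prod:computer", "fr"))  # ordinateur
print(pref_label("prod:computer", "de"))  # computer (no German label, so English)
```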

Not only does this provide the ability to output a name for a concept, but you can also imagine reversing the relationship: finding keywords in a document and using the concept scheme to identify what the document might be about. Without explicitly modeled concept terms, it would be difficult to tell a software system which words are more important than others in a domain.
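Here is a rough Python sketch of that reverse lookup. It inverts a hypothetical label set, including an alternate label and a hidden misspelling (the kind of variant skos:hiddenLabel is meant for: matched during search, never displayed), into an index from label to concept, then counts label occurrences in a text. The concept data, sample text, and function names are illustrative assumptions, and language tags are dropped for brevity.

```python
import re

# Hypothetical label data for two concepts.
concepts = {
    "prod:computer": {
        "prefLabel": ["computer", "ordinateur", "computador"],
        "altLabel": ["pc"],
        "hiddenLabel": ["computor"],  # a misspelling worth matching
    },
    "prod:digitalcamera": {
        "prefLabel": ["digital camera"],
        "altLabel": [],
        "hiddenLabel": [],
    },
}

# Invert the labels: every label of every kind points back to its concept.
label_index = {
    label: concept
    for concept, kinds in concepts.items()
    for labels in kinds.values()
    for label in labels
}

def concepts_mentioned(text):
    """Count how often each concept's labels occur as whole words in the text."""
    counts = {}
    lowered = text.lower()
    for label, concept in label_index.items():
        hits = len(re.findall(r"\b" + re.escape(label) + r"\b", lowered))
        if hits:
            counts[concept] = counts.get(concept, 0) + hits
    return counts

doc = "Plug the digital camera into your PC. Un ordinateur portable."
print(concepts_mentioned(doc))  # {'prod:computer': 2, 'prod:digitalcamera': 1}
```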

SKOS provides the ability to express the origin of a concept in a concept scheme:

prod:inventoryTerms rdf:type skos:ConceptScheme;
  dc:title "Stuff we sell";
  dc:creator  .

prod:computer skos:inScheme prod:inventoryTerms .

This metadata about the concept scheme is useful for finding where terms come from and for tracking down whom to ask for clarifications, modifications, extensions, and so on. It is possible for a concept to be used in more than one scheme, which is a bit of a break from traditional IT-based organizational systems. Those usually represent closed systems, but SKOS allows concepts to be reused in different schemes:

prod:computer skos:inScheme prod:inventoryTerms, prod2:someOtherTerms .
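Because scheme membership is expressed as a relationship rather than as a fixed field on the concept, a concept can appear in any number of schemes. The following sketch treats skos:inScheme statements as (concept, scheme) pairs and queries them in both directions; the membership of prod:musicplayer is a hypothetical addition for illustration.

```python
# skos:inScheme statements as (concept, scheme) pairs.
in_scheme = [
    ("prod:computer", "prod:inventoryTerms"),
    ("prod:computer", "prod2:someOtherTerms"),
    ("prod:musicplayer", "prod:inventoryTerms"),  # hypothetical
]

def schemes_of(concept):
    """Every scheme a concept has been placed in."""
    return [s for c, s in in_scheme if c == concept]

def concepts_in(scheme):
    """Every concept declared to be in a scheme."""
    return [c for c, s in in_scheme if s == scheme]

print(schemes_of("prod:computer"))        # ['prod:inventoryTerms', 'prod2:someOtherTerms']
print(concepts_in("prod:inventoryTerms"))  # ['prod:computer', 'prod:musicplayer']
```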

After individual concepts are defined and labeled, it can be useful to group them into a collection of terms that mean something specific. Whether the grouping also warrants its own concept is up to the domain modeler (in the final W3C SKOS Reference, skos:Collection and skos:Concept are disjoint, so such a concept would be a separate resource). For example:

prod:electronics rdf:type skos:Collection;
  skos:prefLabel "stuff that needs power";
  skos:member prod:computer;
  skos:member prod:musicplayer;
  skos:member prod:digitalcamera .
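A small sketch of how an application might use such a collection. Only the collection's own label comes from the example; the members' display labels are assumptions. It renders the collection as a heading with member names and answers the reverse question of which collections contain a given concept.

```python
# The collection from the Turtle example.
collections = {
    "prod:electronics": {
        "label": "stuff that needs power",
        "members": ["prod:computer", "prod:musicplayer", "prod:digitalcamera"],
    },
}
# Hypothetical display labels for the members.
labels = {
    "prod:computer": "computer",
    "prod:musicplayer": "music player",
    "prod:digitalcamera": "digital camera",
}

def collections_containing(concept):
    """Reverse lookup: which collections list this concept via skos:member?"""
    return [c for c, data in collections.items() if concept in data["members"]]

def describe(collection):
    """Render a collection as 'label: member, member, ...'."""
    data = collections[collection]
    names = ", ".join(labels[m] for m in data["members"])
    return f"{data['label']}: {names}"

print(describe("prod:electronics"))
# stuff that needs power: computer, music player, digital camera
```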

Finally, you can define hierarchies of concepts by using the skos:narrower and skos:broader relationships between concepts.

prod:computer skos:narrower prod:applecomputer;
  skos:narrower prod:dellcomputer;
  skos:narrower prod:hpcomputer .
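Because SKOS declares skos:broader the inverse of skos:narrower, an application can derive one from the other. The sketch below does that and also follows skos:narrower repeatedly, which is what the skos:narrowerTransitive property in the W3C SKOS Reference expresses. prod:applelaptop is a hypothetical extra level added to show the transitive walk, and the recursion assumes an acyclic hierarchy.

```python
from collections import defaultdict

# skos:narrower statements from the example, plus one hypothetical level.
narrower = {
    "prod:computer": ["prod:applecomputer", "prod:dellcomputer", "prod:hpcomputer"],
    "prod:applecomputer": ["prod:applelaptop"],
}

# skos:broader is the inverse of skos:narrower, so derive it.
broader = defaultdict(list)
for parent, children in narrower.items():
    for child in children:
        broader[child].append(parent)

def descendants(concept):
    """Follow skos:narrower repeatedly, depth first (assumes no cycles)."""
    found = []
    for child in narrower.get(concept, []):
        found.append(child)
        found.extend(descendants(child))
    return found

print(broader["prod:dellcomputer"])  # ['prod:computer']
print(descendants("prod:computer"))
# ['prod:applecomputer', 'prod:applelaptop', 'prod:dellcomputer', 'prod:hpcomputer']
```

Storing only one direction and deriving the other keeps the data from drifting out of sync.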

By now, it should be clear how SKOS can describe reasonably sophisticated systems of concepts and their relationships in a lightweight and manageable way. These concept schemes can then be used to drive a user interface.

Lightweight Blog Category Schemes
One place where this could be useful is in the categorization of blogs and blog entries. It would be great to categorize your own blogs and entries topically in a lightweight but formal way. People tend to use simple tags at the moment, but tags alone are insufficient: as easy as the folksonomic approach is, unbound tags make it hard to link across blogs and reuse topics. SKOS is well suited to this task.

Blog publishers typically list the bloggers they like to read on their sites. These implicit recommendations are usually organized only by blogger identity, not by what those bloggers tend to write about. It is unclear whether a listed author is a colleague or a child until you go and investigate. It would be great to apply your own categories to other people’s blogs to give greater visibility into what they tend to write about and why you find them interesting.

Some notable bloggers have already started to do this kind of SKOS-based categorization. Norm Walsh, for example, has created categories for his blog using SKOS. He starts with the concept of a topic, expressed in the RDF/XML format:

Editor’s Note: Users of Firefox 3.0 can use http://www.norman.walsh.name/knows/taxonomy to see Norm’s categories. Users of Internet Explorer and Safari can use a text editor to view the content.
    A specialization of rdfs:Class used for topics. 

All of the concrete topics are considered instances of this class. Subcategories are linked to their parent topics with skos:narrower:

        Celebrations 

Topics that seem particularly interesting to other people are converted into RSS feeds and indicated in the topic definition. For example, because he wrote the book on DocBook, people might be interested in that topic as a feed:

    docbook  docbook  docbook  DocBook 

Finally, all the major topics are wrapped up into a concept scheme called “Everything”:

    Topic taxonomy for norman.walsh.name   

Here, “Everything” is a topic in its own right, with several narrower concepts directly below it:

      everything  The tree of topics is rooted here. Any topic not reachable by following (transitively) the     skos:narrower properties of this topic will not appear in the topic navigation hierarchy.              ...
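The rule in that comment, that only topics reachable from the root by following skos:narrower appear in the navigation, amounts to a graph traversal. A minimal sketch, assuming a hypothetical topic graph (these are not Norm’s actual topic identifiers), uses a breadth-first walk with a visited set so that an accidental cycle cannot loop forever:

```python
from collections import deque

# Hypothetical topic graph: topic -> its skos:narrower topics.
narrower = {
    "everything": ["technology", "life"],
    "technology": ["docbook", "xml"],
    "life": ["celebrations"],
    "orphan": ["misc"],  # not reachable from "everything"
}

def navigable_topics(root="everything"):
    """Breadth-first walk along skos:narrower from the root.
    Topics never reached are left out of the navigation hierarchy."""
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        topic = queue.popleft()
        for child in narrower.get(topic, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

print(navigable_topics())
# ['everything', 'technology', 'life', 'docbook', 'xml', 'celebrations']
```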

Norm’s category hierarchy could certainly serve as a blueprint for your own SKOS categories. After you create your category concept scheme, it would be fairly trivial to use a technology such as XSLT to convert the concepts into a menu for your own blog entries. A WordPress plugin could also let you apply the categories to other people’s blogs. Such a categorized blogroll is more interesting to your readers than a plain list of names.
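The article suggests XSLT for this conversion; as a swapped-in alternative, this hypothetical Python sketch renders a small category scheme as nested HTML lists. The scheme, labels, and menu function are illustrative only. Escaping the labels keeps characters such as an ampersand from breaking the markup.

```python
from html import escape

# Hypothetical category scheme: topic -> (display label, narrower topics).
scheme = {
    "everything": ("Everything", ["technology", "life"]),
    "technology": ("Technology", ["docbook"]),
    "docbook": ("DocBook", []),
    "life": ("Life & Family", []),
}

def menu(topic, depth=0):
    """Render a topic and its narrower topics as nested HTML list items."""
    label, children = scheme[topic]
    pad = "  " * depth
    if not children:
        return f"{pad}<li>{escape(label)}</li>"
    lines = [f"{pad}<li>{escape(label)}", f"{pad}  <ul>"]
    lines += [menu(child, depth + 2) for child in children]
    lines += [f"{pad}  </ul>", f"{pad}</li>"]
    return "\n".join(lines)

print("<ul>\n" + menu("everything", 1) + "\n</ul>")
```

Driving the menu from the concept scheme means a new narrower topic shows up in the navigation without any change to the rendering code.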

While sharing vocabularies and concept schemes certainly brings efficiencies, the semantic web does not require it. Individual bloggers could define their own category concept schemes and then link them together, extend other people’s concepts, and merge the terms with OWL and other technologies. You are encouraged to investigate this project and the tools, procedures, and plugins being developed to facilitate greater interoperability in the blog space with technologies like SKOS.

The Semantically-Interlinked Online Communities (SIOC) project takes a broader view of these ideas. It is attempting to link topics, conversations, and authors across blogs, Usenet postings, and so on, mixing concepts from the Friend-of-a-Friend (FOAF) vocabulary with SIOC vocabularies and SKOS concepts. This kind of cross-community, cross-blog linkage highlights the power and utility of these data models. Moving away from simple tags to lightweight but more formal knowledge organization systems like SKOS helps you get ready to be a full participant in the social data web of tomorrow.
