Existing RDF Vocabularies
Examples of existing vocabularies include the usual suspects: Friend-of-a-Friend (FOAF) Project, Description of a Project (DOAP), Really Simple Syndication (RSS), and the ubiquitous Dublin Core. The FOAF vocabulary is designed to establish a decentralized language for describing personal and professional interests and social network linkage. It may be usurped eventually by the recent activity on Google's OpenSocial networking efforts, but the benefits over closed proprietary networks and strictly hierarchical modeling languages are clear. DOAP is a vocabulary for describing open source projects. Dublin Core is a vocabulary for expressing publication metadata.
Beyond these canonical examples, new vocabularies are emerging to capture things such as geotagging information, Creative Commons licensing, life sciences terminology, calendaring data, key Wikipedia facts, the CIA Factbook, temporal relationships, and so on. Some of these vocabularies are consensus-based and designed to model a domain; others are made by independent individuals or groups looking to express existing content in a more machine-readable format. In either case, people who set out to create vocabularies like these will probably need some guidance. Even for seasoned data modelers, these skills are new.
This discussion provides a series of practical recommendations to help you with this process.
This fragment from the Dublin Core vocabulary describes the term creator and will be referenced throughout the rest of the article.
<rdfs:comment xml:lang="en-US">An entity primarily responsible for making
<dc:description xml:lang="en-US">Examples of a Creator include a person, an
organization, or a service. Typically, the name of a Creator should be used
to indicate the entity.</dc:description>
Mint Persistent URIs
Before you start to define the classes, properties, and constraints of your RDF vocabulary, begin with a commitment to use good names, and decide where your vocabulary will be hosted initially. RDF predicates are usually grounded in resolvable contexts. Don't simply throw a vocabulary up without considering the potential lifetime of its use. Systems that reference your vocabulary terms will break if you move or restructure this location.
The reality is that any URL-based system is likely to change eventually. One way to get around this issue is to mint persistent URIs using the open source software infrastructure developed by the Online Computer Library Center (OCLC) and recently updated by Zepheira. The URI is grounded within a resolvable URL context, but supports user-editable redirection rules that allow the hosted location to change over time without affecting clients of the URIs. The Dublin Core term creator is an example.
In its definition, the rdf:about attribute references the persistent URL (PURL) http://purl.org/dc/elements/1.1/creator, which currently resolves to http://dublincore.org/2006/12/18/dces.rdf#creator. Notice how the named element represents a logical structure that is unlikely to ever change. The resolved URL references a fragment in an RDF file located somewhere else. That file can move safely to another location as long as the redirection rule is updated. Any references to the PURL in other RDF statements will remain valid in the face of this kind of migration, which makes facts expressed on the web that much more resilient and universal.
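The indirection described above can be illustrated with a small Python sketch. This is a toy in-memory redirect table, not the actual PURL server software; the PurlTable class, its method names, and the example.org location are assumptions made purely for illustration. A real PURL service answers with HTTP redirects rather than a dictionary lookup.

```python
class PurlTable:
    """Toy stand-in for a PURL server's redirection rules (hypothetical)."""

    def __init__(self):
        self._targets = {}

    def set_target(self, purl, target):
        # Adding or editing a rule changes only where the PURL points.
        self._targets[purl] = target

    def resolve(self, purl):
        # Look up the current location for a persistent URL.
        try:
            return self._targets[purl]
        except KeyError:
            raise LookupError(f"no redirect rule for {purl}") from None


# The persistent identifier that other RDF statements refer to.
CREATOR = "http://purl.org/dc/elements/1.1/creator"

table = PurlTable()
table.set_target(CREATOR, "http://dublincore.org/2006/12/18/dces.rdf#creator")
before = table.resolve(CREATOR)

# Simulate moving the file: only the rule is edited. The new location
# below is a made-up placeholder, not a real Dublin Core address.
table.set_target(CREATOR, "http://example.org/moved/dces.rdf#creator")
after = table.resolve(CREATOR)
```

Clients hold on to CREATOR in both cases; only the value returned by resolve changes, which is the property that keeps existing RDF statements valid after a migration.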
Use Human-Readable RDFS Elements
The semantic web initiative is about making data on the web more accessible for machine processing. This laudable goal shouldn't exclude humans as consumers of metadata as well, however. The RDF and OWL ontologies may be designed for processing by software, but the terms expressed should be well documented so that people can evaluate the intent of the terms for possible reuse and extension. It may not be at all obvious what a term is supposed to mean in a domain context.
Vocabulary authors must be explicit by using the RDFS constructs <rdfs:label> and <rdfs:comment>. These allow for human-meaningful, machine-processable metadata about the terms being discussed.
Notice as well that the prior example specifies an xml:lang attribute to indicate the language and cultural context under which the rdfs:label applies.
Because we want this documentation to be readable both by people viewing our vocabulary files directly and by people working with the parsed and processed results, we cannot simply use XML comments to indicate intention.
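To make the point about parsed results concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The vocabulary term (http://example.org/vocab/reviewer), its label and comment text, and the describe_terms helper are all hypothetical and exist only for this illustration. The sketch shows that rdfs:label and rdfs:comment, together with their xml:lang values, survive parsing, while an XML comment is dropped by the default parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XML = "http://www.w3.org/XML/1998/namespace"

# Hypothetical vocabulary fragment; example.org is a placeholder.
VOCAB = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <!-- This XML comment is not part of the parsed tree. -->
  <rdf:Property rdf:about="http://example.org/vocab/reviewer">
    <rdfs:label xml:lang="en-US">Reviewer</rdfs:label>
    <rdfs:comment xml:lang="en-US">A person who evaluated
      the resource.</rdfs:comment>
  </rdf:Property>
</rdf:RDF>
"""


def describe_terms(rdf_xml):
    """Map each property URI to its labels and comments, keyed by (kind, language)."""
    root = ET.fromstring(rdf_xml)
    terms = {}
    for prop in root.iter(f"{{{RDF}}}Property"):
        uri = prop.get(f"{{{RDF}}}about")
        docs = {}
        for kind in ("label", "comment"):
            for el in prop.findall(f"{{{RDFS}}}{kind}"):
                lang = el.get(f"{{{XML}}}lang", "")
                # Collapse line breaks and indentation inside the text.
                docs[(kind, lang)] = " ".join((el.text or "").split())
        terms[uri] = docs
    return terms


terms = describe_terms(VOCAB)
for uri, docs in terms.items():
    print(uri, docs)
```

A full RDF toolkit would expose the same label and comment as ordinary statements about the term, which is why documentation belongs in RDFS elements rather than in XML comments.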