Simplifying RDFa Notation

Discover how to use namespace prefixes to define taxonomies and categorization through compact URI encoding (CURIE) to make the semantic web more accessible for web development.

The Resource Description Framework (RDF) is not the world's most beloved standard. That it is used most often in fairly obscure applications for handling such fundamental concepts as meaning, topicality, and the definition of objects, concepts that sound more like philosophy than computer science, certainly doesn't help its case. That its use cases generally involve creating both interconnections and abstractions (operations that make most people's eyes glaze over) within external resources (it is the "resource description framework," after all) may have something to do with it too.

Perhaps the biggest problem that RDF faces is those pesky namespaces. The idea is reasonably sound—a term in a vocabulary has meaning only in the context of a given namespace. This idea may seem counterintuitive to the average Joe, but from a computer technology standpoint it makes a lot of sense. A computer has no concept of meaning, unless that meaning is made explicit for it by some formal definition (also known as a binding), with the term in the vocabulary consequently either identifying a given resource or invoking a certain behavior. In a nutshell, creating definitions is what writing programs is all about.

However, in the World Wide Web Consortium (W3C) universe, namespaces are also notable in that they have associated (and more important) unique Uniform Resource Identifiers (URIs). For instance, if you want to describe a vocabulary for electronic business cards, you'd likely end up using something like the vcard specification. The vocabulary namespace is given by http://www.w3.org/2001/vcard-rdf/3.0#, and a term such as a nickname is consequently encoded as http://www.w3.org/2001/vcard-rdf/3.0#NICKNAME.

Clear and unambiguous, right? Maybe. To a computer the syntax is clear and easily parsed into a namespace and its associated term, and it ensures that there isn't ambiguity (which is a big part of the challenge to the semantic web in the first place). However, to a human being http://www.w3.org/2001/vcard-rdf/3.0#NICKNAME is an eye-glazing monstrosity, difficult to read and even more difficult to type, something that frankly only a writer of mil spec documentation could love.
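
To make the "easily parsed" claim concrete, here is a minimal Python sketch that splits a hash-style term URI into its namespace and term. The function name split_term is hypothetical, and the sketch assumes the term follows the last # in the URI, as in the vcard example above; vocabularies that separate terms with a slash would need different handling.

```python
def split_term(uri):
    """Split a hash-style term URI into (namespace, term).

    Assumes the term is whatever follows the last '#' in the URI.
    """
    namespace, separator, term = uri.rpartition("#")
    if not separator:
        raise ValueError("no '#' separator found in " + uri)
    # Keep the '#' on the namespace so that namespace + term rebuilds the URI.
    return namespace + separator, term


namespace, term = split_term("http://www.w3.org/2001/vcard-rdf/3.0#NICKNAME")
print(namespace)
print(term)
```

The split is trivial for a program, which is exactly the point: the long form costs the machine nothing and costs the human reader a great deal.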

The challenge presented here is how to make it relatively easy to encode contextual metadata into a web document: easy for the content developer, easy for the programmer, and easy for the user of this data. If such a solution can be found, it would simplify processing documents for relevant metadata without the cost of expensive natural language parsers or related tools, and without the overhead of maintaining separate metadata files for every page of a web resource—especially when such resources are themselves constructed from pieces contained in databases or aggregated from XML feeds.

There are shortcuts. You can declare the namespace in one part of an RDF document, and then just refer back to it with the #nickname portion. However, to a web designer especially, this approach treads dangerously close to a different usage of the # character: the fragment identifier in an HTML anchor tag such as <a href="#nickname">, where a corresponding link target exists: <a name="nickname">.

What's worse, this strategy breaks down whenever you have multiple namespaces in play, and then you're forced into the longer namespace notations to clearly identify the terms in question. While nickname is fairly unambiguous, something like category may be used in any number of different vocabularies, and while each may mean generally the same thing, the attributes that a vcard category exposes may be quite different from those of an Atom category, for example. Nevertheless, having to sling around http://www.w3.org/2001/vcard-rdf/3.0#category and http://www.w3.org/2005/Atom#category all day just doesn't fill one with joy.

Anyone who works with such namespaces intrinsically understands the resolution to this issue—using prefixes. After all, when you create an XSLT style sheet element, you don't generally use the form <http://www.w3.org/1999/XSL/Transform#stylesheet ...>. Instead, you declare a namespace prefix such as xsl, associate it with the namespace, and then use the prefix:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" ...>

This solution is generally not that ambiguous, though it does require (especially in the case of default namespaces) that you keep track of the namespace stack for subordinate elements.
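
As a rough illustration of that bookkeeping, the following Python sketch uses the standard library's xml.etree.ElementTree, which tracks namespaces in scope and reports each element's tag in fully qualified {namespace}name form. The small feed document and the qualified_tags helper are made up for this example; only the Atom namespace URI comes from the discussion above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical document: the default namespace is declared once on the root.
FEED = f'<feed xmlns="{ATOM_NS}"><entry><category term="rdfa"/></entry></feed>'


def qualified_tags(xml_text):
    """Return the fully qualified tag of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


for tag in qualified_tags(FEED):
    print(tag)
```

None of the subordinate elements repeats the declaration, yet each one resolves to the Atom namespace because the parser keeps track of which declarations are in scope.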

Not surprisingly, you're beginning to see the adoption of this notation informally in a lot of places beyond the formal qualified names (or QNames) of XSLT elements and attributes. For instance, using a notation like this makes it abundantly clear that the property attribute specifically indicates the category term that's part of the Atom specification, rather than the vcard one:

<div property="atom:category" xmlns:atom="http://www.w3.org/2005/Atom">
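
One way a script might read such markup is sketched below in Python, assuming the snippet is well-formed XML (the closing tag and the text content are added here for that purpose). The helper names declared_prefixes and resolve_property are hypothetical. The sketch returns the namespace and the term as a pair rather than gluing them together, because how the two are joined depends on the vocabulary's conventions.

```python
import io
import xml.etree.ElementTree as ET

# The div from the text, closed so that it parses as XML.
SNIPPET = b'<div property="atom:category" xmlns:atom="http://www.w3.org/2005/Atom">RDFa</div>'


def declared_prefixes(xml_bytes):
    """Collect every prefix-to-namespace declaration found in the document."""
    prefixes = {}
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        prefixes[prefix] = uri
    return prefixes


def resolve_property(value, prefixes):
    """Turn 'prefix:term' into (namespace, term) using the declared prefixes."""
    prefix, _, term = value.partition(":")
    return prefixes[prefix], term


prefixes = declared_prefixes(SNIPPET)
value = ET.fromstring(SNIPPET).get("property")
print(resolve_property(value, prefixes))
```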

Unfortunately, there is a problem here: protocols. Web protocols such as http:, ftp:, mailto:, and so on all make use of the same colon (:) separator between the protocol and the rest of the URI. Additionally, even non-associated protocols, such as schema:, are often used for creating URIs as well. The syntactic similarity makes it harder for parsers to distinguish between a URI protocol and a namespace prefix, and the extra checking this requires can reduce the efficiency of any parsing operation.
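
The ambiguity can be sketched in a few lines of Python. The classify function below is a hypothetical heuristic, not a standard algorithm: it treats the text before the first colon as a prefix only if that prefix has been declared. The example.org namespace and e-mail address are invented for the illustration.

```python
def classify(value, prefixes):
    """Guess whether a colon-separated value is a prefixed name or a URI."""
    head, separator, _rest = value.partition(":")
    if not separator:
        return "plain"
    if head in prefixes:
        return "prefixed name"
    return "uri"


declared = {"atom": "http://www.w3.org/2005/Atom"}
print(classify("atom:category", declared))
print(classify("http://www.w3.org/2005/Atom", declared))

# If a document happens to declare a prefix that matches a protocol name,
# the same string can no longer be told apart by syntax alone.
clashing = {"mailto": "http://example.org/mail#"}
print(classify("mailto:joe@example.org", clashing))
```

In the last call, a value that looks like an ordinary mailto: address is classified as a prefixed name, which shows why a parser cannot decide from the colon alone and must consult the declarations in scope.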
