Simplifying RDFa Notation

Resource Description Framework (RDF) is not the world’s most beloved standard. That it is used most often in fairly obscure applications for handling such fundamental concepts as meaning, topicality, and the definition of objects, concepts that sound more like philosophy than computer science, certainly doesn’t help its case. That its use cases generally involve creating both interconnections and abstractions (operations that make most people’s eyes glaze over) within external resources (it is the “resource description framework,” after all) may have something to do with it too.

Perhaps the biggest problem that RDF faces is those pesky namespaces. The idea is reasonably sound: a term in a vocabulary has meaning only in the context of a given namespace. This idea may seem counterintuitive to the average Joe, but from a computer technology standpoint it makes a lot of sense. A computer has no concept of meaning, unless that meaning is made explicit for it by some formal definition (also known as a binding), with the term in the vocabulary consequently either identifying a given resource or invoking a certain behavior. In a nutshell, creating definitions is what writing programs is all about.

However, in the World Wide Web Consortium (W3C) universe, namespaces are also notable in that they have associated (and more important) unique Uniform Resource Identifiers (URIs). For instance, if you want to describe a vocabulary for electronic business cards, you’d likely end up using something like the vcard specification. The vocabulary namespace is given by http://www.w3.org/2001/vcard-rdf/3.0#, and a term such as a nickname is consequently encoded as http://www.w3.org/2001/vcard-rdf/3.0#NICKNAME.

Clear and unambiguous, right? Maybe. To a computer the syntax is clear and easily parsed into a namespace and its associated term, and it ensures that there isn’t ambiguity (which is a big part of the challenge to the semantic web in the first place). However, to a human being http://www.w3.org/2001/vcard-rdf/3.0#NICKNAME is an eye-glazing monstrosity, difficult to read, and even more difficult to type: something that frankly only a writer of mil spec documentation could love.

The challenge presented here is how to make it relatively easy to encode contextual metadata into a web document: easy for the content developer, easy for the programmer, and easy for the user of this data. If such a solution can be found, it would simplify processing documents for relevant metadata without the cost of expensive natural language parsers or related tools, and without the overhead of maintaining separate metadata files for every page of a web resource, especially when such resources are themselves constructed from pieces contained in databases or aggregated from XML feeds.

There are shortcuts. You can declare the namespace in one part of an RDF document, and then just refer back to it with the #nickname portion. However, to a web designer especially, this approach treads dangerously close to a different usage of the # operator, specifically the HTML link tag <a href="#nickname">, where a corresponding link target exists: <a name="nickname">.

What’s worse, this strategy breaks down whenever you have multiple namespaces in play, and then you’re forced into the longer namespace notations to clearly identify the terms in question. While nickname is fairly unambiguous, something like category may be used in any number of different vocabularies, and while each may mean generally the same thing, the attributes that a vcard catalog exposes may be quite different from an Atom category, for example. Nevertheless, having to sling around http://www.w3.org/2001/vcard-rdf/3.0#category and http://www.w3.org/2005/Atom#category all day just doesn’t fill one with joy.

Anyone who works with such namespaces intrinsically understands the resolution to this issue: using prefixes. After all, when you create an XSLT style sheet element, you don’t generally use the form <http://www.w3.org/1999/XSL/Transform#stylesheet …>. Instead, you declare a namespace prefix such as xsl, associate it with the namespace, and then use the prefix, as in <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" …>.

This solution is generally unambiguous, though it does require (especially in the case of default namespaces) that you keep track of the namespace stack for subordinate elements.

Not surprisingly, you’re beginning to see the adoption of this notation informally in a lot of places beyond the formal qualified names (or QNames) of XSLT elements and attributes. For instance, a notation such as atom:category makes it abundantly clear that the term is the category term that’s part of the Atom specification, rather than the vcard one.
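The prefix idea can be sketched in a few lines of Python. This is only an illustration, not part of any specification: the `PREFIXES` table and the `expand` helper are hypothetical names, and the namespace URIs are simply the ones quoted earlier in this article (including the `#` separator as written there).

```python
# Hypothetical prefix table. The prefixes are arbitrary local names;
# the namespace URIs are copied from the article's examples.
PREFIXES = {
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "atom": "http://www.w3.org/2005/Atom#",
}


def expand(qualified_term, prefixes=PREFIXES):
    """Expand 'prefix:term' into a full URI by simple concatenation."""
    prefix, sep, term = qualified_term.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no namespace declared for {qualified_term!r}")
    return prefixes[prefix] + term


# The same local term lands in two different vocabularies.
print(expand("vcard:category"))
print(expand("atom:category"))
```

The short forms stay readable for people, while a program can always recover the long, unambiguous form on demand.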

Unfortunately, there is a problem here: protocols. Web protocols such as http:, ftp:, mailto:, and so on all make use of the same colon (:) separator between the protocol and the rest of the URI. Additionally, even non-associated protocols, such as schema:, are often used for creating URIs as well. The syntax similarities make it more difficult for parsers to distinguish between a URI protocol and a namespace prefix, which significantly reduces the efficiency of any parsing operation.

Rad CURIEs
Recently, Mark Birbeck of x-port.net and Shane McCarron of Applied Testing and Technology edited a number of articles for the W3C, including a primer on attributes for RDF (RDFa). However, they also produced a document entitled “CURIE Syntax 1.0: A syntax for expressing Compact URIs” (W3C Working Draft, 7 March 2007). A CURIE is, as the name implies, a Compact URI Encoding, rather than a unit of radiation emission, a liturgical Mass (Kyrie Eleison), or the 1985 song of the same name by Mr. Mister.

While formally endorsing the concept as “canonical,” Birbeck and McCarron also recommend one change: to disambiguate CURIEs from URIs or other protocols by enclosing such terms within square brackets ([ ]). For instance, a reference to an Atom category written as a CURIE takes the form [atom:category].
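To see why the brackets help a parser, here is a minimal Python sketch. It assumes a bracketed CURIE always has the shape `[prefix:reference]` and treats every other value as an ordinary URI. The `SAFE_CURIE` pattern and the `resolve` helper are hypothetical names chosen for this illustration, not anything defined by the working draft.

```python
import re

# Assumption: a bracketed CURIE looks like [prefix:reference].
# Anything that does not match is passed through as a plain URI.
SAFE_CURIE = re.compile(r"^\[([A-Za-z_][\w.-]*):([^\]]*)\]$")


def resolve(value, prefixes):
    """Return a full URI for either a bracketed CURIE or a plain URI."""
    match = SAFE_CURIE.match(value)
    if match is None:
        return value  # http:, ftp:, mailto: and friends are left untouched
    prefix, reference = match.groups()
    return prefixes[prefix] + reference


prefixes = {"atom": "http://www.w3.org/2005/Atom#"}
print(resolve("[atom:category]", prefixes))
print(resolve("mailto:someone@example.com", prefixes))
```

With the brackets in place, a single pattern test decides the question, and the parser never has to guess whether `atom:` or `mailto:` is a protocol or a prefix.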

Now, the namespace declarations in these examples would seem to belie the beneficial nature of such a notation; however, with CURIEs you can use the same type of namespace scoping that you can use with QNames for elements and attributes. You define the namespace at a higher scope (such as the root node of the document), with its appropriate prefix definition and with the understanding that any term that falls within the containment (the scope) of that namespace, which uses the CURIE prefix, falls into that particular namespace.

This point seems fairly subtle, but it’s a fairly radical shift in the way that we think of web content. By making it possible for content, as well as metastructure, to exist within a given namespace, you can effectively interleave semantic content even into documents that don’t normally recognize the notion of namespaces (such as HTML). You also open up the possibility of post processing and content substitution, both of which will be addressed in greater detail shortly.

Note that CURIEs (albeit without the square bracket notation) already are in use in certain areas. For instance, consider the case of XML Schema, where it is possible to define the type of a given element or attribute as being one of the simple types in the xs: schema namespace, such as xs:string, xs:ID, xs:double, and so forth. Technically speaking, these are in fact CURIEs; they are a type of content that is contained within an attribute (usually) that represents a taxonomic term, and that consequently derives from a finite, explicitly defined set of such terms. While it is unlikely that XSD processors would be rewritten with such CURIE notation in mind, it’s not out of the question to envision a schema declaration that looks something like this:

 

This idea opens up interesting possibilities. For example, I’ve been working recently on a mechanism for converting schemas into XForms. One of the central challenges in setting up this conversion is the realization that while XSD Schema has the concept of enumeration, what it lacks is the ability to specify the notion that the value of a given element or attribute can be taken from a taxonomy. (It defines the xs:NMTOKEN type, which in theory should specify a value from such a taxonomy, but it can’t extend from that to say that this particular term must derive from a given taxonomy.)

Related to this idea is the notion that the potential values of an element or attribute might derive from a dynamically generated list of such values (which can be thought of as being yet a different taxonomy, this time consisting of instances rather than categories). Put another way, a CURIE is a term in a constrained taxonomy. It may have some associated meaning, which manifests as replacement text, programmatic bindings, or the like, but ultimately it should be seen as a way of embedding formal taxonomies into traditional web documents. It is in this role that CURIEs will likely play a big part through the introduction of RDFa.

Dispersing RDF
Microformats are trouble. A microformat (which is discussed in “Discover Microformats for Embedding Semantics”) makes use of HTML attributes (or XHTML attributes) as a shorthand replacement for some formal semantic. There’s nothing at all wrong with this approach. The problem is that most microformats as they exist right now assume that all data fit more or less neatly into one of perhaps a dozen distinct formats, and those formats can work with the assumption of a fairly ad hoc approach to encoding them.

So long as the information being so encoded is a vcard, a calendar event, a friend of a friend, or a few other similar syntaxes, this approach isn’t necessarily a bad thing, particularly if it contains the information you need and if your assumption is that you’re coding for usually one or two web services (such as del.icio.us or digg).

Suppose that you are a journalist and you want to encode specific metadata information into your stories. One approach is to use a formal XML document. However, while such XML documents are useful for providing strictly structured information, a writer wants to write and in general would prefer just to annotate his or her writings with some kind of associational editor. If that’s the case, vcard is likely not going to cut it for encoding all of the information the journalist needs while still allowing him or her to write the narrative.

“Hmmm…,” you may be thinking, “what if you were to use…CURIEs?” Yes!

But not so fast. While microformats in general have some problems, one of the things that they recognize is the concept that you do need some kind of framework for putting this structure in place, some specific set of attributes that are being used to solve the problems of encoding.

RDFa defines a number of attributes that can be added to the HTML or XHTML model, and that together have the effect of defining within such a document all the same data as would be contained in an RDF document. RDFa is, in essence, RDF for microformats, but it includes enough underlying structure that you can parse through the contents and reconstruct RDF from it.

For instance, suppose that you have a technical article about CURIEs, and you want to embed some metadata about it, such as publishing information, from the Dublin Core Metadata Initiative. Dublin Core is a particularly useful schema for web pages. Its initial role was to provide some form of hint about web documents that wasn’t necessarily contained in the formal HTML; however, the idea of creating secondary RDF files containing Dublin Core information has never really caught on. Still, it’s not a radical jump to go from a basic markup page to something like this:

There are several points to note in this particular case. Notice the declaration of the Dublin Core namespace in the header:

The use of namespaces in XHTML documents beyond the XHTML namespace itself points to one of the major benefits of XHTML over traditional HTML: you can introduce specific namespace content into the XHTML, extending it to either add metadata or provide hooks for other processes (such as the graphical SVG or XForms). The preceding example declares the Dublin Core namespace, but note that no element or attribute name in the code uses any dc-defined identifier. Instead, it uses the dc prefix to identify terms within property attributes as belonging to Dublin Core. In other words, these are CURIEs.

Jumping past the

The expression *[property="dc:subtitle"], for instance, indicates that a match should be made for any element that has a property attribute with the value "dc:subtitle". The output is illustrated in Figure 1.

Note that you can use advanced CSS 2 and CSS 3 functionality to do a quick scan of RDFa properties. For instance, if you replace the previous style block with this one:

body {color:lightGray;}
*[property ^= "dc:"] {color:black;font-size:12pt;}
*[property ^= "dc:"]:before {content:"[" attr(property) " - ";color:blue;}
*[property ^= "dc:"]:after {content:"]";color:blue;}

the CSS engine will turn everything light gray except elements that have a property attribute beginning with dc: (see Figure 2); those elements expand to display the attribute values, which display in blue. The selector [property ^= "dc:"] is a CSS 3 substring-matching attribute selector that matches every element whose property attribute value starts with dc:. Note that if you change the namespace prefix, you must also change the CSS selector, because it matches the literal prefix text and doesn’t reference the XML namespace at all.
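The same kind of quick scan can be done outside the browser. The following Python sketch mirrors the prefix-matching selector using only the standard library. The `SAMPLE` fragment and the `properties_with_prefix` helper are hypothetical and exist only for illustration. Like the CSS rule, the function matches the literal prefix text and ignores the namespace declaration entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; the element choices and text are illustrative.
SAMPLE = """<div xmlns:dc="http://purl.org/dc/elements/1.1/">
  <h1 property="dc:title">Simplifying RDFa Notation</h1>
  <p property="dc:creator">Kurt Cagle</p>
  <p>Unannotated body text.</p>
</div>"""


def properties_with_prefix(xml_text, prefix):
    """List (property, text) pairs whose property value starts with 'prefix:'."""
    root = ET.fromstring(xml_text)
    wanted = prefix + ":"
    return [
        (el.get("property"), (el.text or "").strip())
        for el in root.iter()  # document order, root included
        if el.get("property", "").startswith(wanted)
    ]


for prop, text in properties_with_prefix(SAMPLE, "dc"):
    print(prop, "=", text)
```

A more careful tool would map the prefix back to its namespace URI before comparing, so that a document using a different prefix for Dublin Core would still match.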

Figure 1. Using the property Attribute in CSS: A match is made for any element that has a property attribute for the associated value.

Figure 2. RDFa Properties in a CURIE Namespace: The CSS engine turns the body text light gray in contrast to elements having property attributes that hold a Dublin Core value.

Of course, there may be times where the focus you want is not on the whole document, but rather on just a small section of it. For instance, suppose that you have a web site that contains a listing for an upcoming meeting (the CURIE Society annual picnic) as part of a larger article:

Also announced today was the CURIE Society’s third-annual picnic that will be held at Marie Curie Park in downtown Victoria on Sunday, the 29th of September, from 1 to 5 pm. Guest speakers will include Max Planck and Niels Bohr talking about uncertainty in taxonomic divisions and modeling atomic predicates, respectively.

There’s a lot of information here, but obviously it will take a little bit of creative tagging to put it into a form that’s accessible by a computer. To do so, the first thing that’s necessary is to identify the context of this paragraph as being of a certain basic type or class. The most logical choice for this class is an event. While there are a number of event taxonomies out there, to illustrate that such taxonomies do not necessarily have to be established ones, here’s a unique one:

xmlns:event="http://www.metaphoricalweb.org/xmlns/event"

The first pass to this event class would then be (assuming the taxonomy has already been defined elsewhere) to indicate that the paragraph is an event and to give that event a name:

 ... 

Also announced today was the CURIE Society’s third-annual picnic that will be held at Marie Curie Park in downtown Victoria on Sunday, the 29th of September from 1 to 5 pm. Guest speakers will include Max Planck and Niels Bohr talking about uncertainty in taxonomic divisions and modeling atomic predicates, respectively.

The class attribute defines the paragraph as also being the “holder” or container of an event, with the event itself given a formal name through the id attribute. Note that the use of this class attribute more closely approaches Tim Berners-Lee’s original intent for the attribute: it was meant as a way of specifying which class a given entity belonged to, though this usage was later swamped when CSS began using it as a way to provide a name for a style rule.

The double use of an id and the new about attribute may seem a bit redundant, until you recognize that the about attribute is a pointer that indicates in the document what the RDFa properties are referring to. This summary metadata will likely be contained within the block that has the corresponding id, but there is no guarantee, and conceivably such RDFa properties could also point to external documents altogether (and contain fully realized URLs, rather than just hashed indexes).

After defining it, you can apply properties to the particular event class:

...   

Also announced today was the CURIE Society’s third-annual picnic that will be held at Marie Curie Park in downtown Victoria on Sunday, the 29th of September from 1 to 5 pm. Guest speakers will include Max Planck and Niels Bohr talking about uncertainty in taxonomic divisions and modeling atomic predicates, respectively.

This class produces a surprising number of useful RDF triples:

<#annual_picnic> ev:title "CURIE Society 3rd Annual Picnic"^^XMLLiteral .
<#annual_picnic> ev:venue "Marie Curie Park"^^XMLLiteral .
<#annual_picnic> ev:street "3129 N. Douglas St."^^XMLLiteral .
<#annual_picnic> ev:city "Victoria"^^XMLLiteral .
<#annual_picnic> ev:province "British Columbia"^^XMLLiteral .
<#annual_picnic> ev:date "2007-09-29T13:00:00 - 2007-09-29T17:00:00"^^xs:dateRange .
<#annual_picnic> ev:speaker "Max Planck"^^XMLLiteral .
<#annual_picnic> ev:speaker "Niels Bohr"^^XMLLiteral .
<#annual_picnic> ev:topic "uncertainty in taxonomic divisions"^^XMLLiteral .
<#annual_picnic> ev:topic "modeling atomic predicates"^^XMLLiteral .

Note that in a few cases, such as with ev:street or ev:province, the metadata was embedded into the description despite not having any text children; this embedding is knowledge that comes from the editor’s notes, not from the text itself (and as such it really is metadata in the purest sense). The overriding theme here is that the RDFa should annotate its host XHTML, but shouldn’t have a direct impact on the presentation of that content. (CSS referencing is an indirect impact but doesn’t change the core data; if the CSS doesn’t mention the RDFa code, it should not be a factor in the layout.)
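How triples like these might be pulled out of annotated markup can be sketched with Python's standard library. This is a deliberately simplified reading, not the full RDFa processing rules. It assumes that the nearest enclosing about attribute names the subject, that property names the predicate, and that the object is the content attribute when present and the element's text otherwise. The `SAMPLE` markup and the `extract_triples` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modeled on the picnic paragraph.
SAMPLE = """<p id="annual_picnic" about="#annual_picnic">
  The <span property="ev:title" content="CURIE Society 3rd Annual Picnic">third-annual picnic</span>
  will be held at <span property="ev:venue">Marie Curie Park</span>
  <span property="ev:street" content="3129 N. Douglas St."/> in downtown
  <span property="ev:city">Victoria</span>.
</p>"""


def extract_triples(xml_text):
    """Collect (subject, predicate, object) tuples from annotated XHTML."""
    triples = []

    def walk(element, subject):
        # An about attribute changes the subject for this element's subtree.
        subject = element.get("about", subject)
        prop = element.get("property")
        if prop is not None:
            value = element.get("content")
            if value is None:  # no explicit content: fall back to the text
                value = "".join(element.itertext()).strip()
            triples.append((subject, prop, value))
        for child in element:
            walk(child, subject)

    # An empty subject stands for "the current document" in this sketch.
    walk(ET.fromstring(xml_text), "")
    return triples


for triple in extract_triples(SAMPLE):
    print(triple)
```

The ev:street span has no text at all, so its value comes entirely from the content attribute, which is the editor-supplied metadata described above.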

RDFa works on the assumption that any metadata properties are about the current document, so the about attribute doesn’t need to be stated explicitly. However, as the prior event class example shows, it is possible to narrow the scope to one particular section. In this case, the about attribute must be included, and it should point to an element with an associated id attribute. Therefore, for an element where id="annual_picnic" the corresponding about attribute would be "http://www.myserver.com/ns/events#annual_picnic", where "http://www.myserver.com/ns/events" is the URL of the document itself.

In general, when you have an attribute reference to some point within the document, you can drop the full URL and just include the hash-marked anchor "#annual_picnic" to refer to the same thing. (Of course, you can also define a null namespace in the document, xmlns:doc="#", and then use CURIEs: about="doc:annual_picnic".)
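The equivalence of the three spellings can be checked with a short Python sketch built on the standard `urllib.parse.urljoin` function. The `resolve_about` helper and its `prefixes` argument are hypothetical names for this illustration. The document URL is the one used in the example above.

```python
from urllib.parse import urljoin

DOCUMENT_URL = "http://www.myserver.com/ns/events"  # from the article's example


def resolve_about(about, document_url=DOCUMENT_URL, prefixes=None):
    """Turn an about value (full URL, #fragment, or CURIE) into a full URL."""
    prefixes = prefixes or {}
    prefix, sep, local = about.partition(":")
    if sep and prefix in prefixes:
        # Expand a declared CURIE prefix first, e.g. doc: -> "#".
        about = prefixes[prefix] + local
    # Relative references, including bare fragments, resolve against the document.
    return urljoin(document_url, about)


print(resolve_about("#annual_picnic"))
print(resolve_about("doc:annual_picnic", prefixes={"doc": "#"}))
print(resolve_about("http://www.myserver.com/ns/events#annual_picnic"))
```

All three calls yield the same absolute URL, which is why the shorter forms are safe to use inside the document.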

The ROI for RDFa
The examples given here represent a fairly limited subset of the full RDFa specification. It should be possible, after RDFa becomes a formal recommendation, to build nearly any RDF structure using an RDFa notation. However, except for extremely specialized applications, just the ability to provide about groups should prove useful to most people. The question, of course, is whether the advantages of an RDFa notation outweigh the real impacts of encoding RDFa into documents in the first place. In other words, is all this work really worth it?

If the only advantage to this approach is to provide a quick and dirty way of generating full RDF documents from RDF-lite (RDFa) documents, it’s likely not. There are certainly applications out there that utilize RDF, but so far their interaction with the web is fairly limited, and their utility, after the initial novelty of seeing relational data interacting with other relational data wears off, is at best somewhat dubious. I’m not necessarily interested in getting 3D spatial linked views of my document in the data model (it is cool to see it, but for the most part it is a little too abstract to have any real relevance to a programmer).

However, there are other users of metadata. One of the more significant is news feeds (especially Atom). Suppose, for instance, that you could point your RSS newsreader to a document with a request for its Atom contents. An RDFa-filled paragraph, as shown previously, could very easily be transformed into an Atom feed that looks like this:

RDFa For "CURIE Society Notes"
Kurt Cagle
2007-09-09T09:02:12
2007-09-09T09:02:12
scheme:d428e711-1a99-4212-bd4f-abdf1a2edde5
CURIE Society 3rd Annual Picnic
Kurt Cagle
2007-09-09T09:02:12
2007-09-09T09:02:12
Marie Curie Park
3129 N. Douglas St.
Victoria
British Columbia
2007-09-29T13:00:00 - 2007-09-29T17:00:00
Max Planck
Niels Bohr
uncertainty in taxonomic divisions
modeling atomic predicates
Also announced today was the CURIE Society’s third-annual picnic that will be held at Marie Curie Park in downtown Victoria on Sunday, the 29th of September from 1 to 5 pm. Guest speakers will include Max Planck and Niels Bohr talking about uncertainty in taxonomic divisions and modeling atomic predicates, respectively.

A normal news-feed reader would be able to look at this metadata and at a minimum display the entry, using the RDFa properties for populating critical information. However, specialized applications that make use of the ev: properties are able to do far more, such as populating calendars with the event in question, by looking at the additional information contained, RDF-like, as additional properties attached to the Atom core schema. With a slightly different namespace, this approach is exactly the same one that Google uses for its Calendar service (among other uses).
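As a rough illustration of that combination, the Python sketch below attaches ev: properties to an Atom entry as extension elements. It is a sketch under stated assumptions, not the feed format used in this article. The `build_entry` helper is hypothetical, the ev: namespace URI is borrowed from the event declaration shown earlier, and the entry omits the id and updated elements that a valid Atom entry requires.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Assumption: the ev: prefix maps to the event namespace declared earlier.
EV = "http://www.metaphoricalweb.org/xmlns/event"

ET.register_namespace("atom", ATOM)
ET.register_namespace("ev", EV)


def build_entry(title, triples):
    """Build a skeletal Atom entry carrying ev: properties as child elements."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    for _subject, prop, value in triples:
        prefix, _, local = prop.partition(":")
        if prefix == "ev":  # other vocabularies are ignored in this sketch
            ET.SubElement(entry, f"{{{EV}}}{local}").text = value
    return entry


entry = build_entry(
    "CURIE Society 3rd Annual Picnic",
    [
        ("#annual_picnic", "ev:venue", "Marie Curie Park"),
        ("#annual_picnic", "ev:speaker", "Max Planck"),
        ("#annual_picnic", "ev:speaker", "Niels Bohr"),
    ],
)
print(ET.tostring(entry, encoding="unicode"))
```

A generic reader would show only the title, while an event-aware application could read the ev: children to fill in a calendar.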

RDFa and Atom make for a surprisingly potent combination. One provides a useful way for annotating XHTML content with metadata easily and unobtrusively. The other provides a way of transporting both the metadata and its corresponding links such that generalized feed processors can display at least a minimal set of information about the given resource, and specialized feed processors can take the same Atom feed and use the object properties to generate considerably more sophisticated effects.

Look for an upcoming article that will delve into the formal mechanics of building Atom feeds from RDFa and illustrate how you can extend certain content management systems to better incorporate RDFa as a core capability.

The CURIE model (using namespace prefixes to define taxonomies and categorization) opens up some interesting doors in the W3C model and provides a particularly intriguing way to make the semantic web more accessible for web developers. Folksonomies have generally proven quite successful, and they open up the possibility that the slightly more fine-grained tweaking that RDFa exposes may have a similar adoption cycle and prove to be as useful (and hopefully more so) in the long run.

 
