The Semantic Web is a grand vision for increasing the power of the web through better expression and management of context. Semantic Web developers are building a framework to open up and connect organized information. This framework takes advantage of many popular developments on the web, such as the success of Wikipedia, Creative Commons-licensed publishing on sites like Flickr, and various blogs. Part of this framework is the Linking Open Data (LOD) community initiative, seeded by the W3C Semantic Web Education and Outreach group. One goal of LOD is to weave together separate collections of open data using deep linking and RDF (Resource Description Framework) representations.
A hallmark of LOD is that it makes it easy for web developers to create and process compatible data. Working with LOD calls for a broad set of tools and techniques that matches the diverse expertise of web developers. One popular tool for processing data on the web is XSLT (Extensible Stylesheet Language Transformations), which builds on the growth of XML as a data format on the web. XSLT is not a general-purpose programming language, so its uses are limited, and that includes LOD processing. However, XSLT is very useful for auxiliary tasks in such processing that involve transforming XML. This article explores specialized areas for the use of XSLT 1.0 in LOD processing. The focus is on XSLT 1.0: XSLT 2.0 does offer more for LOD processing, but it is far more complex and much less used by the community. XSLT 1.0 also has more processor implementations than 2.0, and the EXSLT set of community extensions, which has strong support in Firefox 3.0, provides facilities that bring XSLT 1.0 close to the power of XSLT 2.0.
XSLT in the Browser
Mainstream web browsers, such as Internet Explorer and Firefox, are the most obvious places to deploy LOD processing using XSLT. However, browser security restrictions add to the limitations that XSLT already places on LOD processing. Listing 1 is an example of XSLT that tries to process a page from DBPedia, an LOD wrapper for Wikipedia and other data sources. Save this code as documenttest.xslt and try loading it into a browser; you will see a security error. For example, in the Firefox 3.0 beta the result is a blank page, and if you check the Error Console (in the Tools menu) you will see an error such as:
Security Error: Content at file:///temp/lod-xslt/documenttest.xslt may not load data from http://dbpedia.org/page/XSL_Transformations.
This error occurs because you cannot use the document() function to access a resource whose URI scheme or host differs from that of the stylesheet's base URI. The same restriction applies to xsl:import and xsl:include, and you cannot get around it using ordinary JavaScript. To get the full power of XSLT in Firefox, or in any browser that takes security seriously, you need to deploy the XSLT through an extension (this article does not focus on this deployment issue). Luckily, there are a couple of handy add-ons you can use to run XSLT with far fewer restrictions.
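To make the rule concrete, here is a rough model of the check, written in Python rather than XSLT and not taken from Firefox's actual code. It compares only the URI scheme and host, as the paragraph above describes. The function name may_load and the second stylesheet URI are hypothetical, and real browsers apply more detailed rules (ports, plus extra restrictions on file: URIs), so treat it as an approximation.

```python
from urllib.parse import urlsplit


def may_load(stylesheet_uri: str, target_uri: str) -> bool:
    """Simplified model of the browser rule: document(), xsl:import and
    xsl:include may only reach a target whose URI scheme and host match
    those of the stylesheet's base URI."""
    base = urlsplit(stylesheet_uri)
    target = urlsplit(target_uri)
    # Scheme and host only (urlsplit lowercases both). Real browsers also
    # weigh ports and treat file: URIs specially, so this is approximate.
    return (base.scheme, base.hostname) == (target.scheme, target.hostname)


# The case from the error message above: a file: stylesheet reading http: data.
print(may_load("file:///temp/lod-xslt/documenttest.xslt",
               "http://dbpedia.org/page/XSL_Transformations"))  # False

# A (hypothetical) stylesheet served from dbpedia.org itself passes the check.
print(may_load("http://dbpedia.org/xslt/test.xslt",
               "http://dbpedia.org/page/XSL_Transformations"))  # True
```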
For this article, the author worked with the XSLT engine in Firefox 3.0, an excellent platform for LOD processing tools because Firefox is known for its power and standards conformance. It is also widely deployed, so you can expect anything developed for Firefox to reach a wide variety of users. Firefox 3.0 is still in beta and should be complete by late 2008. Among other things, it offers many important improvements for XSLT processing.
Mining What’s Described
One key LOD practice is to offer multiple representations of a resource and use content negotiation to decide which representation to send in response to a request. The client uses HTTP headers, chiefly Accept, to tell the server which representations it prefers. You might, for example, send HTML or XHTML to a plain browser and RDF to a more specialized tool. It is also very handy to provide links between these representations, so that if content negotiation does not do the trick, users still have a way to find the exact representation they prefer.
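The negotiation itself can be sketched in a few lines. The following Python is a minimal, hypothetical illustration of both sides: the client builds a request whose Accept header asks for RDF/XML, and the server scores its available media types (application/xhtml+xml and application/rdf+xml) against an Accept header using q-values, letting the most specific matching media range decide each type's score. The helper names are invented for this sketch, and it ignores details a production server would handle, such as media-type parameters other than q and the Vary response header.

```python
from urllib.request import Request

XHTML = "application/xhtml+xml"
RDF_XML = "application/rdf+xml"


def parse_accept(header: str):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((fields[0].lower(), q))
    return ranges


def quality(media_type: str, ranges) -> float:
    """q-value for media_type; the most specific matching range wins."""
    major = media_type.split("/")[0]
    best_rank, q_value = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            rank = 2
        elif media_range == major + "/*":
            rank = 1
        elif media_range == "*/*":
            rank = 0
        else:
            continue
        if rank > best_rank:
            best_rank, q_value = rank, q
    return q_value


def negotiate(accept_header: str, available=(XHTML, RDF_XML)):
    """Server side: pick a representation, or None if nothing is acceptable.
    Ties go to the earlier entry in `available` (the server's preference)."""
    ranges = parse_accept(accept_header)
    best_q, best_type = max(((quality(t, ranges), t) for t in available),
                            key=lambda pair: pair[0])
    return best_type if best_q > 0 else None


# Client side: a specialized tool asks for RDF/XML (built here, not sent).
request = Request("http://dbpedia.org/resource/XSL_Transformations",
                  headers={"Accept": RDF_XML})
```

For a typical browser-style header such as text/html,application/xhtml+xml;q=0.9,*/*;q=0.8, negotiate() returns application/xhtml+xml, while the client request above, with its Accept header of application/rdf+xml, would select the RDF.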
If you send (X)HTML, the conventional way of linking to the RDF representation is through a link element in the document head. Once again, DBPedia serves as the example. It uses the following convention:
- The abstract resource is at http://dbpedia.org/resource/{id}; the server uses content negotiation to determine which representation to send back
- The XHTML representation is at http://dbpedia.org/page/{id}; the server always sends back XHTML
- The RDF/XML representation is at http://dbpedia.org/data/{id}; the server always sends back RDF/XML
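A client can follow this convention in two ways: read the link element from the XHTML head, or rewrite the URI using DBPedia's path pattern. The Python sketch below shows both, assuming the link uses rel="alternate" and type="application/rdf+xml" (the usual convention for RDF alternates). The sample page is hypothetical rather than DBPedia's actual markup, the function names are invented, and ElementTree requires the page to be well-formed XML. The head link is the more robust route, since the path rewrite only works for DBPedia's URI scheme.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"
DBPEDIA = "http://dbpedia.org/"


def rdf_alternates(xhtml_source: str, base_uri: str) -> list:
    """hrefs of <link rel="alternate" type="application/rdf+xml"> in the head."""
    root = ET.fromstring(xhtml_source)
    found = []
    for link in root.findall(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}link"):
        rels = (link.get("rel") or "").lower().split()
        media_type = (link.get("type") or "").strip().lower()
        href = link.get("href")
        if "alternate" in rels and media_type == "application/rdf+xml" and href:
            found.append(urljoin(base_uri, href))  # resolve relative hrefs
    return found


def dbpedia_data_uri(uri: str):
    """Fallback relying only on DBPedia's URI pattern: map to /data/{id}."""
    for segment in ("resource/", "page/", "data/"):
        prefix = DBPEDIA + segment
        if uri.startswith(prefix) and len(uri) > len(prefix):
            return DBPEDIA + "data/" + uri[len(prefix):]
    return None


# Hypothetical XHTML in the style of a DBPedia /page/ document.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>XSL Transformations</title>
    <link rel="alternate" type="application/rdf+xml"
          href="http://dbpedia.org/data/XSL_Transformations"/>
    <link rel="stylesheet" type="text/css" href="style.css"/>
  </head>
  <body/>
</html>"""

page_uri = "http://dbpedia.org/page/XSL_Transformations"
print(rdf_alternates(SAMPLE_PAGE, page_uri))
# ['http://dbpedia.org/data/XSL_Transformations']
print(dbpedia_data_uri(page_uri))
# http://dbpedia.org/data/XSL_Transformations
```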
Listing 2 shows the XHTML head element from http://dbpedia.org/page/XSL_Transformations. The first link connects this representation to the alternative RDF representation. The author developed a bit of XSLT to take advantage of this convention. The code does the following:
- Processes an XHTML file
- Retrieves any alternative RDF representation
- Summarizes the resources described there
- Generates XBEL (XML Bookmark Exchange Language), a simple format for lists of links
Because the RDF is technically another representation of the abstract resource, this XSLT is essentially a tool to summarize what resources are described in an XHTML page (see Listing 3).
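The summarizing and XBEL steps can also be approximated outside XSLT. The following Python sketch is not Listing 3; it is a hypothetical analogue that treats every top-level RDF/XML node element with an rdf:about attribute as a described resource, picks an English rdfs:label or dc:title as its title when one is present (checking only the property element's own xml:lang), and writes a minimal XBEL document (no DOCTYPE) with one bookmark per resource. Like the XSLT, it works at the syntax layer only, and the sample data is invented.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
TITLE_PROPERTIES = (f"{{{RDFS}}}label", f"{{{DC}}}title")


def english_title(node):
    """First English rdfs:label or dc:title on a node, else None.
    Simplified: only the property element's own xml:lang is checked."""
    for prop in TITLE_PROPERTIES:
        for child in node.findall(prop):
            lang = (child.get(XML_LANG) or "").lower()
            if lang.split("-")[0] == "en" and child.text:
                return child.text.strip()
    return None


def summarize(rdf_source):
    """Map each described resource (rdf:about) to its title, or None."""
    root = ET.fromstring(rdf_source)
    summary = {}
    for node in root:  # top-level node elements inside rdf:RDF
        about = node.get(f"{{{RDF}}}about")
        if about is None:
            continue
        if summary.get(about) is None:  # keep the first title found
            summary[about] = english_title(node)
    return summary


def to_xbel(summary):
    """Serialize the summary as a minimal XBEL bookmark list."""
    xbel = ET.Element("xbel", version="1.0")
    for href, title in summary.items():
        bookmark = ET.SubElement(xbel, "bookmark", href=href)
        # Fall back to the URI when no English title was found.
        ET.SubElement(bookmark, "title").text = title or href
    return ET.tostring(xbel, encoding="unicode")


# Hypothetical RDF/XML in the style of a DBPedia /data/ document.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://dbpedia.org/resource/XSL_Transformations">
    <rdfs:label xml:lang="de">XSL Transformation</rdfs:label>
    <rdfs:label xml:lang="en">XSL Transformations</rdfs:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://dbpedia.org/resource/XSLT"/>
</rdf:RDF>"""

print(to_xbel(summarize(SAMPLE_RDF)))
```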
Two technical points in Listing 3 are marked with comments (comment A and comment B):
A. In general, prefer the push style of XSLT to the pull style, especially if you are not dealing with a very rigid document structure. In practice, this means using xsl:apply-templates and modes in many cases where you might be tempted to use xsl:for-each.
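Push style is easiest to see when contrasted with pull style. The Python sketch below is only an analogue of XSLT's processing model, with hypothetical names throughout: rules are registered by (mode, element name), much like xsl:template with match and mode; apply_templates hands each node to whichever rule matches; and unmatched elements fall through to a recursive default, similar to XSLT's built-in template rule (text nodes are ignored here). The last line shows a pull-style query that hard-codes a path and misses an item nested one level deeper.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# Template rules keyed by (mode, element name), loosely like
# <xsl:template match="name" mode="..."> in XSLT.
Rule = Callable[[ET.Element, str], List[str]]
TEMPLATES: Dict[Tuple[str, str], Rule] = {}


def template(match: str, mode: str = "#default"):
    """Decorator that registers a rule for an element name and mode."""
    def register(rule: Rule) -> Rule:
        TEMPLATES[(mode, match)] = rule
        return rule
    return register


def apply_templates(nodes, mode: str = "#default") -> List[str]:
    """Push style: hand each node to whichever rule matches it."""
    out: List[str] = []
    for node in nodes:
        rule = TEMPLATES.get((mode, node.tag))
        if rule is not None:
            out.extend(rule(node, mode))
        else:
            # Like XSLT's built-in element rule: recurse into children in
            # the same mode.
            out.extend(apply_templates(list(node), mode))
    return out


@template("item")
def item_default(node: ET.Element, mode: str) -> List[str]:
    return [node.get("name", "")]


@template("item", mode="shout")
def item_shout(node: ET.Element, mode: str) -> List[str]:
    return [node.get("name", "").upper()]


# Hypothetical document with an irregular structure: one item is nested.
doc = ET.fromstring('<catalog><section><item name="a"/></section>'
                    '<item name="b"/></catalog>')

print(apply_templates([doc]))                        # ['a', 'b']
print(apply_templates([doc], mode="shout"))          # ['A', 'B']
# Pull style hard-codes a path and misses the nested item:
print([i.get("name") for i in doc.findall("item")])  # ['b']
```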
B. The code that generates a title from an RDF description illustrates a significant limitation of XSLT for such work: XSLT is not RDF-aware and works only at the syntax layer. The listing looks for both rdfs:label and dc:title for wider coverage. To an RDF-aware tool this would be redundant, because the latter is defined as a subproperty of the former, but XSLT cannot draw that inference, so it has to check each property by name. Vocabularies sometimes define other such subproperties, which Listing 3 would not pick up. This is not a major problem for this use case, since the title is only grabbed as a convenience to the user. Notice how the XPath lang() function is used to grab only English labels (DBPedia includes labels in many languages).
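The behavior of lang() is worth spelling out, because it is more than a simple attribute test. In XPath 1.0, lang('en') is true when the nearest xml:lang in scope (on the node itself, or else on its closest ancestor that has one) equals en or starts with en-, compared case-insensitively. The Python function below approximates that rule for ElementTree, which has no parent pointers, so it builds a child-to-parent map first. The names and sample data are hypothetical, and, like Listing 3, it only recognizes rdfs:label and dc:title by name, so it does nothing about the subproperty limitation described above.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
TITLE_TAGS = {
    "{http://www.w3.org/2000/01/rdf-schema#}label",
    "{http://purl.org/dc/elements/1.1/}title",
}


def xpath_lang(node, target, parent_of):
    """Approximate XPath 1.0 lang(target): use the nearest xml:lang on the
    node or an ancestor; match it exactly or as a sublanguage, ignoring case."""
    target = target.lower()
    current = node
    while current is not None:
        value = current.get(XML_LANG)
        if value is not None:
            value = value.lower()
            return value == target or value.startswith(target + "-")
        current = parent_of.get(current)
    return False  # no xml:lang in scope


def titles_in(rdf_source, lang):
    root = ET.fromstring(rdf_source)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [el.text for el in root.iter()
            if el.tag in TITLE_TAGS and xpath_lang(el, lang, parent_of)]


# Hypothetical RDF/XML: the first label inherits xml:lang="en" from rdf:RDF.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en">
  <rdf:Description rdf:about="http://dbpedia.org/resource/XSL_Transformations">
    <rdfs:label>XSL Transformations</rdfs:label>
    <rdfs:label xml:lang="de">XSL-Transformation</rdfs:label>
    <dc:title xml:lang="EN-GB">XSLT</dc:title>
  </rdf:Description>
</rdf:RDF>"""

print(titles_in(SAMPLE, "en"))  # ['XSL Transformations', 'XSLT']
```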
Listing 4 shows the output from running Listing 3 against http://dbpedia.org/page/XSL_Transformations.
You can see that the script found several described resources, most of which are synonyms for “XSL_Transformations” pulled in from Wikipedia. Only one title was found, because of the limitations in interpreting RDF semantics described earlier.
Choosing XSLT
As with any web development, use whatever tools you prefer for Linking Open Data (LOD), but a few things make XSLT attractive. For one, XSLT processing is generally much faster than equivalent JavaScript/DOM code in most browsers. Also, some web developers prefer to learn XSLT rather than more general-purpose programming languages. By using Semantic Web technologies now, you strengthen your position as a web developer for the future. Ideally, you should feel empowered to combine languages in your processing and to use each one where it is strongest.