

Gleaning Information From Embedded Metadata : Page 4

Put GRDDL-enabled agents to the task of extracting valuable information from machine-processable metadata embedded in documents, courtesy of prevailing semantic web standards.

Transform Discovery
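The IAspectXDA interface provides the ability to iterate over XML nodes selected by an XPath expression. The script first counts how many transforms were discovered and creates arrays for the asynchronous request handles and their results. It then iterates over the transforms and issues one asynchronous request for each: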

    count = Integer.valueOf(uriXda.eval("count(/uris/transform)").getStringValue()).intValue();
    handles = new INKFAsyncRequestHandle[count];
    results = new IURRepresentation[count];
    IXDAReadOnlyIterator transforms = uriXda.readOnlyIterator("/uris/transform");
    while (transforms.hasNext()) {
        transforms.next();
        nextTransform = transforms.getText(".", true); // a java.lang.String
        // Get the canonical URI for the transform.
        nextTransformURI = sourceURI.resolve(nextTransform); // a java.net.URI
        // Perform the XSL transformations asynchronously.
        handles[idx++] = asyncTransform(sourceDoc, nextTransformURI.toString(), null);
    }
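The call to sourceURI.resolve() is what turns a transform reference found in the document into a canonical URI. As a rough standalone illustration (not part of harvest.bsh), the sketch below uses only java.net.URI; the class name, the helper method, and the example.org and example.net addresses are all made up for the example:

```java
import java.net.URI;

class ResolveTransformUri {
    // Hypothetical helper: resolve a transform reference (relative or
    // absolute) against the URI of the document that declared it.
    static URI resolveTransform(URI sourceURI, String transformRef) {
        return sourceURI.resolve(transformRef);
    }

    public static void main(String[] args) {
        // Illustrative source document URI, not taken from the article.
        URI source = URI.create("http://example.org/docs/page.html");

        // A relative reference is resolved against the source document.
        System.out.println(resolveTransform(source, "grddl/extract.xsl"));
        // prints http://example.org/docs/grddl/extract.xsl

        // An absolute reference is returned unchanged.
        System.out.println(resolveTransform(source, "http://example.net/other.xsl"));
        // prints http://example.net/other.xsl
    }
}
```

Resolving every reference this way means the later transformation requests always receive absolute URIs, regardless of how the source document wrote them.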

To reduce the time required to harvest the results, the script does not block on any individual transformation request. Only after all the requests have been issued does it join on each handle to capture the results:

    // join on the results
    for (int i = 0; i < idx; i++) {
        results[i] = handles[i].join();
    }
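The same issue-everything-then-join pattern can be tried outside NetKernel. The sketch below is only an analogy: it swaps NetKernel's INKFAsyncRequestHandle for java.util.concurrent.CompletableFuture, and the transform() method is a hypothetical stand-in that labels its input rather than running a real XSL transformation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class IssueThenJoin {
    // Hypothetical stand-in for an XSL transformation request.
    static String transform(String transformUri) {
        return "result-of:" + transformUri;
    }

    // Issue every request first, then join on each handle in order,
    // so no request waits for an earlier one to finish before starting.
    static List<String> runAll(List<String> transformUris) {
        List<CompletableFuture<String>> handles = new ArrayList<>();
        for (String uri : transformUris) {
            handles.add(CompletableFuture.supplyAsync(() -> transform(uri)));
        }

        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> handle : handles) {
            results.add(handle.join()); // blocks only until this result is ready
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runAll(Arrays.asList("a.xsl", "b.xsl")));
        // prints [result-of:a.xsl, result-of:b.xsl]
    }
}
```

Because the joins happen in issue order, the results line up with the handles even when the underlying work finishes out of order.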

After all the results are available, they are accumulated into the resource created previously from the RDF template:

    for (int i = 0; i < idx; i++) {
        // Append the results to the RDF result set
        rdf = syncTransform(rdf, "ffcpl:/smushrdf.xsl", results[i]);
    }

Here is the style sheet that performs this accumulation:

    <xsl:stylesheet version="1.1"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <xsl:output method="xml"/>
      <xsl:param name="param"/>
      <xsl:template match="/">
        <xsl:for-each select="*">
          <xsl:copy>
            <xsl:copy-of select="/*/*" />
            <xsl:copy-of select="$param/rdf:RDF/*" />
          </xsl:copy>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
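For readers more comfortable with Java than XSLT, the effect of the style sheet can be approximated with the standard DOM API. This is a simplified sketch rather than what NetKernel does internally: the class and method names are invented, it appends to the accumulated document in place instead of producing a new one, and it ignores non-element children:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SmushRdf {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Parse an XML string into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Append each child element of the addition's rdf:RDF root to the
    // accumulated document's root, roughly like copying $param/rdf:RDF/*.
    static Document merge(Document accumulated, Document addition) {
        Element target = accumulated.getDocumentElement();
        Element source = addition.getDocumentElement();
        boolean isRdfRoot = RDF_NS.equals(source.getNamespaceURI())
                && "RDF".equals(source.getLocalName());
        if (!isRdfRoot) {
            return accumulated; // nothing matches rdf:RDF, so nothing is added
        }
        for (Node child = source.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                target.appendChild(accumulated.importNode(child, true));
            }
        }
        return accumulated;
    }

    public static void main(String[] args) {
        String open = "<rdf:RDF xmlns:rdf='" + RDF_NS + "'>";
        Document rdf = parse(open + "<rdf:Description rdf:about='urn:example:a'/></rdf:RDF>");
        Document result = parse(open + "<rdf:Description rdf:about='urn:example:b'/></rdf:RDF>");
        merge(rdf, result);
        System.out.println(rdf.getDocumentElement()
                .getElementsByTagNameNS(RDF_NS, "Description").getLength());
        // prints 2
    }
}
```

Calling merge() once per transformation result mirrors the loop that applies smushrdf.xsl to each entry in the results array.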

Finally, the accumulated results are run through the xmltidy accessor (more information on accessors is available here) and then tagged with an appropriate MIME type:

    // clean up the XML
    req = context.createSubRequest("active:xmltidy");
    req.addArgument("operand", rdf);
    rdf = context.issueSubRequest(req);
    // create response
    response = context.createResponseFrom(rdf);
    response.setMimeType("application/rdf+xml");
    context.setResponse(response);

The harvest.bsh script does not do anything useful with the results, but you could store them in an RDF triple store such as Mulgara, which was discussed in the article, "Storing and Using RDF in Mulgara" (DevX, August 30, 2007).

While some of the NetKernel concepts in the example may seem a little strange, it should be clear that passively harvesting RDF metadata with GRDDL profiles is not a difficult task. (Go ahead and dig deeper into NetKernel's powerful resource-oriented environment! View the documentation page at http://localhost:1060/ep+name@app_fulcrum_backend_documents after you have started NetKernel.)

There is still a bit of a bootstrap problem in getting GRDDL used more extensively around the web. Once people see how useful and easy this process can be, it seems only a matter of time until tools enable you to take advantage of metadata embedded in the documents you browse. Users are already willing to provide quality metadata through tags on social-oriented web sites. If the bar is lowered for discovering and reusing terms from formal vocabularies, it seems likely that people will do the same for semantic markup, just as they do for presentation markup.


Brian Sletten is a liberal arts-educated software engineer with a focus on forward-leaning technologies. He has worked as a system architect, a developer, a mentor and a trainer. He has spoken at conferences around the world and writes about web-oriented technologies for several online publications. His experience has spanned the defense, financial and commercial domains. He has designed and built network matrix switch control systems, online games, 3D simulation/visualization environments, Internet distributed computing platforms, P2P and Semantic Web-based systems. He has a B.S. in Computer Science from the College of William and Mary and currently lives in Fairfax, VA. He is the President of Bosatsu Consulting, Inc., a professional services company focused on web architecture, resource-oriented computing, the Semantic Web, advanced user interfaces, scalable systems, security consulting and other technologies of the late 20th and early 21st Centuries.