Taking Data Validation to a Dynamic Level

A declarative schema can work well with XML data streams in a distributed context, where validation needs to be more functional and to use a processing model that goes beyond describing XML content in a single, static document.

Deceptively Simple Schematron
However, a far simpler (if somewhat less secure) approach is to give your validation mechanism the intelligence to talk to resources outside the schema file itself. In a nutshell, this is the approach ISO Schematron takes. The idea behind Schematron is deceptively simple. A Schematron document consists of a collection of rules (written in XML), each of which operates on a specific context defined by an XPath expression. Each rule contains a set of assertions, themselves XPath expressions and predicates, that test given conditions about the context. If an assertion holds, nothing happens; if it fails, Schematron emits a message, typically as plain text, XHTML, or the Schematron Validation Report Language (SVRL). While it is possible to use custom parsers, the most typical Schematron process runs this way:
  1. Author the Schematron document.
  2. Transform the Schematron document with a special Schematron XSLT, which in turn generates another XSLT (the Schematron filter).
  3. Run the document to be validated through the generated filter to produce a report.
  4. Pass the report on to the user or another process.
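The rule/assertion/report model behind these steps can be sketched in Python. This is only an analogy under stated assumptions: real Schematron uses full XPath and compiles to XSLT, whereas this sketch uses ElementTree's limited path syntax and plain Python predicates. The `Rule`, `Assertion`, and `compile_schema` names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, NamedTuple


class Assertion(NamedTuple):
    test: Callable[[ET.Element], bool]    # True when the condition holds
    message: Callable[[ET.Element], str]  # report text used when it fails


class Rule(NamedTuple):
    context: str                 # ElementTree path, a small subset of XPath
    assertions: List[Assertion]


def compile_schema(rules: List[Rule]) -> Callable[[ET.Element], List[str]]:
    """Analogue of step 2: turn declarative rules into a reusable filter."""
    def run_filter(root: ET.Element) -> List[str]:
        # Analogue of step 3: apply the filter and collect a report.
        report = []
        for rule in rules:
            for node in root.iterfind(rule.context):
                for assertion in rule.assertions:
                    if not assertion.test(node):  # silent when the assertion holds
                        report.append(assertion.message(node))
        return report
    return run_filter


# Step 1: author the rules (hypothetical example: every item needs a price).
schema = [Rule(".//item", [Assertion(lambda n: "price" in n.attrib,
                                     lambda n: f"item {n.get('id')} has no @price")])]
validate = compile_schema(schema)
report = validate(ET.fromstring('<order><item id="a" price="3"/><item id="b"/></order>'))
print(report)  # step 4: hand the report on -> ['item b has no @price']
```

One simplification worth noting: this sketch does not model Schematron's behavior where, within a single pattern, a node is handled only by the first rule whose context matches it.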
You can use this Schematron approach with either XSLT 1.0 or 2.0 processors (though in general I'd recommend 2.0, simply because the language is much more capable). Both expose one important function: the XSLT document() function (which is not, per se, part of XPath 1.0).

The document() function takes one or two arguments. The first is either a URL string or a node-set (a sequence in 2.0) of URLs; the optional second argument is a node, usually the current node, whose base URI is used to resolve relative URLs. The result is one or more documents retrieved from those URLs. Note that if those URLs are themselves parametric, GET-based web services, you can use document() to retrieve content from an external service and validate content against dynamic taxonomies.
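As a rough Python analogue of how a relative, parametric URL gets resolved: document()'s second argument supplies the base URI, much as `urljoin` does below. The base URI, host, and `colors.xq` endpoint are placeholders, `service_url` and `fetch_document` are hypothetical helpers, and unlike a bare `concat()` this sketch percent-encodes the key (XPath 2.0 offers `encode-for-uri()` for the same purpose).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def service_url(base_uri: str, colorkey: str) -> str:
    """Resolve a parametric GET URL against a base URI, as document(uri, node) does."""
    relative = "colors.xq?" + urlencode({"colorkey": colorkey})  # percent-encodes the key
    return urljoin(base_uri, relative)


def fetch_document(url: str) -> ET.Element:
    """Retrieve and parse the XML the service returns (requires a live endpoint)."""
    with urlopen(url) as response:
        return ET.parse(response).getroot()


print(service_url("http://example.com/data/order.xml", "rd"))
# http://example.com/data/colors.xq?colorkey=rd
```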

For instance, suppose that you had a web service that takes a single parameter, colorkey, and returns a single XML node. For a valid key, the node might look like this:

<color name="rd" label="Red" status="200" statusMessage="Color is valid."/>

If the color corresponding to the key was found but has since been retired, the service might return:

<color name="rd" label="Carmine" status="500" statusMessage="Color rd was found, but has been retired."/>

If no color matches the key, it might return:
<color name="rd" label="(unknown)" status="400" statusMessage="Color 'rd' was not found."/>
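To make the three cases concrete, here is a minimal sketch of the logic such a service might use to build its response. The registry contents, the `cm` key, and the `color_response` function are invented for illustration; a real service (the article's example points at an XQuery endpoint) would query its own taxonomy store.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy store: key -> (label, retired?)
REGISTRY = {"rd": ("Red", False), "cm": ("Carmine", True)}


def color_response(colorkey: str, registry: dict = REGISTRY) -> str:
    """Build the <color> node the hypothetical colors service would return."""
    entry = registry.get(colorkey)
    if entry is None:
        label, status, message = "(unknown)", "400", f"Color '{colorkey}' was not found."
    elif entry[1]:
        label, status, message = entry[0], "500", f"Color {colorkey} was found, but has been retired."
    else:
        label, status, message = entry[0], "200", "Color is valid."
    node = ET.Element("color", name=colorkey, label=label,
                      status=status, statusMessage=message)
    return ET.tostring(node, encoding="unicode")


print(color_response("rd"))
```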

You can then write a Schematron schema that reads the existing resource, queries the server, and generates the appropriate error message when an assertion fails:

<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern id="confirmTaxonomies">
    <rule context="colorkey">
      <let name="keyValue" value="normalize-space(.)"/>
      <let name="colorDoc" value="document(concat('colors.xq?colorkey=', encode-for-uri($keyValue)), .)"/>
      <!-- The service returns a single <color> element; check its status. -->
      <assert test="$colorDoc/color/@status = '200'">
        <value-of select="$colorDoc/color/@statusMessage"/>
      </assert>
    </rule>
  </pattern>
</schema>

In this case the pattern contains a single rule matching the colorkey element. The rule defines two variables to make the processing easier to follow (the key value and the document returned by the service), then asserts that the status attribute of the returned color element is 200 (mirroring an HTTP 200 "success" code). If it is not, that is, if the web service reports an error, the validator writes the service's statusMessage to the output report.
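For readers without a Schematron toolchain at hand, the same check can be approximated in Python. This is a sketch, not how Schematron executes: the `fetch` callable stands in for document() and is a hypothetical hook, so a stub service (`stub_fetch`, also hypothetical) can be passed in for testing.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List


def validate_colorkeys(instance_xml: str,
                       fetch: Callable[[str], ET.Element]) -> List[str]:
    """Mirror the confirmTaxonomies rule: one report entry per failed assertion."""
    report = []
    for node in ET.fromstring(instance_xml).iter("colorkey"):  # rule context
        key_value = (node.text or "").strip()                    # the keyValue variable
        color = fetch(key_value)                                 # the colorDoc variable
        if color.get("status") != "200":                         # the assert test
            report.append(color.get("statusMessage", ""))        # the value-of message
    return report


def stub_fetch(key: str) -> ET.Element:
    """Hypothetical stand-in for the colors service: only 'rd' is valid."""
    if key == "rd":
        return ET.Element("color", name=key, status="200",
                          statusMessage="Color is valid.")
    return ET.Element("color", name=key, status="400",
                      statusMessage=f"Color '{key}' was not found.")


doc = "<palette><colorkey>rd</colorkey><colorkey>zz</colorkey></palette>"
print(validate_colorkeys(doc, stub_fetch))  # ["Color 'zz' was not found."]
```

Injecting the lookup as a parameter keeps the validation logic independent of the network, which is also a convenient way to test a remote-lookup rule before the real service exists.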

This approach, using something like Schematron as a declarative schema that nonetheless works well in a distributed context, deserves closer attention from anyone who works with XML data streams. In an increasingly connected world, validation itself needs to "go global," become more functional, and shift toward a processing model that recognizes that the days when XML content could be described in a single static document are quickly slipping away.

Kurt Cagle is the managing editor for XMLToday.org and a contributing editor for O'Reilly Media. He is currently working on a book about XBRL. Follow him on Twitter at twitter.com/kurt_cagle.