

XProc: Meta-Programming and Rube Goldberg : Page 2

XProc, the XML Pipeline Language, is designed as a way of describing a set of declarative processes. Learn how XProc neatly solves a number of problems that transcend any single XML operational language.

Processing XProc
As much as this might seem an open invitation to build Rube Goldberg-like architectures, this particular model turns out to be quite powerful. For starters, there is no functional distinction between a large number of filters, each of which performs a simple operation, and a single filter that performs a very complex operation (i.e., the model is decomposable). In addition, all the state is carried in the balls and not in the filters. Indeed, even "application environment" variables can be encoded as a bundle of state values and passed around in the same manner.
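The decomposability claim can be illustrated with a minimal JavaScript sketch. All names here (addTitle, upperCaseTitle, stampEnvironment, pipeline) are hypothetical and have nothing to do with XProc itself; the only assumption is that a filter is a pure function from one bundle of state to another.

```javascript
// Hypothetical illustration: each filter takes a state bundle (the "ball")
// and returns a new one. The filters themselves hold no state.
const addTitle = (doc) => ({ ...doc, title: doc.title ?? "Untitled" });
const upperCaseTitle = (doc) => ({ ...doc, title: doc.title.toUpperCase() });
const stampEnvironment = (doc) => ({
  ...doc,
  env: { ...doc.env, processed: true }, // environment values travel with the ball
});

// Compose any number of filters into what is, from outside, a single filter.
const pipeline = (...filters) => (doc) =>
  filters.reduce((state, filter) => filter(state), doc);

// Three simple filters chained together...
const manySmall = pipeline(addTitle, upperCaseTitle, stampEnvironment);

// ...and one complex filter that does the same work in a single step.
const oneLarge = (doc) => ({
  ...doc,
  title: (doc.title ?? "Untitled").toUpperCase(),
  env: { ...doc.env, processed: true },
});

const ball = { body: "<p>hello</p>", env: { user: "editor" } };
const sameResult =
  JSON.stringify(manySmall(ball)) === JSON.stringify(oneLarge(ball));
console.log(sameResult);
```

Because the original ball is never modified, either form can be swapped for the other without the rest of the pipeline noticing.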

Finally, because of these two properties, a declarative pipeline architecture can itself be described as an XML document, one that defines types of actions (such as transformations or queries), with the individual files implementing those actions being other balls (or bundles) of state. An early but classic example is Apache Ant for Java (or its .NET equivalent, NAnt), an XML-based build file format that is beginning to displace the traditional make utility.

However, there's a new XML process language that's making significant progress within the W3C. XProc, the XML Pipeline Language, is designed as a way of describing a set of declarative processes, along with the inputs, outputs and throughputs of those processes, used within an XML pipeline. For instance, consider a simple pipeline for an XHTML document that contains XInclude statements for loading other resources into the document, after which the resulting document needs to be validated (see Figure 1).

Figure 1. XInclude/Validate Pipeline: You need to validate the resulting document.
The XProc specification for this particular document could be written as shown in Listing 1.

In this particular example, the sequence as given includes two distinct parts. The header section defines two key input "ports," source and schemas, and one output "port," result. A port can be thought of as a named entrance or exit to the XProc file, and is typically established by the implementation itself. For instance, if the variable _source contained an XML DOM with XInclude elements and the variable _schemas contained an XML Schema document, then an implementation of this XProc might look something like this:

var proc = new XProc();
var _output  = new Object;

Note that this code is just a theoretical interface; the specific implementation is up to the application provider.
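To make the idea of ports as named entrances and exits more concrete, here is a small self-contained mock in JavaScript. It is not a real XProc library: the class MockPipeline and its methods bindInput and run are invented for illustration, and the pipeline body is a placeholder function, not actual XInclude or schema processing.

```javascript
// Hypothetical mock, not a real XProc API: it models only named ports.
class MockPipeline {
  constructor(inputPorts, outputPorts, body) {
    this.inputPorts = inputPorts;   // e.g. ["source", "schemas"]
    this.outputPorts = outputPorts; // e.g. ["result"]
    this.body = body;               // function(inputs) -> object keyed by output port
    this.bound = new Map();
  }

  // Attach a value to a declared input port.
  bindInput(port, value) {
    if (!this.inputPorts.includes(port)) {
      throw new Error(`Unknown input port: ${port}`);
    }
    this.bound.set(port, value);
    return this; // allow chaining
  }

  // Run the body once every input port has something bound to it.
  run() {
    for (const port of this.inputPorts) {
      if (!this.bound.has(port)) throw new Error(`Unbound input port: ${port}`);
    }
    const outputs = this.body(Object.fromEntries(this.bound));
    for (const port of this.outputPorts) {
      if (!(port in outputs)) throw new Error(`Missing output port: ${port}`);
    }
    return outputs;
  }
}

// Placeholder body: a real processor would expand XIncludes and validate here.
const mockProc = new MockPipeline(["source", "schemas"], ["result"],
  ({ source, schemas }) => ({ result: `${source} checked against ${schemas}` }));

const mockOutput = mockProc
  .bindInput("source", "<html/>")
  .bindInput("schemas", "<xs:schema/>")
  .run();
console.log(mockOutput.result);
```

The point of the sketch is only that the caller deals in port names; what happens between the ports is left to the pipeline.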

When the JavaScript procedure runs, the XProc steps are run in sequence (unless inclusions occur, in which case processing is a little more complex). Thus, the first step, included, will take the content contained in _source and will replace any XInclude links with their included content in the document. This is the first xinclude step:

<p:xinclude name="included">
    <p:input port="source">
        <p:pipe step="xinclude-and-validate" port="source"/>
    </p:input>
</p:xinclude>

The above code indicates that the source of the input is in fact the same as the source for the "xinclude-and-validate" input. Not surprisingly, much of XProc is involved with establishing the sources and sinks of pipes in the pipeline. One conceptual way of thinking about how such pipes work is that to this point there are two distinct sources, xinclude-and-validate.source and included.source, that happen in this case to be the same thing. The next block, however, is wired differently:

<p:validate-with-xml-schema name="validated">
    <p:input port="source">
        <p:pipe step="included" port="result"/>
    </p:input>
    <p:input port="schema">
        <p:pipe step="xinclude-and-validate" port="schemas"/>
    </p:input>
</p:validate-with-xml-schema>

The source input is defined as the result of the "included" step. If an output isn't defined for an XProc step (i.e., for xinclude or validate-with-xml-schema in this particular instance), then its output port is assumed to be named "result," a name that the "validated" block references explicitly. The second input, the schema, is pulled from the explicitly named "schemas" port that was declared for the whole block. With these two inputs, the validate-with-xml-schema step can be run to validate the content. If the document is in fact valid, then the post-schema-validation infoset (PSVI) is passed to the "result" port.
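The wiring just described can be modeled in a few lines of JavaScript. This is only a sketch under stated assumptions: runPipeline is a hypothetical helper, the step functions are stand-ins that do no real XInclude or schema processing, and the "[include]" marker and "xhtml.xsd" value are made-up sample data.

```javascript
// Hypothetical model of the port wiring; not an XProc processor.
function runPipeline(pipelineName, pipelineInputs, steps) {
  // Every readable port is stored under the key "stepName.portName".
  const ports = new Map();
  for (const [port, value] of Object.entries(pipelineInputs)) {
    ports.set(`${pipelineName}.${port}`, value);
  }

  let last;
  for (const step of steps) {
    // Resolve each input by following its pipe back to a named port.
    const inputs = {};
    for (const [port, pipe] of Object.entries(step.inputs)) {
      const key = `${pipe.step}.${pipe.port}`;
      if (!ports.has(key)) throw new Error(`Nothing connected to ${key}`);
      inputs[port] = ports.get(key);
    }
    // A step with no declared output writes to the port named "result".
    last = step.run(inputs);
    ports.set(`${step.name}.result`, last);
  }
  return last;
}

const wiredSteps = [
  {
    name: "included",
    inputs: { source: { step: "xinclude-and-validate", port: "source" } },
    run: ({ source }) => source.replace("[include]", "included text"),
  },
  {
    name: "validated",
    inputs: {
      source: { step: "included", port: "result" },
      schema: { step: "xinclude-and-validate", port: "schemas" },
    },
    run: ({ source, schema }) => ({ document: source, checkedAgainst: schema }),
  },
];

const wiredResult = runPipeline(
  "xinclude-and-validate",
  { source: "<p>[include]</p>", schemas: "xhtml.xsd" },
  wiredSteps
);
console.log(wiredResult);
```

Note how "included" reads the pipeline's own source port, while "validated" reads the result port of the previous step plus the pipeline's schemas port.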

Note that the default behavior for validation with XML Schema (XSDL) varies somewhat from validation with RELAX NG, in that an unvalidated copy is passed on rather than error messages, because the result of an XSD validation is in fact a distinct (and different) object, rather than a simple Boolean flag. However, an option can be added to this block to assert that the document must be valid. If the validation then fails, a dynamic error is raised, and the XProc processor will either stop at that point or, if one is defined, perform the <p:catch> action of an enclosing try/catch block. Put more simply, XProc supports exception handling.
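The two behaviors described above (pass an unvalidated copy along, or raise a dynamic error that a catch block can handle) can be sketched in JavaScript as follows. Everything here is hypothetical: validateStep, tryCatch, DynamicError and the assertValid flag are illustrative names, and the "validity" check is a toy string test, not schema validation.

```javascript
// Hypothetical stand-in for a validation step with an "assert valid" option.
class DynamicError extends Error {}

function validateStep(doc, requiredElement, { assertValid }) {
  const valid = doc.includes(`<${requiredElement}>`); // toy validity check
  if (valid) return { doc, validated: true };         // stands in for the PSVI
  if (assertValid) throw new DynamicError(`Missing <${requiredElement}>`);
  return { doc, validated: false };                   // unvalidated copy is passed on
}

// Stand-in for a try/catch block: only dynamic errors are caught.
function tryCatch(tryBody, catchBody) {
  try {
    return tryBody();
  } catch (err) {
    if (err instanceof DynamicError) return catchBody(err);
    throw err; // anything else is a genuine bug and propagates
  }
}

// Without the assertion, the invalid document simply flows on.
const lenient = validateStep("<p>hi</p>", "title", { assertValid: false });

// With the assertion, the failure is raised and handled by the catch body.
const recovered = tryCatch(
  () => validateStep("<p>hi</p>", "title", { assertValid: true }),
  (err) => ({ doc: "<error/>", validated: false, reason: err.message })
);
console.log(lenient, recovered);
```

Without a surrounding tryCatch, the asserted call would simply stop the pipeline at the point of failure.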

The output of this operation will then be the original XHTML document with its XIncludes expanded, run through an XSD validator to return a PSVI document. Notice that the process given is also basically generic. Both the source input and the schema are parameters. If you change either of these (or, as a more sensible operation, replace the schema validation with a transformation, and use two different transformation files) the results will be different, but the XProc is the same.
