Processing XProc
As much as this might seem an open invitation to build Rube Goldberg-like architectures, it turns out
that this particular model is really quite powerful. For starters, there is functionally no distinction between a
large number of filters, each of which performs a simple operation, and a single filter that performs very complex
operations (i.e., the model is decomposable). Second, all the state is carried in the balls and not the filters. Indeed, even "application environment" variables can be encoded as a bundle of state values and passed around in the same manner.
Finally, because of these two properties, a declarative pipeline architecture can be described as an
XML document itself, one that defines types of actions (such as transformations or queries), with the individual
files implementing those actions being other balls (or bundles) of state. An early, but classic,
example of this is Ant for Java (or its .NET equivalent, NAnt), a build tool whose build files are written in XML
and which is beginning to erode the dominance of the traditional make command.
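The decomposability property described above can be illustrated with a minimal sketch. It assumes a "ball" is a plain object that carries all state and a "filter" is a function from ball to ball. The names compose, trim, upper, stamp and allInOne are hypothetical and exist only for illustration.

```javascript
// Hypothetical illustration: a "ball" is a plain object carrying all state,
// and a "filter" is a function that takes a ball and returns a new ball.
const compose = (...filters) => (ball) => filters.reduce((b, f) => f(b), ball);

// Three simple filters; none of them keeps any state of its own.
const trim = (ball) => ({ ...ball, text: ball.text.trim() });
const upper = (ball) => ({ ...ball, text: ball.text.toUpperCase() });
const stamp = (ball) => ({ ...ball, text: `${ball.text} [${ball.env.user}]` });

// One complex filter that does the same work in a single step.
const allInOne = (ball) => ({
  ...ball,
  text: `${ball.text.trim().toUpperCase()} [${ball.env.user}]`,
});

// "Environment" values travel inside the ball, not inside the filters.
const ball = { text: "  hello pipeline ", env: { user: "editor" } };

const piped = compose(trim, upper, stamp)(ball);
const single = allInOne(ball);

console.log(piped.text);                 // HELLO PIPELINE [editor]
console.log(piped.text === single.text); // true
```

Because each filter returns a new ball rather than updating hidden state, the chain of small filters and the single large filter are interchangeable from the caller's point of view.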
However, there's a new XML process language that's making significant progress within the W3C. XProc, the XML Pipeline Language, is designed as a way of describing a set of declarative processes, along with the inputs, outputs and throughputs of those processes, used within an XML pipeline. For instance, consider a simple pipeline for an XHTML document that contains XInclude statements for loading other resources into the document, after which the resulting document needs to be validated (see
Figure 1).
Figure 1. XInclude/Validate Pipeline: You need to validate the resulting document.
The XProc specification for this particular document could be written as shown in
Listing 1.
In this particular example, the pipeline as given includes two distinct parts.
The header section defines two key input "ports," source and schemas,
and one output "port," result. A port can be thought of as a named entrance or exit to the
XProc file, and is typically bound by the implementation itself. For instance, if the variable
_source contained an XML DOM with XInclude elements and the variable _schemas contained
an XML Schema document, then an invocation of this XProc might look
something like this:
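// Hypothetical interface: XProc, load, setPort and exec are illustrative names only.
var proc = new XProc();
var _output = {};
proc.load("schema-proc.xml");
// Bind the pipeline's declared ports to application variables.
proc.setPort("source", _source);
proc.setPort("schemas", _schemas);
proc.setPort("result", _output);
proc.exec();
print(_output);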
Note that this code is just a theoretical interface; the specific implementation is up to the application provider.
When the JavaScript procedure runs, the steps in the XProc are run in sequence (unless inclusions occur, in which case processing is
a little more complex). Thus, the first step, included, takes the content contained in
_source and replaces any XInclude links with their included content in the document.
This is the xinclude step:
<p:xinclude name="included">
<p:input port="source">
<p:pipe step="xinclude-and-validate" port="source"/>
</p:input>
</p:xinclude>
The above code indicates that the step's source input is in fact the same as the source input of the enclosing "xinclude-and-validate" pipeline.
Not surprisingly, much of XProc is involved with establishing the sources and sinks of pipes in the pipeline.
One conceptual way of thinking about how such pipes work is that to this point there are two distinct sources,
xinclude-and-validate.source and included.source, that happen in this case to be the same thing. However, the
next block wires its inputs differently:
<p:validate-with-xml-schema name="validated">
<p:input port="source">
<p:pipe step="included" port="result"/>
</p:input>
<p:input port="schema">
<p:pipe step="xinclude-and-validate" port="schemas"/>
</p:input>
</p:validate-with-xml-schema>
The source input is defined as the result of the "included" step. If an output isn't declared for an XProc step
(i.e., for xinclude or validate-with-xml-schema in this particular
instance), then its output port is assumed to be named "result," a name that the "validated" block references explicitly. The
second input, the schema, is pulled from the explicitly named "schemas" port that was declared for the whole pipeline.
With these two inputs, the validate-with-xml-schema step can be run to validate the content.
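The wiring just described can be modeled in a few lines. The sketch below is not an XProc processor. It assumes each readable port can be keyed as "step.port", that every step writes to a default port named "result", and it replaces the real XInclude and validation work with string-building stand-ins. The names pipelineSpec and runPipeline are hypothetical.

```javascript
// Hypothetical model of the wiring described above; not an XProc processor.
// Each step names its inputs as { step, port } pipes and writes to "result".
const pipelineSpec = {
  name: "xinclude-and-validate",
  steps: [
    {
      name: "included",
      inputs: { source: { step: "xinclude-and-validate", port: "source" } },
      run: ({ source }) => `${source}+included`, // stand-in for XInclude
    },
    {
      name: "validated",
      inputs: {
        source: { step: "included", port: "result" },
        schema: { step: "xinclude-and-validate", port: "schemas" },
      },
      run: ({ source, schema }) => `${source}+validated(${schema})`, // stand-in
    },
  ],
};

function runPipeline(spec, pipelineInputs) {
  // Readable ports keyed as "step.port", seeded with the pipeline's own inputs.
  const ports = new Map();
  for (const [port, value] of Object.entries(pipelineInputs)) {
    ports.set(`${spec.name}.${port}`, value);
  }
  let last;
  for (const step of spec.steps) {
    const args = {};
    for (const [port, pipe] of Object.entries(step.inputs)) {
      const key = `${pipe.step}.${pipe.port}`;
      if (!ports.has(key)) throw new Error(`Unbound pipe: ${key}`);
      args[port] = ports.get(key);
    }
    last = step.run(args);
    ports.set(`${step.name}.result`, last); // default output port name
  }
  return last; // the final step's result feeds the pipeline's "result" port
}

const output = runPipeline(pipelineSpec, { source: "doc", schemas: "xhtml.xsd" });
console.log(output); // doc+included+validated(xhtml.xsd)
```

Swapping the values bound to source and schemas changes the output without touching pipelineSpec, which is the sense in which the pipeline description itself is generic.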
If the document is in fact valid, then the post-schema-validation infoset
(PSVI) is passed to the "result" port.
Note that the default behavior for validation with XML Schema (XSD) varies somewhat from validation with
RELAX NG, in that an unvalidated copy is passed on rather than error messages, because the
result of an XSD validation is in fact a distinct (and different) object, rather than a simple Boolean flag.
However, an option can be added to this block to assert that the document must be valid.
If the validation then fails, a dynamic error is raised, and the XProc processor will either stop at that point or,
if one is defined, perform the <catch> action of an enclosing try/catch block.
Put more simply, XProc supports exception handling.
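The control flow described here can be sketched in JavaScript. This only models the behavior and is not XProc syntax or a real processor API. DynamicError, validateStep, tryValidate and the assertValid flag are hypothetical names, and the "validator" is a stand-in that merely checks for a title.

```javascript
// Hypothetical sketch: DynamicError, validateStep and assertValid are
// illustrative names, not part of any real XProc API.
class DynamicError extends Error {}

// Stand-in validator: a document counts as "valid" if it has a title.
function validateStep(doc, { assertValid }) {
  const valid = typeof doc.title === "string" && doc.title.length > 0;
  if (!valid && assertValid) {
    throw new DynamicError("document failed validation");
  }
  // Without the assertion, the document is passed on either way.
  return { ...doc, validated: valid };
}

// Plays the role of a try/catch block wrapped around the step.
function tryValidate(doc, onError) {
  try {
    return validateStep(doc, { assertValid: true });
  } catch (err) {
    if (err instanceof DynamicError) return onError(err);
    throw err; // unrelated failures are not swallowed
  }
}

const good = tryValidate({ title: "Home" }, () => null);
const bad = tryValidate({}, (err) => ({ error: err.message }));
const lenient = validateStep({}, { assertValid: false });

console.log(good);    // { title: 'Home', validated: true }
console.log(bad);     // { error: 'document failed validation' }
console.log(lenient); // { validated: false }
```

The lenient call shows the pass-through case, while the two tryValidate calls show the asserted case with and without a failure for the catch handler to deal with.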
The output of this operation will then be the original XHTML document with its XIncludes resolved, and then run through an
XSD validator to return a PSVI document. Notice that the process given is also basically generic. Both the source input
and the schema are parameters. If you change either of these (or, as a more sensible operation,
replace the schema validation with a transformation, and use two different transformation files), the results
will be different, but the XProc is the same.