An Inventory of Steps
A number of "keyword" elements act as steps within XProc, including the following:
- <pipeline>: Defines a named set of pipes, wrapped up into a single referenceable object.
- <for-each>: This is a compound statement, wherein a given set of documents (or nodes within a document)
are passed, one at a time, to a given sub-pipeline and processed.
- <viewport>: A viewport is similar to a for-each block, in that for a given input document, it retrieves a set
of nodes from a given XPath, applies a subpipeline to those nodes and replaces the original nodes with the results
of the subpipeline.
- <choose>: Chooses a particular pipeline among several based upon constraints in the source document, then
executes the subpipeline on that input. If nothing matches, a default operation is available.
- <group>: A group is something of a convenience step for subpipelines, making it easier to identify
subgroups and apply just that subgroup of pipes.
- <try><catch>: Provides an exception handling mechanism when a process throws an exception.
These commands make it possible to handle fairly complex logic, including conditional processing, iteration, exception handling and the like, just as the analogous keywords in a procedural language make it possible to create complex procedural logic.
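As a minimal sketch of how these compound steps nest, the pipeline below combines <try>, <viewport>, and <choose>, written in XProc 1.0 syntax with the conventional p: prefix. The element name chapter, the status and reviewed attributes, and the step name main are hypothetical, chosen only for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0" name="main">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:try>
    <p:group>
      <!-- viewport: each matched <chapter> (hypothetical element) is processed
           as its own document; the result replaces the original element -->
      <p:viewport match="chapter">
        <p:viewport-source>
          <p:pipe step="main" port="source"/>
        </p:viewport-source>
        <p:choose>
          <!-- conditional processing: the test runs against the current chapter -->
          <p:when test="/chapter/@status = 'draft'">
            <p:add-attribute match="/chapter"
                             attribute-name="reviewed"
                             attribute-value="no"/>
          </p:when>
          <!-- default operation when no p:when matches -->
          <p:otherwise>
            <p:identity/>
          </p:otherwise>
        </p:choose>
      </p:viewport>
    </p:group>
    <!-- exception handling: if anything in the group fails,
         fall back to the unmodified input document -->
    <p:catch>
      <p:identity>
        <p:input port="source">
          <p:pipe step="main" port="source"/>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```

Both branches of the <choose>, and both the <group> and the <catch>, end in a step with a single primary output, so each alternative produces the same kind of result for the enclosing step.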
A second set of steps handles specific operations. The XProc specification lists a number of them:
- <add-attribute>: Adds an attribute name and value to specific elements.
- <add-xml-base>: Adds or corrects xml:base attributes so that base URIs are stated explicitly as absolute references, rather than relative ones.
- <compare>: Compares two documents for equality, passing the document if true and otherwise raising an error.
- <count>: Counts the number of documents in the input source.
- <delete>: Deletes any item from a source document that satisfies a given XPath pattern.
- <directory-list>: Retrieves an XML structure containing the files and directories of a given directory in a
file or related system.
- <error>: Generates a custom dynamic error on the given input document.
- <escape-markup>: Serializes the selected sub-elements in a given XML document as escaped text.
- <http-request>: Performs an HTTP request to either send or retrieve content from the web.
- <identity>: Makes a verbatim copy of the input as an output.
- <insert>: Inserts a block of XML into a source document at the matching position(s) specified.
- <label-elements>: Creates a label for a matched element and stores it in the specified attribute.
- <load>: Loads the document at the URL specified in an option.
- <make-absolute-uris>: Converts an element or attribute's value in the source document into a uniform resource
identifier (URI) in the target document.
- <namespace-rename>: Changes the namespace in a given document to a new URI, useful when working with proxy
- <pack>: Combines two documents into a single document with a new wrapper element containing them.
- <parameters>: Passes a set of parameters with associated values to a given step.
- <rename>: Renames a given element or attribute.
- <replace>: Replaces an element and its children with a different element and its children.
- <set-attributes>: Sets the attributes on matched elements.
- <sink>: Accepts a set of documents and discards them. This primarily occurs when the role of the preceding
pipeline was to perform out-of-band operations.
- <split-sequence>: Takes a sequence of documents and splits that into two sequences.
- <store>: Serializes the content of the input document and persists the output to the given URI.
- <string-replace>: Replaces all instances of a given string in the source with replacement text in the result.
Note that if XPath 2.0 is used, this will likely support regular expression replacements.
- <unescape-markup>: Parses serialized (escaped) text and converts it back into XML.
- <unwrap>: Replaces matched elements with their children.
- <wrap>: Wraps any elements that match the given expression in a wrapper element. If two or more
matched elements are adjacent, the step can wrap all of the adjacent nodes under a single wrapper.
- <wrap-sequence>: Wraps a sequence of elements with a wrapper element.
- <xinclude>: Includes the content specified in an element into the source document to replace that element.
- <xslt>: Performs an XSLT transformation on the given source document.
These are all considered required for XProc operations. In addition to these, the specification also defines a set of 10 additional optional steps that don't have to be implemented, but if they are implemented should take a given form:
- <exec>: Executes an external command using the arguments specified.
- <hash>: Creates a digital hash of the corresponding source.
- <uuid>: Generates a unique global identifier.
- <validate-with-relax-ng>: Validates the document using the Relax NG Schema Language.
- <validate-with-schematron>: Validates the document using the Schematron Schema Language.
- <validate-with-xml-schema>: Validates the document using the XML Schema Definition Language (XSDL).
- <www-form-urldecode>: Decodes a URL-encoded string into parameters.
- <www-form-urlencode>: Encodes a set of parameters as a form-url-encoded string.
- <xquery>: Calls an XQuery script on the source and retrieves the result.
- <xsl-formatter>: Takes content rendered in XSL-FO and generates output in PDF or similar formats.
Given the range of commands listed here, it's possible to create very complex pipeline operations with just
the standard set of commands, especially given the meta-nature of pipelines. It's also worth noting the <exec>
command, which is designed to make system calls to the underlying operating system, passing the source document to the external command's standard input and sending the command's output to the result port. Obviously, this particular command may not be available in a non-secured environment.
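To make this concrete, here is a small sketch that chains two of the required steps and then hands the document to <exec>, written in XProc 1.0 syntax with the conventional p: prefix. The title element, the "Draft: " text, and the /bin/cat command are assumptions made for illustration. Because <exec> is optional, a given processor may not support it at all.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- remove every comment node from the source document -->
  <p:delete match="comment()"/>

  <!-- the replace option is an XPath expression evaluated for each matched
       node; here it prefixes the text of each (hypothetical) <title> element -->
  <p:string-replace match="title/text()"
                    replace="concat('Draft: ', .)"/>

  <!-- optional step: the document arriving on the source port is written to
       the command's standard input; /bin/cat is only a placeholder command
       that echoes its input back -->
  <p:exec command="/bin/cat" result-is-xml="true"/>
</p:declare-step>
```

Each step reads the primary output of the step before it, so no explicit connections are needed. In XProc 1.0 the output of <exec> arrives on the result port wrapped in a c:result element, where c: is the XProc step namespace.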
Beyond this core set of pipeline "steps," the XProc specification also provides extension mechanisms both for the underlying XPath used to select nodes and for defining additional steps within the XProc processor. For instance, one such step might be to provide a command to send the associated content as an email message to a given email list, while another might perform authentication against an LDAP server to ensure that the rest of the pipeline can proceed.
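A sketch of what such an extension might look like in XProc 1.0 follows. An extension step is declared with a signature but no subpipeline, which signals that the processor itself is expected to supply the implementation. The ex: namespace, the step name send-mail, its to option, and the address are all hypothetical. No such step is defined by the specification.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns/xproc-extensions"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- signature of a hypothetical processor-supplied extension step:
       it has ports and options but no subpipeline of its own -->
  <p:declare-step type="ex:send-mail">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:option name="to" required="true"/>
  </p:declare-step>

  <!-- once declared, the extension step is used like any built-in step -->
  <ex:send-mail to="list@example.com"/>
</p:declare-step>
```

A pipeline written this way only runs on a processor that actually provides an implementation for the declared step type, so extension steps trade portability for capability.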