XML appears in some form in most modern applications, and often needs to be transformed from one form into another: merged, split, massaged, or simply reformatted into HTML. In most cases, it’s far more robust and efficient to use XSLT to perform such transformations than to use common programming languages such as Java, VB.NET, or C#. But because XSLT is an add-on rather than a core language, most developers use XSLT only occasionally, and have neither the time nor the resources to dive into the peculiarities of XSLT development or to explore the paradigms of functional and flow-driven programming that efficient use of XSLT requires.
Such occasional use carries the danger of misapplying programming techniques that suit mainstream languages such as Java, C, and Python but can lead to disastrous results in XSLT.
However, you can avoid the problems of occasional use by working through this set of simple, thoroughly explained exercises, each of which applies a well-known programming problem to an XSLT task.
An XSLT processor takes an XML document as input, processes it, and outputs the content in (usually) some altered form, such as XML, HTML, or text. Here’s a simple XML document that serves as the basis for the input examples in this article:
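The original listing is not reproduced here. As a minimal sketch of such an input, the document below assumes a root element named `bookstore`, with `isbn`, `lang`, and `author` as attributes and `title` as a child element. The element layout and the placeholder titles are assumptions; only the ISBNs and author names come from the later examples in this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed layout: root name, attribute placement, and titles are placeholders -->
<bookstore>
  <book isbn="1-56592-235-1" lang="en" author="David Flannagan">
    <title>Book A</title>
  </book>
  <book isbn="1-56592-235-2" lang="en" author="David Flannagan">
    <title>Book A</title>
  </book>
  <book isbn="0-471-40399-7" lang="en" author="Dan Margulis">
    <title>Book B</title>
  </book>
</bookstore>
```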
The document describes several books in a bookstore, providing the ISBN number, a language code, author, and title for each book.
Suppose you needed to extract all the book titles in the following form:
A flow-driven XSLT stylesheet example might look like this:
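A flow-driven stylesheet of this kind could be sketched as follows. It assumes `book` elements sit directly under a `bookstore` root and that the desired output is a `titles` element wrapping one `title` per book; both the path and the output element names are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Match the root node, then drive the traversal explicitly -->
  <xsl:template match="/">
    <titles>
      <!-- Assumed path: the stylesheet hard-codes where books live -->
      <xsl:for-each select="bookstore/book">
        <title><xsl:value-of select="title"/></title>
      </xsl:for-each>
    </titles>
  </xsl:template>
</xsl:stylesheet>
```

Because the path to each book is spelled out, any change to the input hierarchy forces a change to the stylesheet.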
The stylesheet matches the root node right away (), and then enforces the control flow afterwards by pointing to each
The example above is somewhat incomplete as it does not give exactly the same output as the one defined in the problem definition. Indeed, once you launch it, the result is one long line resembling this:
To format it nicely, you have to add one more statement to the XSLT stylesheet:
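The statement in question is an `xsl:output` declaration placed as a top-level child of `xsl:stylesheet`. A minimal sketch, with the encoding spelled out explicitly:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Ask the processor to pretty-print the result and state the encoding -->
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <!-- templates go here -->
</xsl:stylesheet>
```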
The indent="yes" attribute activates indentation. It is also wise to specify an output encoding explicitly, even though UTF-8 is the default encoding for XSLT.
Now, suppose you make the input file a bit more complex, introducing sections and rows to locate books more easily in the bookstore:
If you try to continue in the flow-driven way, the XSLT must grow considerably (and as you’ll see, needlessly) to adapt to the format change, adding templates to iterate over and process the
Fortunately, you can make the transformation much simpler by using matched templates. A matched template is one the XSLT processor triggers when its “match” attribute matches the current (context) node, whether that’s simply the name of a tag or a more complex XPath expression. For example, the processor will trigger the following template whenever the context node is a “lang” attribute (the at sign, @, denotes an attribute node rather than an element node):
This element has the following language id:
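A minimal sketch of a template matching the `lang` attribute is shown below. It assumes `book` elements that carry a `lang` attribute. Note that the built-in rules do not visit attributes, so the sketch selects them explicitly from a `book` template.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Attributes are not visited by default, so select them explicitly -->
  <xsl:template match="book">
    <xsl:apply-templates select="@lang"/>
  </xsl:template>

  <!-- Triggered whenever the context node is a lang attribute -->
  <xsl:template match="@lang">
    <xsl:text>This element has the following language id: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```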
By processing the file through matched templates, the code makes as few assumptions as possible about the format of the input file. For example, the following stylesheet outputs exactly the same result for both input files, even though their hierarchical formats differ significantly. Here’s the revised stylesheet:
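An event-driven stylesheet along these lines might look like the sketch below. It assumes only that titles are stored in `title` elements somewhere in the input; the `titles` wrapper in the output is likewise an assumption.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Start at the root and let the processor walk the tree -->
  <xsl:template match="/">
    <titles>
      <xsl:apply-templates/>
    </titles>
  </xsl:template>

  <!-- Fires for every title element, at any depth in the hierarchy -->
  <xsl:template match="title">
    <title><xsl:value-of select="."/></title>
  </xsl:template>
</xsl:stylesheet>
```

Nothing in this sketch names the ancestors of `title`, so adding or removing wrapper levels in the input does not require changing it.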
This event-driven version matches the root element, regardless of its name, by using the single forward slash (/) syntax. Next, it outputs the root
If you apply this stylesheet to the second input file, you’ll get the following result:
The output is indeed the same as for the first input file, except for one minor annoyance. There are some gratuitous carriage returns before and after the
After trying to determine the cause of these extra carriage returns, an occasional XSLT programmer might just drop the simple event-driven approach altogether in favor of the more complex flow-driven one. But if you instead explore the XSLT specification, you’ll find a built-in template that copies text through and thus outputs the carriage returns:
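For reference, the XSLT 1.0 Recommendation defines the built-in rule for text and attribute nodes as equivalent to the following template. It is shown as a fragment, not as something you need to add to a stylesheet.

```xml
<!-- Built-in rule: copy the string value of text and attribute nodes -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>
```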
In the example above, the carriage returns stem from the inside of the
To correct that, you can add one line to the event-driven stylesheet that matches text() nodes as follows:
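A template of this kind is simply an empty rule for text nodes, placed at the top level of the stylesheet:

```xml
<!-- Overrides the built-in text rule; matched text nodes produce no output -->
<xsl:template match="text()"/>
```

Templates that read text explicitly, for example with `xsl:value-of`, are unaffected, because they do not rely on the built-in rule.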
That line gets rid of the carriage returns by overriding the built-in text template using a custom version that produces no output.
The key point to take away here is that almost any useful XSLT stylesheet should override at least two of the built-in templates: the one for text, shown above, and the one that matches all nodes, which is:
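The XSLT 1.0 Recommendation defines this built-in rule for element nodes and the root node as equivalent to:

```xml
<!-- Built-in rule: produce nothing for the node itself, but recurse into its children -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>
```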
The built-in template for nodes copies nothing to the output, but by invoking the
Author’s Note: You can gain fine-grained control over extra whitespace characters in the XSLT output by using the
Unlike most programming languages, XSLT does not favor sequential execution. This shows in the verbosity of the related language constructs, such as xsl:choose (the equivalent of a switch statement) and xsl:for-each, and in the weak support for side effects (there are no variables in the traditional sense).
This common example illustrates the verbosity of the imperative approach. The task is to construct an HTML table that places the book names in rows, alternating colors on odd and even rows of the input document:
David Flannagan David Flannagan Dan Margulis
Figure 1. Table with Alternating Colors: The figure shows how alternating red and blue rows of content might render in a browser.
Figure 1 shows how a browser would render the preceding code.
Here’s how you can accomplish the task in the imperative style:
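An imperative version could be sketched as follows. It assumes `book` elements with a `title` child and emits a single table; the original listing may have been structured differently.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
    <table>
      <!-- Iterate explicitly and branch on the loop position -->
      <xsl:for-each select="//book">
        <tr>
          <xsl:choose>
            <xsl:when test="position() mod 2 = 0">
              <xsl:attribute name="style">color:red;</xsl:attribute>
            </xsl:when>
            <xsl:otherwise>
              <xsl:attribute name="style">color:blue;</xsl:attribute>
            </xsl:otherwise>
          </xsl:choose>
          <td><xsl:value-of select="title"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```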
The stylesheet creates one table for each
As you can see, this processing method gets complex very quickly, and you’d need to alter it for every format alteration in the input XML file.
For XSLT, declarative is the opposite of the common imperative or algorithmic strategy; that is, an XSLT programmer does not define a sequence of actions that form an algorithm but rather sets a number of rules that the result should satisfy.
The declarative nature of the language lets you place templates anywhere and in any order in the XSLT document, because order has no impact on the resulting document.
Author’s Note: The preceding rule applies except in cases of conflict resolution, where order is the last decision criterion.
Here is a stylesheet written with the declarative approach that provides the same output:
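A declarative version might be sketched like this, again assuming `book` elements with a `title` child. The two `book` templates differ only in their match pattern.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="//book"/>
    </table>
  </xsl:template>

  <!-- Rule for even-numbered books -->
  <xsl:template match="book[position() mod 2 = 0]">
    <tr style="color:red;"><td><xsl:value-of select="title"/></td></tr>
  </xsl:template>

  <!-- Rule for odd-numbered books -->
  <xsl:template match="book[position() mod 2 = 1]">
    <tr style="color:blue;"><td><xsl:value-of select="title"/></td></tr>
  </xsl:template>
</xsl:stylesheet>
```

One caveat with this sketch: inside a match pattern, `position()` counts a book among its sibling `book` elements, so with books nested in several parent elements the alternation restarts under each parent.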
In contrast to the procedural approach, this version doesn’t define any algorithm. Instead, it specifies two templates for the processor to match: one for even-numbered rows and one for odd-numbered rows. The processor outputs the contents in red for even-numbered elements and in blue for odd-numbered elements.
Key Indexing in XSLT
You can simplify a fair portion of XSLT processing if you understand how to use keys. Keys in XSLT have more or less the same meaning that indexes have in relational databases, except that in XSLT, keys index hierarchical structure rather than relational structure. It’s easiest to explain keys with an example.
Imagine that you need to count the number of book copies available for each book title and display them in an HTML table, where each row looks like this:
Here’s a possible solution that illustrates the use of keys:
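A key-based solution could be sketched as follows. The key name `books-by-title` and the two-column row layout are assumptions; the sketch also assumes that each copy of a book appears as its own `book` element with a `title` child.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Index every book element by the string value of its title child -->
  <xsl:key name="books-by-title" match="book" use="title"/>

  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="//book"/>
    </table>
  </xsl:template>

  <xsl:template match="book">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <!-- Look up all books sharing this title and count them -->
      <td><xsl:value-of select="count(key('books-by-title', title))"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

This sketch emits one row per `book` element, so a title with several copies appears several times.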
In the preceding example, the key declaration has three parts: the name of the key, used to refer to it later in the code; the match, that is, the elements or attributes of the input data to be indexed; and the use, an XPath expression that defines the key value itself. XPath is a language for addressing parts of an XML document, designed to be used by XSLT and XPointer. See the full language specification for more information.
In this particular case, the expression
The “book” template uses the key by calling the function key() with two parameters: the name of the key and the value of the index as defined in the @use attribute of the key declaration; in this case, simply “title” as that’s the child of the context
That leads to another common XSLT problem: removing duplicates.
Removing Duplicates: the Muenchian Method
Because XSLT is an almost side-effect-free declarative language, the problem of removing duplicates, ridiculously simple in imperative languages such as C++ or Java, becomes overly complicated. But fortunately, an elegant solution exists, so unexpected that it even earned its own name, “Muenchian,” because Steve Muench was reportedly the first to discover it.
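The Muenchian method can be sketched as follows, assuming `book` elements with a `title` child and a key named `books-by-title` (an illustrative name). Only the first book in each key group is selected, so each title is output once.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:key name="books-by-title" match="book" use="title"/>

  <xsl:template match="/">
    <table>
      <!-- Keep a book only if it is the first node in its title group -->
      <xsl:apply-templates
          select="//book[generate-id() =
                         generate-id(key('books-by-title', title)[1])]"/>
    </table>
  </xsl:template>

  <xsl:template match="book">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="count(key('books-by-title', title))"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```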
Notice that the key declaration in this example is identical to the previous example. You use the generate-id() function to obtain a unique id for each node, which ensures that every time you pass in the same
Using Complex Keys in XSLT
Because the use attribute of the key definition is an XPath expression, it’s possible to create quite elaborate indexes that rely upon complex XPath statements. As an example, generate-id() makes a unique key for every
The International Standard Book Number, or ISBN (sometimes pronounced “is-ben”), is a unique identifier for books, intended to be used commercially. The following declaration validates the checksum of an ISBN, returning true if the checksum passes the test and false otherwise.
You can find the check digit of a 10-digit ISBN by multiplying each of the first nine digits by its position in the number, with the leftmost digit multiplied by 1, the next digit by 2, and so on. Next, sum these products and take the result modulo 11; a remainder of 10 is represented by the character “X”. As an example, for the ISBN 1-56592-235-2, the calculation is (1*1 + 2*5 + 3*6 + 4*5 + 5*9 + 6*2 + 7*2 + 8*3 + 9*5) mod 11 = 189 mod 11 = 2, which matches the final digit. The translate function deletes all the dash (-) characters from the @isbn attribute value. The following example uses the substring function to extract each character from the string returned by translate.
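A key declaration implementing this check might be sketched as below. The key name `isbn-check` is an assumption, and for brevity the sketch does not handle a check character of “X” (such ISBNs would be indexed as false). Because the `use` expression yields a boolean, the key values are the strings `true` and `false`.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Index each book under 'true' or 'false', depending on its ISBN checksum -->
  <xsl:key name="isbn-check" match="book"
      use="(  substring(translate(@isbn, '-', ''), 1, 1) * 1
            + substring(translate(@isbn, '-', ''), 2, 1) * 2
            + substring(translate(@isbn, '-', ''), 3, 1) * 3
            + substring(translate(@isbn, '-', ''), 4, 1) * 4
            + substring(translate(@isbn, '-', ''), 5, 1) * 5
            + substring(translate(@isbn, '-', ''), 6, 1) * 6
            + substring(translate(@isbn, '-', ''), 7, 1) * 7
            + substring(translate(@isbn, '-', ''), 8, 1) * 8
            + substring(translate(@isbn, '-', ''), 9, 1) * 9
           ) mod 11 = substring(translate(@isbn, '-', ''), 10, 1)"/>
  <!-- Books that pass can then be selected with key('isbn-check', 'true') -->
</xsl:stylesheet>
```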
Branching vs. Modes in XSLT
XSLT’s branching powers are weak compared to the branching statements of conventional languages. Instead, you can use the powerful mechanism of modes, often unexplored by occasional XSLT programmers.
Suppose you have to print all the titles and their respective ISBN codes, checking for the ISBN code validity at the same time. You could represent the desired result as follows:
ISBN number check failed 1-56592-235-1
ISBN number check passed 1-56592-235-2 0-471-40399-7
Without knowing how to use keys and modes, you might implement the solution with the following stylesheet logic:
This example would construct the table by iterating on the ISBN nodes and choosing whether to output style="color:red;" on each pass. This would be easy if you weren’t obliged to group the result and output all the failed ISBN codes first. For the purpose of grouping, the use of keys and modes leads to much simpler code, the alternatives being extension functions or chaining of two different XSLT stylesheets.
ISBN number check failed
ISBN number check passed
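A stylesheet combining a key with modes could be sketched as follows. The key name `isbn-check`, the mode names `failed` and `passed`, and the table layout are assumptions; as in any 10-digit checksum sketch of this length, a check character of “X” is not handled.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Index books under 'true' or 'false' by ISBN checksum result -->
  <xsl:key name="isbn-check" match="book"
      use="(  substring(translate(@isbn, '-', ''), 1, 1) * 1
            + substring(translate(@isbn, '-', ''), 2, 1) * 2
            + substring(translate(@isbn, '-', ''), 3, 1) * 3
            + substring(translate(@isbn, '-', ''), 4, 1) * 4
            + substring(translate(@isbn, '-', ''), 5, 1) * 5
            + substring(translate(@isbn, '-', ''), 6, 1) * 6
            + substring(translate(@isbn, '-', ''), 7, 1) * 7
            + substring(translate(@isbn, '-', ''), 8, 1) * 8
            + substring(translate(@isbn, '-', ''), 9, 1) * 9
           ) mod 11 = substring(translate(@isbn, '-', ''), 10, 1)"/>

  <xsl:template match="/">
    <table>
      <tr><th>ISBN number check failed</th></tr>
      <xsl:apply-templates select="key('isbn-check', 'false')" mode="failed"/>
      <tr><th>ISBN number check passed</th></tr>
      <xsl:apply-templates select="key('isbn-check', 'true')" mode="passed"/>
    </table>
  </xsl:template>

  <!-- Same element, different rule per mode -->
  <xsl:template match="book" mode="failed">
    <tr style="color:red;"><td><xsl:value-of select="@isbn"/></td></tr>
  </xsl:template>

  <xsl:template match="book" mode="passed">
    <tr><td><xsl:value-of select="@isbn"/></td></tr>
  </xsl:template>
</xsl:stylesheet>
```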
This version processes both types of book nodes (those that did not pass checksum verification for their ISBN codes, and those that did) using separate templates for the failed and passed modes. The stylesheet outputs ISBN values in red for books with ISBN codes that fail the checksum test.
Sometimes XSLT turns out to be too limited for complex transformations. Two viable options then exist:
- Chain the execution of XSLTs instead of trying to do everything in one pass.
- Use common extension functions from the EXSLT package.
Chaining XSLT Execution
Contrary to what one might think, chaining XSLT stylesheets (using the output of one stylesheet transformation as the input for the next stylesheet in the chain) does not add much overhead if done in a proper way. Although nearly all XSLT processors reconstruct the structure of the input document in memory for each pass, that process is not equivalent to the reconstruction of a DOM tree. Most XSLT processors use an internal format that may be a lot faster. In fact, a number of small XSLT stylesheets chained together can actually boost performance as compared to a single complex stylesheet.
Using Common Extension Functions
There is an effort to provide a more or less common set of extensions to XSLT with the corresponding reference implementations. Some of these functions already exist in various XSLT processors under different names.
The most notable function is node-set. It allows the conversion of result tree fragments into node-sets. If you create a variable with a select attribute, it holds a node-set:
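For instance, the following fragment binds a node-set; the variable name `books` and the `//book` path are illustrative.

```xml
<!-- Bound through select: $books is a node-set and can be used in path expressions -->
<xsl:variable name="books" select="//book"/>
```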
In contrast, if you create a variable with embedded content instead of a select attribute, it holds a result tree fragment:
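For instance, the following fragment binds a result tree fragment; the variable name `books` and the `//book` path are illustrative.

```xml
<!-- Bound through content: in XSLT 1.0, $books is a result tree fragment -->
<xsl:variable name="books">
  <xsl:copy-of select="//book"/>
</xsl:variable>
```

In XSLT 1.0 an expression such as `$books/book` is not allowed on this variable without first converting it to a node-set.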
Because XSLT allows more operations on node-sets, it is wise to use the select attribute when possible instead of embedded content. Otherwise, the node-set extension function comes to the rescue.
The node-set extension function exists for several processors: 4XSLT, Xalan-J, Saxon, jd.xslt, and libxslt, and you make it accessible to your stylesheets by including the namespace http://exslt.org/common.
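A minimal sketch of its use, assuming a processor that implements the EXSLT common module; the `exsl` prefix, the variable name, and the `count` output element are illustrative choices.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">

  <xsl:template match="/">
    <!-- A result tree fragment built from embedded content -->
    <xsl:variable name="books">
      <xsl:copy-of select="//book"/>
    </xsl:variable>
    <!-- Convert it to a node-set so that a path step can be applied -->
    <count>
      <xsl:value-of select="count(exsl:node-set($books)/book)"/>
    </count>
  </xsl:template>
</xsl:stylesheet>
```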
As a sign of EXSLT’s popularity, even Microsoft supports some of the EXSLT functions. However, Microsoft uses a different namespace: urn:schemas-microsoft-com:xslt.
Overall, as an occasional XSLT developer, try to keep the advantages of functional and flow-driven programming in mind, and be wary of falling into the trap of trying to use the procedural or imperative programming techniques that you commonly use in standard programming languages.