Occasional XSLT for Experienced Software Developers

XML appears in some form in most modern applications, and often needs to be transformed from one form into another: merged, split, massaged, or simply reformatted into HTML. In most cases, it’s far more robust and efficient to use XSLT to perform such transformations than to use common programming languages such as Java, VB.NET, or C#. But because XSLT is an add-on rather than a core language, most developers use XSLT only occasionally, and have neither the time nor the resources to dive into the peculiarities of XSLT development or to explore the paradigms of functional and flow-driven programming that efficient use of XSLT requires.

Such occasional use carries the danger of misapplying programming techniques that suit mainstream languages such as Java, C, and Python but can lead to disastrous results when applied to XSLT.

However, you can avoid the problems of occasional use by working through this set of simple, thoroughly explained exercises, each of which applies a well-known programming approach to an XSLT task.

An XSLT processor takes an XML document as input, processes it, and outputs the content in (usually) some altered form, such as XML, HTML, or text. Here’s a simple XML document that serves as the basis for the input examples in this article:

   David Flannagan, JavaScript: The Definitive Guide
   David Flannagan, JavaScript: The Definitive Guide
   Dan Margulis, Photoshop 6 for Professionals

The document describes several books in a bookstore, providing the ISBN number, a language code, author, and title for each book.
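As a minimal sketch, a document matching this description might look like the following. The root element name bookstore, the author and title child elements, the lang value, and the pairing of ISBNs with books are assumptions made for illustration; only the ISBN strings themselves appear later in the article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical reconstruction of the sample input; names are assumed -->
<bookstore>
  <book isbn="1-56592-235-1" lang="en">
    <author>David Flannagan</author>
    <title>JavaScript: The Definitive Guide</title>
  </book>
  <book isbn="1-56592-235-2" lang="en">
    <author>David Flannagan</author>
    <title>JavaScript: The Definitive Guide</title>
  </book>
  <book isbn="0-471-40399-7" lang="en">
    <author>Dan Margulis</author>
    <title>Photoshop 6 for Professionals</title>
  </book>
</bookstore>
```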

Flow-driven XSLT
Suppose you needed to extract all the book titles in the following form:

           JavaScript: The Definitive Guide     JavaScript: The Definitive Guide     Photoshop 6 for Professionals   

A flow-driven XSLT stylesheet example might look like this:

                            

The stylesheet matches the root node right away (match="/"), and then enforces the control flow itself by pointing to each node using the combination of the for-each construct and the call-template instruction.
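A flow-driven stylesheet of the kind described here could be sketched as follows. It assumes the input uses a bookstore root with book children, and the output element names titles and title as well as the template name print-title are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Match the root node and drive the whole traversal from here -->
  <xsl:template match="/">
    <titles>
      <!-- Explicitly walk to every book (path assumes a bookstore root) -->
      <xsl:for-each select="bookstore/book">
        <xsl:call-template name="print-title"/>
      </xsl:for-each>
    </titles>
  </xsl:template>

  <!-- Named template: runs only when called, with the book as context -->
  <xsl:template name="print-title">
    <title><xsl:value-of select="title"/></title>
  </xsl:template>

</xsl:stylesheet>
```

Note that the select path hard-codes the shape of the input, which is exactly what makes this style brittle when the format changes.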

The example above is somewhat incomplete as it does not give exactly the same output as the one defined in the problem definition. Indeed, once you launch it, the result is one long line resembling this:

      JavaScript: The Definitive Guide   JavaScript: The Definitive GuidePhotoshop 6 for Professionals

To format it nicely, you have to add one more statement to the XSLT stylesheet:

   

The indent="yes" attribute activates the indentation. It is also wise to specify an output encoding explicitly, even though processors typically default to UTF-8 for XML output.
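The statement in question is the xsl:output element. A minimal sketch, assuming XML output is wanted, looks like this; it is a top-level fragment that goes directly inside xsl:stylesheet.

```xml
<!-- Top-level element: place it as a direct child of xsl:stylesheet -->
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>
```

How much whitespace indent="yes" actually adds is left to the processor, so the exact layout can differ between implementations.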

Now, suppose you make the input file a bit more complex, introducing sections and rows to locate books more easily in the bookstore:

   David Flannagan, JavaScript: The Definitive Guide
   David Flannagan, JavaScript: The Definitive Guide
   Dan Margulis, Photoshop 6 for Professionals
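A sketch of what the more deeply nested input might look like is shown here. The element names bookstore, section, and row follow the wording of the text but are otherwise assumptions, as is the choice to keep all three books in a single row.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical nested variant of the sample input -->
<bookstore>
  <section>
    <row>
      <book isbn="1-56592-235-1" lang="en">
        <author>David Flannagan</author>
        <title>JavaScript: The Definitive Guide</title>
      </book>
      <book isbn="1-56592-235-2" lang="en">
        <author>David Flannagan</author>
        <title>JavaScript: The Definitive Guide</title>
      </book>
      <book isbn="0-471-40399-7" lang="en">
        <author>Dan Margulis</author>
        <title>Photoshop 6 for Professionals</title>
      </book>
    </row>
  </section>
</bookstore>
```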

If you try to continue in the flow-driven way, the XSLT must grow considerably (and, as you’ll see, needlessly) to adapt to the format change, adding templates to iterate over and process the section and row elements:
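To illustrate the growth, here is one way the flow-driven version might be extended. The template names process-section, process-row, and print-title are hypothetical, and the paths assume the bookstore/section/row/book nesting.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/">
    <titles>
      <xsl:for-each select="bookstore/section">
        <xsl:call-template name="process-section"/>
      </xsl:for-each>
    </titles>
  </xsl:template>

  <!-- One extra named template per extra level of nesting -->
  <xsl:template name="process-section">
    <xsl:for-each select="row">
      <xsl:call-template name="process-row"/>
    </xsl:for-each>
  </xsl:template>

  <xsl:template name="process-row">
    <xsl:for-each select="book">
      <xsl:call-template name="print-title"/>
    </xsl:for-each>
  </xsl:template>

  <xsl:template name="print-title">
    <title><xsl:value-of select="title"/></title>
  </xsl:template>

</xsl:stylesheet>
```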
                                                    

Event-driven XSLT
Fortunately, you can make the transformation much simpler by using matched templates. A matched template is one the XSLT processor triggers when its “match” attribute matches the current (context) node, whether that’s simply the name of a tag or a more complex XPath expression. For example, the processor will trigger the following template whenever the context node is a “lang” attribute (the at sign, @, denotes an attribute node rather than an element node).
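A minimal sketch of such a template follows. Because the built-in rules never visit attributes on their own, the sketch also includes a book template that selects the attribute explicitly; printing each language code on its own line is an arbitrary choice for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Attributes are reached only when selected explicitly -->
  <xsl:template match="book">
    <xsl:apply-templates select="@lang"/>
  </xsl:template>

  <!-- Triggered whenever the context node is a lang attribute -->
  <xsl:template match="@lang">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress stray text copied by the built-in text rule -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```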

   

By processing the file through matched templates, the code makes as few assumptions as possible about the format of the input file. For example, the following stylesheet outputs exactly the same result for both input files, even though their hierarchical formats differ significantly. Here’s the revised stylesheet:

                                        

This event-driven version matches the document root, regardless of the name of the top-level element, by using the single forward slash (/) syntax. Next, it outputs the root tag of the result, and instructs the processor to continue the iteration over the contents of the current or context node (the root node in this case) with the apply-templates call.
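An event-driven stylesheet along these lines might be sketched as follows. The output names titles and title are hypothetical, and the book template is an assumption added so that author text is not copied to the output by the built-in rules.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Matches the document root, whatever the top-level element is called -->
  <xsl:template match="/">
    <titles>
      <xsl:apply-templates/>
    </titles>
  </xsl:template>

  <!-- Fires for every book, however deeply it is nested -->
  <xsl:template match="book">
    <xsl:apply-templates select="title"/>
  </xsl:template>

  <xsl:template match="title">
    <title><xsl:value-of select="."/></title>
  </xsl:template>

</xsl:stylesheet>
```

Nothing here names bookstore, section, or row; the built-in rules walk through those elements until a book is reached.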

If you apply this stylesheet to the second input file, you’ll get the following result:

                     JavaScript: The Definitive Guide         JavaScript: The Definitive Guide         Photoshop 6 for Professionals         

The output is indeed the same as for the first input file, except for one minor annoyance. There are some gratuitous carriage returns before and after the tags that cause the extra white space in the output.

After trying to determine the cause of these extra carriage returns, an occasional XSLT programmer might just drop the simple event-driven approach altogether in favor of the more complex flow-driven one. But if you instead explore the XSLT specification, you’ll find a built-in template that copies text through and thus outputs the carriage returns:

   

In the example above, the carriage returns stem from the whitespace-only text inside the container elements of the input document (the elements that wrap the book entries), one for each tag.

To correct that, you can add one line to the event-driven stylesheet that matches text() nodes as follows:

    

That line gets rid of the carriage returns by overriding the built-in text template using a custom version that produces no output.
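Put together, the override can be sketched like this. The surrounding templates repeat a hypothetical event-driven stylesheet (output names titles and title are assumptions); the relevant addition is the empty text() template at the end.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/">
    <titles>
      <xsl:apply-templates/>
    </titles>
  </xsl:template>

  <!-- value-of reads the title text directly, so it is unaffected below -->
  <xsl:template match="title">
    <title><xsl:value-of select="."/></title>
  </xsl:template>

  <!-- Empty template: text nodes reached by apply-templates emit nothing -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

With the text rule silenced, author text and inter-element line breaks both disappear, so no separate book template is needed in this variant.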

The key point to take away here is that almost any useful XSLT stylesheet should override at least two of the built-in templates: the one for text, shown above, and the one that matches all nodes, which is:

   

The built-in template for nodes copies nothing to the output, but by invoking the apply-templates call, allows other templates to match children of the current tag. In other words, any XSLT stylesheet processes all the nodes in the input document by default.
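For reference, the two built-in rules discussed here are equivalent to the following templates, as the XSLT 1.0 specification defines them. This is a fragment that shows the defaults a stylesheet starts with; you never need to write them yourself.

```xml
<!-- Built-in rule for the root and for elements: recurse into children -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Built-in rule for text and attributes: copy the string value through -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>
```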

Author’s Note: You can gain fine-grained control over extra whitespace characters in the XSLT output by using the xsl:strip-space and xsl:preserve-space constructs in the stylesheet, or by using the xml:space attribute on XML tags in the input files.
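As a sketch of the stylesheet-side whitespace control, the following strips whitespace-only text nodes from every input element while keeping them inside title. The choice of title as the preserved element and the output names are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Drop whitespace-only text nodes from all input elements... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except inside title, where whitespace is kept as written -->
  <xsl:preserve-space elements="title"/>

  <xsl:template match="/">
    <titles>
      <xsl:apply-templates/>
    </titles>
  </xsl:template>

  <xsl:template match="book">
    <xsl:apply-templates select="title"/>
  </xsl:template>

  <xsl:template match="title">
    <title><xsl:value-of select="."/></title>
  </xsl:template>

</xsl:stylesheet>
```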

Imperative XSLT
Unlike most programming languages, XSLT does not favor sequential execution. This is manifested by the verbosity of the related language constructs, such as choose (the XSLT counterpart of a switch statement) and for-each, and by weak support for side effects (no variables in the traditional sense).

This common example illustrates the verbosity of the imperative approach. The task is to construct an HTML table from the input document, placing the book names in rows and alternating colors on odd and even rows:

      
David Flannagan
David Flannagan
Dan Margulis
 
Figure 1. Table with Alternating Colors: The figure shows how alternating red and blue rows of content might render in a browser.

Figure 1 shows how a browser would render the preceding code.

Here’s how you can accomplish the task in the imperative style:

                                                          

The stylesheet creates one table for each row element in the input, so it first matches the row tag. Then, it uses the for-each construct to change the execution context to the first book node and calls the process-book template with a parameter that controls the row color in the HTML table. The process-book template then outputs the table row, in either red or blue depending on the value of the parameter, and calls itself to process the next book element with the opposite parameter value.

As you can see, this processing method gets complex very quickly, and you’d need to alter it for every format alteration in the input XML file.
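The imperative approach described above might be sketched like this. The template name process-book comes from the text; the parameter name color, the use of the title element for the cell content, and the choice of blue for the first row are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- One HTML table per row element of the input -->
  <xsl:template match="row">
    <table>
      <!-- for-each over a single node only switches the context node -->
      <xsl:for-each select="book[1]">
        <xsl:call-template name="process-book">
          <xsl:with-param name="color" select="'blue'"/>
        </xsl:call-template>
      </xsl:for-each>
    </table>
  </xsl:template>

  <!-- Emits one table row, then recurses to the next sibling book -->
  <xsl:template name="process-book">
    <xsl:param name="color"/>
    <tr>
      <td style="color:{$color};"><xsl:value-of select="title"/></td>
    </tr>
    <xsl:for-each select="following-sibling::book[1]">
      <xsl:call-template name="process-book">
        <!-- Flip the color for the next row -->
        <xsl:with-param name="color">
          <xsl:choose>
            <xsl:when test="$color = 'red'">blue</xsl:when>
            <xsl:otherwise>red</xsl:otherwise>
          </xsl:choose>
        </xsl:with-param>
      </xsl:call-template>
    </xsl:for-each>
  </xsl:template>

  <!-- Keep stray whitespace out of the result -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```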

Declarative XSLT
For XSLT, declarative is the opposite of the common imperative or algorithmic strategy; that is, an XSLT programmer does not define a sequence of actions that form an algorithm but rather sets a number of rules that the result should satisfy.

The declarative nature of the language lets you place templates anywhere and in any order in the XSLT document, because order has no impact on the resulting document.

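Author’s Note: The preceding rule applies except in cases of conflict resolution, where order is the last decision criterion: when two matching templates have the same import precedence and priority, a processor may recover by choosing the one that occurs last in the stylesheet.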

Here is a stylesheet written with the declarative approach that provides the same output:

                                                                      

In contrast to the procedural approach, this version doesn’t define any algorithm. Instead, it specifies two templates for the processor to match: one for even-numbered rows and one for odd-numbered rows. The processor outputs the contents in red for even-numbered elements and in blue for odd-numbered elements.
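A declarative version along these lines could be sketched as follows. It assumes books are grouped under row elements and uses the title element for the cell text. In a match pattern, position() counts a book among the book children of its parent, which is what makes the odd/even split work.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="row">
    <table>
      <xsl:apply-templates select="book"/>
    </table>
  </xsl:template>

  <!-- Rule for even-numbered books: red -->
  <xsl:template match="book[position() mod 2 = 0]">
    <tr><td style="color:red;"><xsl:value-of select="title"/></td></tr>
  </xsl:template>

  <!-- Rule for odd-numbered books: blue -->
  <xsl:template match="book[position() mod 2 = 1]">
    <tr><td style="color:blue;"><xsl:value-of select="title"/></td></tr>
  </xsl:template>

  <xsl:template match="text()"/>

</xsl:stylesheet>
```

The two book rules can appear in either order, since a given book satisfies only one of the predicates.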

Key Indexing in XSLT
You can simplify a fair portion of XSLT processing if you understand how to use keys. Keys in XSLT have more or less the same meaning that indexes have in relational databases, except that in XSLT, keys index hierarchical structure rather than relational structure. It’s easiest to explain keys with an example.

Imagine that you need to count the number of book copies available for each book title and display them in an HTML table, where each row looks like this:

   ...            JavaScript: The Definitive Guide       2        ...

Here’s a possible solution that illustrates the use of keys:

                                                                  

In the preceding example, the key declaration has three parts: the name of the key, used to refer to it later in the code; the match, that is, the element or attribute of the input data to be indexed; and the use, which is an XPath expression that defines the key value itself. XPath is a language for addressing parts of an XML document, designed to be used by XSLT and XPointer. See the full language specification for more information.

In this particular case, the expression literally means: Create a key with the name kbook on all the tags book and group them by title.
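A stylesheet built around such a key might look like the sketch below. The key name kbook and the indexed book and title names come from the text; the HTML layout and the use of //book to reach every book are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Index every book element by the string value of its title child -->
  <xsl:key name="kbook" match="book" use="title"/>

  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="//book"/>
    </table>
  </xsl:template>

  <xsl:template match="book">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <!-- key() returns all books sharing this title; count them -->
      <td><xsl:value-of select="count(key('kbook', title))"/></td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```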

The “book” template uses the key by calling the function key() with two parameters: the name of the key and the value of the index as defined in the use attribute of the key declaration (in this case, simply “title”, as that’s the child of the context node). As you might expect, this stylesheet produces two identical lines for the book “JavaScript: The Definitive Guide”, as shown below.

      
JavaScript: The Definitive Guide 2
JavaScript: The Definitive Guide 2
Photoshop 6 for Professionals 1

That leads to another common XSLT problem: removing duplicates.

Removing Duplicates: the Muenchian Method
Because XSLT is an almost side-effect-free declarative language, the problem of removing duplicates (ridiculously simple in imperative languages such as C++ or Java) becomes overly complicated. But fortunately, an elegant solution exists, so unexpected that it even earned its own name, “Muenchian,” because Steve Muench was reportedly the first to discover it.

                                                                  

Notice that the key declaration in this example is identical to the previous example. You use the generate-id() function to obtain a unique ID for each node; the function guarantees that every time you pass in the same node, you get the same ID. The ID value depends on which XSLT processor implementation you’re using, but typically the ID is something like n1n1 or d1md1 or some other meaningless string. This example uses the key in a conditional expression that compares the ID of the current node with the ID of the first node returned by the key that matches the title of the current node. In other words, the key that matches “JavaScript: The Definitive Guide” returns two nodes ordered as 1 and 2. During execution, the template matching passes in those same two nodes. When processing node 1, the ID of the node is the same as the ID of the node returned by key('kbook', 'JavaScript: The Definitive Guide')[1]; but when processing node 2, the condition is false. Thus, the stylesheet processes only one book that matches the title “JavaScript: The Definitive Guide.”
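The technique just described can be sketched as follows. The kbook key is the one named in the text; the HTML layout and the //book selection are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:key name="kbook" match="book" use="title"/>

  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="//book"/>
    </table>
  </xsl:template>

  <xsl:template match="book">
    <!-- True only for the first book in document order with this title -->
    <xsl:if test="generate-id() = generate-id(key('kbook', title)[1])">
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="count(key('kbook', title))"/></td>
      </tr>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Each distinct title now yields a single table row, while the count still reflects every copy.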

Using Complex Keys in XSLT
Because the use attribute of the key definition is an XPath expression, it’s possible to create quite elaborate indexes that rely upon complex XPath statements. As an example, generate-id() makes a unique key for every node.
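A key of that kind might be declared as in this sketch. The key name kid and the text output are hypothetical; the point is only that the use expression can be any XPath expression, including a function call.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Every element is indexed under its own generated ID -->
  <xsl:key name="kid" match="*" use="generate-id()"/>

  <xsl:template match="book">
    <!-- Looking a node up by its own ID returns exactly that node -->
    <xsl:value-of select="count(key('kid', generate-id()))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="text()"/>

</xsl:stylesheet>
```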

      

The International Standard Book Number, or ISBN (sometimes pronounced “is-ben”), is a unique identifier for books, intended to be used commercially. The following key declaration verifies the checksum of an ISBN, evaluating to true if the checksum passes the test and false otherwise.

You can find the check digit of a ten-digit ISBN by first multiplying each of the first nine digits by that digit’s place in the number sequence, with the leftmost digit being multiplied by 1, the next digit by 2, and so on. Next, take the sum of these multiplications and calculate the sum modulo 11, with “10” represented by the character “X”. As an example, for the ISBN 1-56592-235-2, the calculation would be (1*1 + 2*5 + 3*6 + 4*5 + 5*9 + 6*2 + 7*2 + 8*3 + 9*5) mod 11 = 189 mod 11 = 2, which matches the final digit. The translate function deletes all the dash (-) characters from the @isbn attribute value. The following example uses the substring function to extract each character from the string returned by translate.
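One way to express that check as a key is sketched below. The key name kisbn is hypothetical, and the lookup string '0123456789X' is a device chosen here to map a remainder of 10 to the character X; the article's own listing may have handled that case differently.

```xml
<!-- Fragment: a top-level key whose value is the string 'true' or 'false' -->
<xsl:key name="kisbn" match="book" use="
    substring('0123456789X',
      ( substring(translate(@isbn, '-', ''), 1, 1) * 1
      + substring(translate(@isbn, '-', ''), 2, 1) * 2
      + substring(translate(@isbn, '-', ''), 3, 1) * 3
      + substring(translate(@isbn, '-', ''), 4, 1) * 4
      + substring(translate(@isbn, '-', ''), 5, 1) * 5
      + substring(translate(@isbn, '-', ''), 6, 1) * 6
      + substring(translate(@isbn, '-', ''), 7, 1) * 7
      + substring(translate(@isbn, '-', ''), 8, 1) * 8
      + substring(translate(@isbn, '-', ''), 9, 1) * 9 ) mod 11 + 1,
      1)
    = substring(translate(@isbn, '-', ''), 10, 1)"/>
```

Because the use expression yields a boolean, the processor converts it to a string, so key('kisbn', 'true') returns the books whose ISBN passes and key('kisbn', 'false') those whose ISBN fails.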

   

Branching vs. Modes in XSLT
XSLT’s branching powers are weak compared to the branching statements of conventional languages. Instead, you can use the powerful mechanism of modes, which occasional XSLT programmers often leave unexplored.

Suppose you have to print all the titles and their respective ISBN codes, checking for the ISBN code validity at the same time. You could represent the desired result as follows:

      
ISBN number check failed
1-56592-235-1
ISBN number check passed
1-56592-235-2
0-471-40399-7

Without knowing how to use keys and modes, you might implement the solution with the following stylesheet logic:

                            

This example would construct the table by iterating on the ISBN nodes and choosing whether to output style="color:red;" on each pass. This would be easy if you weren’t obliged to group the result and output all the failed ISBN codes first. For the purpose of grouping, the use of keys and modes leads to much simpler code, the alternatives being extension functions or chaining of two different XSLT stylesheets.

                                                                             

This version processes both types of book nodes (those that did not pass checksum verification for their ISBN codes, and those that did) using separate templates for the failed and passed modes. The stylesheet outputs ISBN values in red for books with ISBN codes that fail the checksum test.
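A keys-and-modes solution of this shape might be sketched as follows. The mode names failed and passed come from the text; the key name kisbn, the '0123456789X' lookup string, and the table layout are assumptions, and only the ISBN is printed to keep the sketch short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Key value is 'true' when the ISBN-10 check digit is correct -->
  <xsl:key name="kisbn" match="book" use="
      substring('0123456789X',
        ( substring(translate(@isbn, '-', ''), 1, 1) * 1
        + substring(translate(@isbn, '-', ''), 2, 1) * 2
        + substring(translate(@isbn, '-', ''), 3, 1) * 3
        + substring(translate(@isbn, '-', ''), 4, 1) * 4
        + substring(translate(@isbn, '-', ''), 5, 1) * 5
        + substring(translate(@isbn, '-', ''), 6, 1) * 6
        + substring(translate(@isbn, '-', ''), 7, 1) * 7
        + substring(translate(@isbn, '-', ''), 8, 1) * 8
        + substring(translate(@isbn, '-', ''), 9, 1) * 9 ) mod 11 + 1,
        1)
      = substring(translate(@isbn, '-', ''), 10, 1)"/>

  <xsl:template match="/">
    <table>
      <tr><th>ISBN number check failed</th></tr>
      <xsl:apply-templates select="key('kisbn', 'false')" mode="failed"/>
      <tr><th>ISBN number check passed</th></tr>
      <xsl:apply-templates select="key('kisbn', 'true')" mode="passed"/>
    </table>
  </xsl:template>

  <!-- Same element, two modes: the mode picks the presentation -->
  <xsl:template match="book" mode="failed">
    <tr><td style="color:red;"><xsl:value-of select="@isbn"/></td></tr>
  </xsl:template>

  <xsl:template match="book" mode="passed">
    <tr><td><xsl:value-of select="@isbn"/></td></tr>
  </xsl:template>

</xsl:stylesheet>
```

The grouping falls out of the two key lookups, so no conditional logic is needed inside the templates.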

Extending XSLT
Sometimes XSLT turns out to be too limited in expressive power for complex transformations. Two viable options then exist:

  • Chain the execution of XSLTs instead of trying to do everything in one pass.
  • Use common extension functions from the EXSLT package.

Chaining XSLT Execution
Contrary to what one might think, chaining XSLT stylesheets (using the output of one stylesheet transformation as the input for the next stylesheet in the chain) does not add much overhead if done in a proper way. Although nearly all XSLT processors reconstruct the structure of the input document in memory for each pass, that process is not equivalent to the reconstruction of a DOM tree. Most XSLT processors use an internal format that may be a lot faster. In fact, a number of small XSLT stylesheets chained together can actually boost performance as compared to a single complex stylesheet.

Using Common Extension Functions
There is an effort to provide a more or less common set of extensions to XSLT with the corresponding reference implementations. Some of these functions already exist in various XSLT processors under different names.

The most notable function is node-set. It allows the conversion of result tree fragments into node-sets. If you create a variable with a select statement, it returns a node-set:

   

In contrast, if you create a variable with an embedded statement, it returns a result tree fragment:

           

Because XSLT allows more operations on node-sets, it is wise to use the select statement when possible instead of embedded statements. Otherwise, the node-set extension function would come to the rescue.
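The two ways of creating a variable can be contrasted in a short sketch. The variable names selected and embedded and the output element are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- select attribute: the variable holds a node-set -->
  <xsl:variable name="selected" select="//book"/>

  <!-- embedded content: the variable holds a result tree fragment -->
  <xsl:variable name="embedded">
    <xsl:copy-of select="//book"/>
  </xsl:variable>

  <xsl:template match="/">
    <report>
      <!-- Node-set operations such as path steps work on $selected -->
      <count><xsl:value-of select="count($selected/title)"/></count>
      <!-- A fragment can be copied or converted to a string, but
           $embedded/book would be an error in XSLT 1.0 -->
      <xsl:copy-of select="$embedded"/>
    </report>
  </xsl:template>

</xsl:stylesheet>
```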

The node-set extension function exists for several processors: 4XSLT, Xalan-J, Saxon, jd.xslt, and libxslt, and you make it accessible to your stylesheets by including the namespace http://exslt.org/common.
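A minimal sketch of calling the extension follows, assuming a processor that implements the EXSLT common module. The exsl prefix, the variable name titles, and the output element are arbitrary choices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="xml" indent="yes"/>

  <!-- Embedded content, so this is a result tree fragment -->
  <xsl:variable name="titles">
    <xsl:copy-of select="//title"/>
  </xsl:variable>

  <xsl:template match="/">
    <!-- node-set() converts the fragment so path steps can be applied -->
    <count>
      <xsl:value-of select="count(exsl:node-set($titles)/title)"/>
    </count>
  </xsl:template>

</xsl:stylesheet>
```

On a processor without the extension, the call fails, so portable stylesheets may want to guard it with function-available('exsl:node-set').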

                                  

As a sign of EXSLT’s popularity, even Microsoft supports some of the EXSLT functions. However, Microsoft uses a different namespace: urn:schemas-microsoft-com:xslt.

Overall, as an occasional XSLT developer, try to keep the advantages of functional and flow-driven programming in mind, and be wary of falling into the trap of trying to use the procedural or imperative programming techniques that you commonly use in standard programming languages.
