Peek Into the Future of XSLT 2.0

Extensible Stylesheet Language Transformations, or XSLT, has received decidedly mixed reviews from developers. Although XSLT has many evangelists, it has its detractors as well, partly because of unfamiliar concepts such as static variables, which can't be changed once they're set. XSLT’s learning curve can also be a bit steep, particularly for some of the more complex transformations.

Many of the problems stem from a fundamental lack of understanding about how the language works. Primarily through Mulberry’s XSL-List, the XSLT developer community has gone to great lengths to educate developers on how to use XSLT.

XSLT is beginning to take root across the enterprise. So what happens? Just as developers begin to get comfortable with XSLT, XSLT 2.0 and XPath 2.0 begin to creep up on them from around the corner.

And it’s fair to say that XSLT 2.0 is not your mother’s XSLT. In fact, XSLT 2.0, XPath 2.0, and XQuery are now so interconnected that if you master one, you can easily master them all. So when you’re done with this article, make a note to yourself to study XPath 2.0 when it becomes a formal W3C Recommendation.

This article will introduce you to some of the most interesting core concepts of XSLT 2.0 and its companion, XPath 2.0. While it doesn’t cover all the changes to these languages, it will introduce you to the most interesting new features. The brief example at the end of this article may help to tie some of this information together. To run the examples, you’ll need to download the sample code. You should also read the sidebar How to Get XSLT 2.0 for more information on how to get the latest version.

To run the samples, unzip the archive, making sure everything is in the same directory. Then launch Saxon.html from your browser. You’ll see four frames (see Figure 1). The top frame contains text boxes so you can specify the files you wish to transform. Type the name of the source XML file in the first text box, the name of the XSLT file in the second text box, and click the Transform button. The three lower frames then display the source document, the XSLT, and the transformation, respectively, displayed either as HTML source or as it should appear in your browser, depending on which radio button you choose when you click Transform. If you don’t see the text boxes, drag the lower edge of the top frame down with your mouse until they appear.

Figure 1: When you browse to the file Saxon.html, your browser should look something like this. You may need to adjust the height of the top frame to see the text fields.

What’s New in XSLT 2.0?
Like its predecessor, XSLT 2.0 relies heavily on XPath (now XPath 2.0) for many of its core features. XPath 2.0 is itself intertwined with yet another emerging standard, XQuery 1.0, which relies on XPath 2.0 so much that after mastering XPath 2.0 you’ll have a pretty good idea of how XQuery works. It’s important to note, though, that XSLT 2.0 does not rely on XQuery. XQuery is a language for querying XML documents and is already finding substantial support in most of the native XML databases, such as Xindice, Ipedo, X-Hive, and others.

In addition, both Microsoft and Oracle plan to support XQuery in their next major releases, which will have native XML database capabilities. Eric Brown, Microsoft’s Product Manager for SQL Server, says that the next release of SQL Server, code-named Yukon, will support the following XML features:

  • Native XML Storage
  • XQuery Support
  • Cross-domain querying between relational and XML data

Some developers are even suggesting that XQuery will supplant XSLT as the primary XML processing language. Other than their common bond to XPath, however, there is no direct relationship between XSLT 2.0 and XQuery.

Author Note: While reading this, you should be aware that XSLT 2.0 and XPath 2.0 are not yet stable specifications. Anything you find here is liable to change, some of it quite significantly.

XSLT 2.0 Adds New Data Types
In XSLT 1.0 (and XPath 1.0), there were four kinds of data types:

  • Strings
  • Booleans
  • Node-sets
  • Numbers

Node-sets, of course, contain nodes, which in turn have properties of their own. There are seven types of nodes: the document, element, attribute, text, namespace, processing instruction, and comment nodes.

XPath 2.0 has a much richer data model. At the very top of the list is the sequence, an ordered list that can contain atomic values of XML Schema simple types, such as xs:int or xs:date, in addition to nodes. The addition of XML Schema data types is the biggest change. XML Schema defines 19 primitive simple types (and many more built-in types derived from them), and the function library that XPath 2.0 shares with XQuery provides access to them, including constructor functions such as xs:date().

Everything starts with sequences, so let’s take a look at those, since it’s a new concept for many of us.

Understanding Sequences
The first object layer consists of something called sequences. It’s a new term but one you’ll need to get familiar with if you’re serious about the next version of XSLT and XPath. Every expression evaluates to a sequence. An expression, in turn, is constructed from a combination of keywords, symbols, and operands.

If you understand XML at the core level, you’ll understand exactly what a sequence is, and you may even slap yourself in the forehead and say, “Wait, that’s not new at all; it’s the essence of XML!” So besides looking at the XPath and XSLT definitions of a sequence, consider the one fundamental truth behind XML and how best to define it. Rather than do it myself, I’ll let Mike Brown do it for me, through his excellent tutorial on XML and character encoding:

An XML document is a UCS character sequence that follows certain patterns. These patterns provide a means of representing a logical hierarchy (a tree) of data.

That’s all XML is. I’m sure you’ve seen other definitions for XML, and while some of them may have various levels of truth behind them, if they don’t include something like this simple statement, they’re leading you astray. UCS, by the way, is, in essence, sort of, Unicode (not technically, but enough so for our purposes here), which is, sort of, the mother of all character coding sets for XML. I say “sort of” twice because the intricacies of Unicode are somewhat esoteric. If you really want to know more about it, you can’t do better than the Skew tutorial.

So, what’s a sequence in XPath 2.0? The core functional capabilities of XPath 2.0, like those of XPath 1.0, are built around expressions. Expressions always yield results, and in XPath 2.0 these results are expressed as a sequence, which is an ordered list of zero or more items. Each item can be either a node, as in XPath 1.0, or (and this is new) an atomic value of a simple XML Schema data type.

At its most basic, a sequence is the result of an expression like this:

   (7, 1, 2, 3)

This results in a sequence of four integers, in exactly the order written:

   7 1 2 3

This expression returns all foo element children of the context node (which is the position from which evaluation starts), followed by all bar element children:

   (foo, bar)

It yields this:

   foobar

A sequence can be empty, so this:

   ()

yields an empty sequence.

Sequences in XPath 2.0 are ordered and never nested. They can also contain duplicates, even within the same expression. For example, consider the source document shown in Listing 1. Suppose you want to extract some of the information from each product. To do that, here’s a simple for-each statement that outputs some literal results:

   


Starting with the source document’s node as the context node, the preceding for-each statement contains an expression that yields the following results (that is, the following sequence):

   

1
3
1

This example highlights the difference between the old and new XSLT models. For one thing, you couldn’t use commas as node delimiters in the previous version of XSLT or XPath. Now commas are a legitimate way to separate items in ordered sequences.

Note also the parentheses, a useful way to make the code more readable. But even more interesting is how easy it is to duplicate a node as part of the sequence. You can see that the first and third items in the expression are the same, and so they yield the same result in the output.
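Here is a minimal sketch you can run with an XSLT 2.0 processor such as Saxon against any XML file (the input is ignored). It uses atomic values rather than nodes to show the properties just described: items keep the order you write them in, nested parentheses flatten into one sequence, the empty sequence has no items, and duplicates are allowed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Items keep the order written; value-of joins them with a space -->
    <xsl:value-of select="(7, 1, 2, 3)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Sequences never nest: the inner parentheses flatten away -->
    <xsl:value-of select="count(((7, 1), (2, 3)))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The empty sequence contains no items -->
    <xsl:value-of select="count(())"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Duplicates are allowed: the same value appears twice -->
    <xsl:value-of select="(1, 3, 1)"/>
  </xsl:template>
</xsl:stylesheet>
```

With a conforming processor the output should be four lines: 7 1 2 3, then 4, then 0, then 1 3 1. By default, xsl:value-of separates the items of a sequence with a single space.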

If you’ve ever been confused by the difference between what you can put in a match pattern (in, for example, an xsl:template match attribute), a good way to think about it is that matches never yield results like expressions do. They’re merely patterns for determining whether or not a node meets certain criteria.

XML Schema Support Is Controversial
One of the most interesting, if controversial, aspects of XSLT 2.0 is its support for XML Schema. The controversy stems from a concern among some XML developers that XML Schema is becoming too entrenched in the core of the language. But if you yearn for better data typing in XML, you’ll appreciate the ability to use it, and if you loathe the concept entirely, you can safely ignore it as long as your requirements don’t dictate its use.

XPath 2.0 provides access to the XML Schema data types as defined by the W3C. To take advantage of this capability in XSLT, you use the xsl:import-schema element, which is a top-level element new to XSLT 2.0, like this:

   

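This gives you access to the data type model provided in XML Schema. You’ll need it if you want to do explicit type casting, which is beyond the scope of this article but allows you to specify how you want data results returned to you (as decimal types, date types, date-time types, and so on). (In the XSLT 2.0 Recommendation as finally published, commonly used built-in types such as xs:date and xs:decimal can be used for casting even without importing a schema; xsl:import-schema, which requires a schema-aware processor, is what makes the types declared in your own schemas available.)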
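A minimal sketch of both halves of this story follows. The schema file products.xsd is hypothetical, and only a schema-aware XSLT 2.0 processor will accept the xsl:import-schema line; the two casts after it use built-in types only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical schema for the sample data. A non-schema-aware
       processor rejects this declaration, so delete it to try the casts. -->
  <xsl:import-schema schema-location="products.xsd"/>
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Cast a string to xs:date, then add a 30-day duration to it -->
    <xsl:value-of select="xs:date('2003-01-21') + xs:dayTimeDuration('P30D')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- xs:decimal arithmetic avoids floating-point rounding -->
    <xsl:value-of select="xs:decimal('3.4') * 3"/>
  </xsl:template>
</xsl:stylesheet>
```

With the import line removed, the output should be 2003-02-20 followed by 10.2.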

New Features Fulfill Developers’ Wishes
Multiple Document Output Capability
One feature for which XSLT developers have yearned is multiple output capability. Some processors already support multiple file output via extensions. Xalan, in particular, has a rather substantial pipelining process. This has proven useful for generating multiple Web site pages from one stylesheet. XSLT 2.0 now supports it natively through the xsl:result-document element. An example of this is shown in Listing 2 (the file output.xsl in the sample code). The stylesheet processor generates a result file for each instance of the product element, and also generates a link to each resulting file.
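Listing 2 isn't reproduced here, but a minimal sketch of the same idea might look like this. It assumes a hypothetical input in which a products root element contains product elements, each with a name child; the file-naming scheme (product1.html, product2.html, and so on) is also an assumption.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!-- Assumed input: /products containing product elements with a name child -->
  <xsl:template match="/products">
    <html><body>
      <xsl:for-each select="product">
        <!-- Hypothetical file name derived from the product's position -->
        <xsl:variable name="file" select="concat('product', position(), '.html')"/>
        <!-- Each product is written to its own output document -->
        <xsl:result-document href="{$file}" method="html">
          <html><body><h1><xsl:value-of select="name"/></h1></body></html>
        </xsl:result-document>
        <!-- The principal output links to the file just written -->
        <p><a href="{$file}"><xsl:value-of select="name"/></a></p>
      </xsl:for-each>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
```

Relative href values are resolved against the base URI of the principal output, so the product pages land next to the index page.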

XSLT 2.0 Simplifies Syntax
XSLT has vastly cut down on its verbosity with Version 2.0. A perfect example is grouping, which can be a Herculean task in XSLT 1.0. Steve Muench of Oracle managed to piece together a brilliant mechanism for creating distinct groupings in XSLT 1.0, but making it work can sometimes make even the most grizzled XSLT veteran eat analgesics by the bucketful. I’ll walk you through a grouping example a bit later in this article, and you’ll see first-hand how much easier this common task has become.

Regular Expressions Improve String-Handling
The introduction of regular expressions is another improvement, and it’s something many XSLT developers have longed for. Most people find XSLT 1.0’s string manipulation capabilities wanting. Regular expressions change this completely.
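As a quick taste, this sketch uses two of the new regular-expression functions, matches() and tokenize(), on string literals, so it runs against any input document. The patterns are illustrative choices, not taken from the sample code.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- matches(): true if the whole string has a yyyy-mm-dd shape -->
    <xsl:value-of select="matches('2003-01-21', '^\d{4}-\d{2}-\d{2}$')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- tokenize(): split on a comma plus any whitespace after it -->
    <xsl:value-of select="tokenize('1, 3, 3.4, 8', ',\s*')" separator="|"/>
  </xsl:template>
</xsl:stylesheet>
```

The output should be true on the first line and 1|3|3.4|8 on the second. The curly braces in the patterns need no escaping, because select holds an XPath expression rather than an attribute value template.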

More Robust Expressions
As nice as the simplified syntax is, even better is how robust XPath expressions have become. Consider the following statement, which harks back to the kind of programming most of us are used to:

   

This statement binds a variable, stores a value in it, and lets you drill down (or up) from the result of that variable, all within the same expression. The statement, in effect, says, “Declare the variable x, store the value resulting from evaluating the quantity element that is a following-sibling on the path from the context node, and then return the result.” You can do still more, such as access a child element of the result, by doing this instead:

   

The child elements are accessible in the preceding code because you have access to the sequence results from evaluating the variable expression.
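The original statements aren't reproduced here, but a minimal sketch of the same idea, written with XPath 2.0's for expression, might look like this. It assumes a hypothetical input in which a products element contains product elements, each with a name child and one or more quantity children.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Assumed input: /products/product, each with a name child
       and one or more quantity children -->
  <xsl:template match="/products">
    <!-- Bind $p to each product in turn and drill down to its name -->
    <xsl:value-of select="for $p in product return $p/name" separator=", "/>
    <xsl:text>&#10;</xsl:text>
    <!-- The same construct can compute a value per item -->
    <xsl:value-of select="for $p in product return sum($p/quantity)"/>
  </xsl:template>
</xsl:stylesheet>
```

The first line lists the product names; the second lists one quantity total per product, in document order.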

Don’t get too excited, though: variables are still static, so once a variable is bound, its value can’t be changed. Nevertheless, XSLT has become far less verbose, and people who’ve shunned it for that reason will probably be taking a second look. Keep your mind trained on the preceding example, because later in this article you’ll see how you can use similar expressions to optimize your code.

The document() Function Migrates to XPath
The document() function has proven to be so useful in XSLT that it has found its way into the XPath 2.0 specification, so you can now use it as part of a location path within XPath. (In the final XPath 2.0 Recommendation the function is named doc(); XSLT 2.0 keeps document() as well.) If you’ve used this function in XSLT, you understand the basics behind how it works.

The document() function in XSLT gives you access to all the nodes in a tree of an external document. Consider this variable:

   

The variable stores the entire tree from the XML document in foo.xml, and from there you can get at the rest of the tree. Assuming you know the name of the root element, you could get the text value of that element like this:

   

In fact, XPath has so many new functions added to it that the W3C split the specification. Functions now have their own standards document, currently in Working Draft stage.
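Here is a minimal sketch of an external-document lookup, assuming a file named foo.xml sits next to the stylesheet and that its root element is called foo (a hypothetical name). It uses doc(), the name the function ended up with in the final XPath 2.0 Recommendation.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Load the external tree once; foo.xml is resolved relative to the stylesheet -->
  <xsl:variable name="ext" select="doc('foo.xml')"/>
  <xsl:template match="/">
    <!-- Navigate into the external tree like any other node;
         "foo" is a hypothetical root element name -->
    <xsl:value-of select="$ext/foo"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Because doc() is an XPath function, it can start a path directly -->
    <xsl:value-of select="count(doc('foo.xml')//*)"/>
  </xsl:template>
</xsl:stylesheet>
```

The first line prints the text value of the root element; the second prints how many elements the external document contains.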

Improved Comment Capability
You can still use comments the old-fashioned way, with the XML comment delimiters <!-- and -->, but now you can embed them in expressions, too, using the delimiters {-- and --} to start and end the comment. (In the final XPath 2.0 Recommendation, these delimiters became (: and :).)
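A tiny example follows, written with the (: ... :) delimiters that the final XPath 2.0 Recommendation settled on; a processor following the older draft syntax would expect {-- and --} instead. The stylesheet ignores its input document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- An ordinary XML comment, ignored by the XSLT processor -->
    <xsl:value-of select="6 * 7 (: an XPath comment inside the expression :)"/>
  </xsl:template>
</xsl:stylesheet>
```

The output should be 42; the XPath comment is skipped just like whitespace.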

   

New Operators, Keywords and Delimiters
In XPath 1.0 and XSLT 1.0 you couldn’t use parentheses like this:

   

Now you can. However, parentheses in XPath 2.0 don’t create nested structures: ((1, 2), 3) is the same flat sequence as (1, 2, 3). Instead, they’re used either as a basic way to group sequences in XPath expressions, as I demonstrated at the beginning of this article, or simply to make your expressions a little more readable. Other changes are less cosmetic. In addition to the for keyword that you’ve already seen, XSLT 2.0 adds a number of other keywords and constructs that programmers coming from other backgrounds will find familiar, such as in, and even an if-then-else construct:
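The original example isn't reproduced here, but this small sketch shows if-then-else alongside two other XPath 2.0 constructs that use the in keyword, for ... return and some ... satisfies. It works on literal values, so any input document will do.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- for ... return builds one string per item; if-then-else picks the
         wording (the else branch is mandatory in XPath 2.0) -->
    <xsl:value-of separator=", " select="
        for $q in (1, 3, 8)
        return if ($q gt 2) then concat($q, ' is large')
               else concat($q, ' is small')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- some ... satisfies is true if at least one item qualifies -->
    <xsl:value-of select="some $q in (1, 3, 8) satisfies $q gt 5"/>
  </xsl:template>
</xsl:stylesheet>
```

The output should be 1 is small, 3 is large, 8 is large on the first line and true on the second.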

           

Grouping
One of the laments of anyone who has developed complex transformations is grouping, an operation that should be fairly simple but under XSLT 1.0 has been tedious at best and absolutely mind-numbing at worst.

Steve Muench’s solution to grouping problems has become known as the Muenchian method. To aid in grouping, XSLT 2.0 adds a new component to the evaluation context called the current group, which you access with the current-group() function. The current group is the sequence of related items being processed by the xsl:for-each-group element, and it is the key that unlocks that stubborn grouping door. The syntax for the element looks like this:

              

The for-each-group statement groups things the same way the Muenchian method does, but is far less complex. For comparison, here’s the XSLT 1.0 approach: to group the costs of products in Listing 1 by their names in a way that would match the name children of product elements, you’d first make a key:

      

Then you’d access the key by calling the key() function and cross-reference the result against the context node:

   <xsl:for-each select="name[count(. | key('costKey', .)[1]) = 1]">

You can see how this works in Listing 5 at the end of this article, which shows the process in its entirety. I won’t go into details about how it works here because XSLT 2.0 makes it irrelevant, and there’s not enough space here for a primer on keys and grouping. If you need to learn more about keys and grouping for XSLT 1.0 projects, visit Jeni Tennison’s XSLT pages.

I’ll discuss grouping in more detail later, but to see the difference, compare the grouping solution in Listing 4 with that in Listing 5 (the first is XSLT 2.0, the second is XSLT 1.0).

Putting It All Together
It’s not fair to take a peek at all these new toys if you can’t play with them a bit. So here are some interesting examples that illustrate the new powers in XSLT 2.0 and XPath 2.0. Listing 3 shows the source document used for the sample transformations. It’s essentially the same as Listing 1, but I’ve removed the schema to simplify adding an element to the file without worrying about validation.

Now, take a look at Listing 4. I’ve employed a few new XSLT 2.0 operations, so I’ll break down the code a bit. The highlighted XSLT 2.0 features appear in bold text.

The first interesting chunk of code concerns an old favorite of Perl programmers, regular expressions. I’ve decided to take Bert, the maker of all these fine products, out of the loop, so to speak, and hand the products over to a fellow named Tom. I do this not because I don’t like Bert, but because I want to show how regular expressions have made their way into XSLT 2.0. The following chunk of source XML shows a product element:

               Bert's Coffee         1         3         3.4         8         2003-01-21      

The object is to transform the name element in the input document…

   Bert's Coffee

…into the following HTML in the output:

   

Tom's Coffee

Trying Out a Regular Expression
In XSLT 1.0 the only way to replace a string of text is to use a recursive template.

There’s nothing wrong with using such templates, of course, but it would be nice to have a simple function for it. Now, with the help of regular expressions, there is. Rather than calling a complex template function, you can write simple code like this:

      

The replace function takes three arguments. The first argument is the source string, the second is the part of the string that you want to replace, written as a regular expression, and the third is the replacement string, which can refer to parenthesized parts of the match as $1, $2, and so on. The function replaces each instance of the target expression found in the source string with the replacement string. I used simple literals for my regular expressions, but if you know regular expressions you can really do some serious string manipulations here.
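Here is a self-contained sketch of both uses. In a real stylesheet the first argument would typically be a node, such as the name element; string literals keep this version independent of any input document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Literal pattern anchored at the start; '' inside a single-quoted
         XPath string is an escaped apostrophe -->
    <xsl:value-of select="replace('Bert''s Coffee', '^Bert', 'Tom')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Capturing groups: $1, $2 and $3 refer to the parenthesized parts -->
    <xsl:value-of
        select="replace('2003-01-21', '^(\d{4})-(\d{2})-(\d{2})$', '$2/$3/$1')"/>
  </xsl:template>
</xsl:stylesheet>
```

The output should be Tom's Coffee followed by 01/21/2003.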


There are some differences between the way you use regular expressions in Perl and the way you use them in XML Schema, whose regular-expression syntax XPath 2.0 adopts with a few additions such as the ^ and $ anchors. To explore regular expressions specifically as they pertain to XML Schema, see http://www.xfront.org/xml-schema/.

Restricting Sequences
Another thing I wanted to do was effortlessly output the value of each quantity element within each product element. Again, this kind of thing is pretty easily accomplished in XSLT 1.0 once you’ve learned the language, but now it’s even easier because of one simple keyword, except. This operator lets you exclude certain nodes from a sequence you are generating. For example, to include all the following siblings of the name element except the last one, you can write:

   select="(following-sibling::*) except following-sibling::*[5]"

The result outputs all the quantity elements to HTML:

      Sale: 1
Sale: 3
Sale: 3.4
Sale: 8

Note that in XSLT 1.0 you can accomplish the same thing with this select statement:

   select="following-sibling::*[not(position()=5)]"
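To compare the two side by side, here's a minimal sketch that builds a hypothetical product record inline (the element names and values are assumptions, not Listing 3) and selects every following sibling of name except the last one, first with except and then with an XSLT 1.0-style position filter. It uses [last()] rather than [5] so it doesn't depend on the number of siblings.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Hypothetical product data, built inline so the example is self-contained -->
  <xsl:variable name="sample">
    <product>
      <name>Bert's Coffee</name>
      <quantity>1</quantity>
      <quantity>3</quantity>
      <quantity>8</quantity>
      <date>2003-01-21</date>
    </product>
  </xsl:variable>
  <xsl:template match="/">
    <xsl:variable name="name" select="$sample/product/name"/>
    <!-- XPath 2.0: every following sibling except the last one -->
    <xsl:value-of select="$name/following-sibling::*
                          except $name/following-sibling::*[last()]"/>
    <xsl:text>&#10;</xsl:text>
    <!-- XSLT 1.0 style: the same nodes, filtered by position -->
    <xsl:value-of select="$name/following-sibling::*[not(position() = last())]"/>
  </xsl:template>
</xsl:stylesheet>
```

Both lines should read 1 3 8. Note that except works only on sequences of nodes, and its result comes back in document order.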

Grouping in XSLT 2.0
The meat of this XSLT application lies in the product template, which takes advantage of XSLT 2.0’s new grouping feature:

                                                                     _____   
Total Qty:
Price Each:
Grand Total: $

If you look at the first two variables, name and Qty, you’ll see there’s nothing special happening there. It’s just good old-fashioned XSLT 1.0 stuff. The next variable, cost, is a 2.0 variable, though, because it uses the new grouping element as well as a new comparison operator that takes some of the pain out of grouping problems.

In production environments you are sometimes handed some weird XML, and this is no exception. In this case, some gremlin decided to break out the pricing information separately but still left it within the same document as the ordering information. That isn’t too different from what you might find in the real world, although there you’d hope the pricing information would live in a completely different document. Either way, the grouping dilemma remains essentially the same.

The task here is to total the number of sales for each product and multiply that by the price of the product to determine the final sales value for each product. Although there are several ways to approach the problem, the first sample transformation uses the xsl:for-each-group element. But stay awake here! You’ve already seen how to declare a variable and access its value within the same expression, and you’ll see a similar example later that accomplishes precisely the same task as you’re about to see using the for-each-group statement, but in a simpler fashion.

To apply the Muenchian method to grouping problems, you need to eliminate duplicates so that each distinct value appears only once in your output. Generally, you do that by comparing the current node with the first node a key returns for the same value and seeing whether they’re the same node. If they are, the current node is the first member of its group, and you execute a series of statements to build that group; if they aren’t, you skip it as a duplicate.
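Here's a minimal XSLT 1.0 sketch of that test (not the article's Listing 5). It assumes a hypothetical input in which a products element contains product elements, each with a name child and one or more quantity children; the key name byName is also made up.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Index every product element by the text of its name child -->
  <xsl:key name="byName" match="product" use="name"/>
  <xsl:template match="/products">
    <!-- Keep a product only if it is the first node the key returns for
         its name: the union of the two nodes then has exactly one member -->
    <xsl:for-each select="product[count(. | key('byName', name)[1]) = 1]">
      <xsl:value-of select="name"/>
      <xsl:text>: </xsl:text>
      <!-- The key returns the whole group, so its quantities can be summed -->
      <xsl:value-of select="sum(key('byName', name)/quantity)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each distinct product name is printed once, followed by the total quantity across all product elements that share it.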

If you examine Listing 5, which accomplishes the same grouping as Listing 4, you’ll see that I created a key at the top of the stylesheet. Then I had to compare the current node with the target (note the bolded code):

                                                                                                ...snip      

The code for grouping can get pretty gnarly, but now, with XSLT 2.0, it’s pretty simple, thanks to the new xsl:for-each-group element. Here’s a for-each-group statement:

   

The statement does pretty much what it says: it groups the elements by their child name elements, which makes it easy to associate the elements with the names of the products being sold and simplifies the process of comparing the product names with the price names later. You can access each resulting group using the current-group() function. The sample code uses it to compare the names to the product elements using the product element’s name child. Conceptually, this is very similar to using keys, but I think you’ll find it more intuitive.
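For illustration, here's a minimal XSLT 2.0 sketch of this kind of calculation. It isn't the article's Listing 4; it assumes a hypothetical input in which a products element contains product elements (each with a name and one or more quantity children) plus separate price elements, each with a name and an amount child, and exactly one price per name.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Assumed input: /products holding product elements (name, quantity+)
       and separate price elements (name, amount) -->
  <xsl:template match="/products">
    <!-- One group per distinct name among the product elements -->
    <xsl:for-each-group select="product" group-by="name">
      <xsl:variable name="key" select="current-grouping-key()"/>
      <!-- current-group() holds every product element with this name -->
      <xsl:variable name="qty" select="sum(current-group()/quantity)"/>
      <!-- Look up the separate price element with the same name -->
      <xsl:variable name="each"
          select="number(/products/price[name = $key]/amount)"/>
      <xsl:value-of select="$key, 'total qty:', $qty, 'price each:', $each,
                            'grand total:', $qty * $each"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the key-based approach, there's no duplicate test to write: the processor forms the groups, and current-group() hands each one to you.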

If you like what you see here, you might want to dig a little deeper. For more examples on grouping, check out http://www.w3.org/TR/xslt20/#d5e13310.

From this point, it’s just a matter of multiplying the results of the two variables.

Grouping Made Even Simpler
Now for the fun part. You can accomplish the same sort of grouping demonstrated in the last example with the for-each-group statement using only an expression. Here’s another way to group the costs:

   

Examine the expression in the select attribute and you can see that it’s the equivalent of the for-each-group statement shown earlier. You can find the code for this alternate grouping scheme in the file Listing5.xsl, which is included in the downloadable sample code for this article.
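Here's a hedged sketch of the expression-only approach. It is not the code in the sample download; it assumes a hypothetical input in which a products element contains product elements (each with a name and one or more quantity children) and separate price elements, each with a name and an amount child, with one price per name.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Assumed input: /products holding product elements (name, quantity+)
       and separate price elements (name, amount) -->
  <xsl:template match="/products">
    <!-- One XPath 2.0 expression does the grouping: for each distinct
         name, sum that product's quantities and multiply by its price -->
    <xsl:value-of separator="&#10;" select="
        for $n in distinct-values(product/name)
        return concat($n, ': ',
                      sum(product[name = $n]/quantity)
                        * number(price[name = $n]/amount))"/>
  </xsl:template>
</xsl:stylesheet>
```

The for expression doesn't change the context item, so the relative paths product[...] and price[...] are still evaluated from the products element.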

XSLT 2.0 addresses many of the concerns XSLT developers have raised. It turns grouping into a simple and directly manageable task, reduces the dependence on extensive recursive templates for string manipulation routines, deals with the issue of node-set access, and introduces the notion of multiple output documents, to name just a few.

Whether these new capabilities will result in broader usage remains to be seen. Although it’s natural to think that some developers will look at XQuery as a competing language for XML processing, I tend to think the opposite: that widespread adoption of XQuery will actually expose more people to XSLT, because they’re both so closely linked to XPath.

The two languages serve different purposes. XSLT was never intended to be a query language, although people have used it that way for lack of alternatives. XQuery is not intended to be a transformation language, but should see substantial use as a querying tool or application-level interface, particularly within the scope of native XML data sets in databases.

I haven’t covered every new feature of XSLT 2.0 in this article. For a more complete list, see Elliotte Rusty Harold’s slide presentation introducing XSLT 2.0, XPath 2.0, and XQuery.

   