
Transitioning from XSLT 1.0 to 2.0, Part 2


The recursive nature of XSLT1 template calls provided a great deal of power, but often at the cost of requiring that stateful information needed by a particular template be passed in from an ancestor template through each intermediate template. As a consequence, deep templates might end up passing a great deal of information via parameters that none of the intermediate templates otherwise used. This added considerably to the verbosity of XSLT scripts and increased the probability that a single misplaced template parameter would result in many wasted hours trying to figure out why the receiving parameter was empty.

This example of template flows highlights a number of limitations that the original XSLT 1.0 transformations had. Unfortunately, these generally couldn't be handled by adding new XPath functionality, because the problem lay fundamentally with XSLT itself. For this reason, one of the more intriguing ideas that emerged was the concept of tunneling. In a tunnel, parameters are added to a template with the attribute @tunnel="yes". In a different template, an <xsl:apply-templates> or <xsl:call-template> is called with an <xsl:with-param> having the same name and similarly sporting a @tunnel="yes" attribute. For instance,

will convert the XML fragment:

     

User

    

The username is

into

     

User

    

The username is Kurt Cagle

The tunnel was opened with the parameter in template #1, passed transparently through templates #2 and #3, and the parameter was then retrieved in template #4 in order to populate the username. It's worth noting here that the tunneling is still in force -- if a descendant of template #4 sets up tunneling for the username, that template could still retrieve the value.
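A minimal sketch of the four-template flow described above. The element names (page, section, block, para) are assumptions chosen for illustration, not the names from the original listing; only the tunnel="yes" attributes on <xsl:with-param> and <xsl:param> are the point.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Template #1: opens the tunnel by sending a tunnel parameter. -->
  <xsl:template match="page">
    <page>
      <xsl:apply-templates>
        <xsl:with-param name="username" select="'Kurt Cagle'" tunnel="yes"/>
      </xsl:apply-templates>
    </page>
  </xsl:template>

  <!-- Templates #2 and #3: never declare or forward the parameter. -->
  <xsl:template match="section">
    <section><xsl:apply-templates/></section>
  </xsl:template>

  <xsl:template match="block">
    <block><xsl:apply-templates/></block>
  </xsl:template>

  <!-- Template #4: picks the value up again by declaring a tunnel param. -->
  <xsl:template match="para">
    <xsl:param name="username" tunnel="yes"/>
    <para>
      <xsl:value-of select="."/>
      <xsl:value-of select="$username"/>
    </para>
  </xsl:template>

</xsl:stylesheet>
```

Applied to a hypothetical page/section/block/para document whose para contains "The username is ", the para in the result reads "The username is Kurt Cagle", even though neither the section nor the block template mentions the parameter.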

Another extraordinarily useful new feature with XSLT2 is the @use-when attribute. This attribute, when placed on any XSL element, holds a Boolean test, and the element takes effect only if that test is true. Note that the XSLT 2.0 specification evaluates the test statically, when the stylesheet is compiled, so it cannot see the source document or ordinary stylesheet variables; the flag has to be something the processor can resolve at compile time (for example through system-property()), or a static parameter in later versions of the language. For instance, if you set up such a debug flag, then any element with an associated @use-when will only be included if the debug flag is true:

If the $debug flag is true(), this will generate a list of the names of each of the child elements for the element, then will create a wrapper and apply-templates for all the children. If $debug is false(), only the wrapper is displayed.

The @use-when attribute can also apply to individual literal result elements, but there the attribute has to carry the xsl: namespace prefix (xsl:use-when). For instance,

This is a debug message

will only be displayed if the $debug flag is properly set.
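A minimal sketch of both forms of the attribute under XSLT 2.0 rules, where the test is resolved at compile time. It swaps the debug flag for the standard xsl:is-schema-aware system property, since an ordinary variable is not visible to use-when in 2.0; the report and note element names are assumptions for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <report>
      <!-- On an XSLT instruction the attribute is written without a prefix. -->
      <xsl:comment use-when="system-property('xsl:is-schema-aware') = 'yes'">schema-aware processor</xsl:comment>

      <!-- On a literal result element it must be written as xsl:use-when. -->
      <note xsl:use-when="system-property('xsl:is-schema-aware') = 'no'">basic processor</note>

      <xsl:apply-templates/>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Whichever branch fails its test is discarded before the transformation runs, so it costs nothing at run time.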

For templates, this is the logical successor to the @mode attribute, which made it possible to use the same pattern in different contexts but suffered from being static. For instance, there was no clean way with @mode to create a "switch"-like capability; with @use-when there is, although the switch is thrown when the stylesheet is compiled rather than while it runs. What's more, the @use-when attribute makes it much easier to decompose template match statements, since you can effectively create two or more templates that handle different but similar actions in small blocks rather than having to create huge conditional statements within an XSLT2 stylesheet.

The <xsl:apply-imports> element invokes a lower-precedence template for the current node, but it works exclusively on templates from imported stylesheets, even if they have the same match signature. With XSLT2, this also includes the ability to pass parameters to the imported templates, making them far more functional in nature. Apply-imports can consequently build an XML-esque version of class inheritance into the XSLT space.
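A minimal sketch of this "inheritance" pattern, assuming two hypothetical files, base.xsl and custom.xsl, and an illustrative para element. The importing stylesheet overrides the template, then hands control to the imported version with a parameter.

```xml
<!-- base.xsl (hypothetical): the "parent" behavior. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="para">
    <xsl:param name="class" select="'plain'"/>
    <p class="{$class}"><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

```xml
<!-- custom.xsl (hypothetical): overrides, then calls the imported rule. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:import href="base.xsl"/>

  <xsl:template match="para">
    <div class="wrapper">
      <!-- Passing parameters here is allowed in XSLT 2.0, not in 1.0. -->
      <xsl:apply-imports>
        <xsl:with-param name="class" select="'highlight'"/>
      </xsl:apply-imports>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Running custom.xsl wraps each para in a div and lets the imported template produce the inner p, now with class="highlight" instead of the default.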

The <xsl:analyze-string> element provides additional regular expression support by accepting a @select expression that supplies the string and a @regex attribute containing a regular expression. It can then use the <xsl:matching-substring> and <xsl:non-matching-substring> children in order to create template content appropriate to the expression. For instance, with the XML structure:

      How do you do? 

You can use the <xsl:analyze-string> structure to split it cleanly into distinct parts.

                                    

This generates the output:

      How      do      you      do?

Note that you could do this with tokenize() as well, but it would take more code to write.
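A minimal sketch of such a split, assuming a hypothetical phrase element that holds the text and hypothetical words and word elements for the result; the original element names are not recoverable here.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="phrase">
    <words>
      <!-- \S+ matches each run of non-whitespace characters. -->
      <xsl:analyze-string select="." regex="\S+">
        <xsl:matching-substring>
          <!-- Inside this block the context item is the matched text. -->
          <word><xsl:value-of select="."/></word>
        </xsl:matching-substring>
        <!-- No xsl:non-matching-substring, so whitespace is dropped. -->
      </xsl:analyze-string>
    </words>
  </xsl:template>

</xsl:stylesheet>
```

For a phrase containing " How do you do? ", this yields four word elements: How, do, you and do?. Because @regex is an attribute value template, any curly braces in a regular expression have to be doubled.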

There are more capabilities that XSLT 2.0 offers (Ginsu knives and all!), but most of these tend to be either expansions of existing function sets (such as a whole range of date functions), optimizations of template flow and organization, or similar "tweaks" that could still knock a few percent off processing time or the number of lines of code necessary to perform transformations.

XSLT1 vs. XSLT2

While the advantages offered by XSLT 2.0 are sizeable compared to 1.0, it is still worth examining the costs of making such a transition. At this stage, arguably the best XSLT 2.0 engine is that produced by Saxonica, now in its ninth iteration. The Saxon XSLT Processor has been improved continuously for the last dozen years and now exists in both Java and .NET flavours. Because its primary developer, Michael Kay, is also the editor of the XSLT 2.0 specification (and the upcoming 2.1 specification), Saxon has also become both the testbed and reference implementation for the W3C.

Beyond Saxon, a number of XSLT 2.0 engines have been produced within the last year, after the XSLT 2.0 specification became a formal W3C Recommendation. IBM has produced an XSLT 2.0/XQuery engine for use with WebSphere, and it may be made standalone as part of the process (I need to confirm this with the project manager). Intel also recently released the Intel SOA Expressway XSLT 2.0 Processor, for use on Windows systems, as a standalone engine. Additionally, the open source eXist-db XML Database will be releasing a database-aware XSLT 2.0 native transformer with their next major release, probably early in 2011.

Significantly, in most cases these XSLT 2.0 engines are quite capable of handling XSLT 1.0 transformations -- while the data model is somewhat different, one of the design considerations was that it would prove backwards compatible for most pure 1.0 transformations. Moreover, because the 2.0 engines typically make use of significant optimizations in the way that nodes are processed, indexed and otherwise managed, it is likely that 1.0 stylesheets will show a marginal (5-10%) improvement in performance out of the box. However, as indicated above, many of the improvements with XSLT 2.0 involve converting unnecessary recursive processes (which can frequently be very resource intensive) into iterative processes, which typically are not. This frees up memory and consequently improves performance with a fairly modest rewriting of the more problematic routines. Similarly, the incorporation of tokenizers, sequences and regular expression functions can dramatically reduce the amount of processing that takes place via recursive template calls.
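As one illustration of replacing a recursive routine with a sequence-based one, here is a minimal sketch that splits a comma-separated list. In XSLT 1.0 this typically needed a named template that called itself with substring-before() and substring-after(); the keywords and keyword element names are assumptions for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="keywords">
    <keywords>
      <!-- tokenize() returns a sequence of strings; for-each iterates it. -->
      <xsl:for-each select="tokenize(normalize-space(.), '\s*,\s*')">
        <keyword><xsl:value-of select="."/></keyword>
      </xsl:for-each>
    </keywords>
  </xsl:template>

</xsl:stylesheet>
```

Given a keywords element containing "xml, xslt, xquery", the template emits three keyword elements without a single recursive call.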

Data-type specification offers a second arena of savings and introduces a modicum of type safety. By being able to express type, functions can be readily compiled to be more efficient in their performance, even when originally written in XSLT 2.0. The modular approach to XPath 2.0 function design also has an impact, because it means that programming-intensive extension functions (such as those associated with geospatial data) can be rendered in a much more hospitable language such as Java, C++ or C#.
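A minimal sketch of a typed stylesheet function, assuming a hypothetical my: namespace and a hypothetical line element with qty and price attributes. The @as attributes declare the parameter and return types, so a processor can check them and compile accordingly.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="xs my">

  <!-- Typed signature: (xs:integer, xs:decimal) returning xs:decimal. -->
  <xsl:function name="my:line-total" as="xs:decimal">
    <xsl:param name="quantity" as="xs:integer"/>
    <xsl:param name="unit-price" as="xs:decimal"/>
    <xsl:sequence select="$quantity * $unit-price"/>
  </xsl:function>

  <xsl:template match="line">
    <total>
      <!-- Explicit casts make the intended types visible at the call site. -->
      <xsl:value-of select="my:line-total(xs:integer(@qty), xs:decimal(@price))"/>
    </total>
  </xsl:template>

</xsl:stylesheet>
```

If a caller passes a value that cannot be converted to the declared type, the processor reports a type error instead of silently producing an empty or wrong result.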

The primary caveat to conversions comes with XSLT 1.0 scripts that made heavy use of extensions -- such as the node-set() extensions. In some cases, if the extension set involved was the one based on the EXSLT extension set, the conversion is likely to be very straightforward, as EXSLT wrapper functions rendered in XSLT 2.0 exist (many involving a 1-to-1 conversion, as the XPath 2.0 function set was in most cases derived from the EXSLT set). Again, such "converted" stylesheets will likely not show anywhere near the improvement that would come from taking advantage of the XSLT 2.0 feature set from scratch. Especially in the case of large XSLT 1.0 libraries, though, the best approach is to start with the incremental gains from the processors and then rebuild those stylesheets that form the primary bottlenecks in the system.
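The node-set() case is often the simplest to convert, because XSLT 2.0 lets a variable that holds a temporary tree be navigated directly. A minimal sketch, assuming a hypothetical lookup table and a hypothetical country element with a code attribute:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- A temporary tree built inside the stylesheet. -->
  <xsl:variable name="lookup">
    <entry code="US">United States</entry>
    <entry code="CA">Canada</entry>
  </xsl:variable>

  <xsl:template match="country">
    <!-- XSLT 1.0 needed exsl:node-set($lookup)/entry[...] at this point. -->
    <name>
      <xsl:value-of select="$lookup/entry[@code = current()/@code]"/>
    </name>
  </xsl:template>

</xsl:stylesheet>
```

The conversion amounts to deleting the node-set() wrapper and, once nothing else uses it, the extension namespace declaration.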

Such dedicated XSLT 2.0 development usually results in much smaller scripts (often 10-15% of the original size of a corresponding 1.0 script, if such a script could even be written at all). Code modularization, on-the-fly compilation and the creation of "transformlets" that are effectively highly optimized processors can also improve throughput dramatically, as the time-consuming compilation process need only be performed once. This is also not counting general improvements in technology that are most likely now taking place at a much higher rate within 2.0 processors before being ported to 1.0 processors (if that -- in many cases the 1.0 processors have effectively stagnated as developers have moved on to other projects).

XSLT2 vs. XQuery

Both XSLT2 and XQuery share a common foundation in XPath 2.0, and those people familiar with XQuery may very well wonder why they should go to the bother of using XSLT2 if so much of what's offered there can just as readily be done with XQuery. While to a certain extent the difference is a matter of programming style (those familiar with transformations may prefer to use transformations over queries and vice versa) there are a few qualitative differences that make the issue less an either/or type of thing and more an appropriate tool discussion.

The templatized approach to working with XML documents recognizes the fact that certain operations are best done by "walking the tree", recursing from the initial document root node down and to the right until the whole tree is covered, and template matching actually works remarkably well in this particular situation. This typically means that document-centric XML content is still better handled by XSLT, especially if it has a particularly complex and largely unpredictable underlying schematic structure.

XSLT is also generally better for working with template content -- effectively mixing data content with an underlying template document through effective matches. XQuery can be used in this regard if the template is very consistent, but past a certain level of complexity such XQueries tend to be both slower to run and far more effort to write than the equivalent XSLT. (I saw this first hand with a project recently in which the XQuery code ended up running nearly 30% slower on average than an XSLT that generated the same content.)

On the other hand, XQuery has been optimized for working with large-scale distributed data stores, and in that regard it is far better at retrieving and filtering content than XSLT is. What's more, for data-centric applications where structures are relatively well known and the need for intricate exception handling (which XSLT excels at) is near non-existent, XQuery can do a passable job at building presentation content as well from the original datasources.

What this implies in the long term is that rather than seeing the two as being diametrically opposed technologies, developers are beginning to see them as complementary ones -- using XQuery to handle the initial retrieval of XML content from collections, which is then handed off to XSLT functions responsible for transforming this content, either directly or via templates, into presentation formats. What's more, the output need not necessarily be XML, but could be content transformed via secondary processors to generate output formats such as PDF, Google Earth's KML, ZIP files, binary resources and so much more, as well as different service formats -- Atom, RSS, JSON, YAML, and so forth.

It's likely that the longer-term future of XSLT and XQuery will be in conjunction with the XML Pipeline Language (XProc) as well as more generalized pipeline architectures such as those used by Mark Logic. The core objectives of each of these, along with validation, have remained the tripod on which most of the rest of the W3C's XML application stack has rested from the beginning. As applications become more distributed, however, thinking of these operations as atomic makes it easier to break down complex sequences into far more manageable (and functional) components within a pipeline of operations.

Summary

XSLT 2.0 provides significant benefits over XSLT 1.0, from improved performance to easier development to more sophisticated capabilities. Deployment options for using XSLT2, both as a stand-alone application and as part of a more comprehensive pipeline strategy, are increasing as vendors and project managers release their own XSLT2 engines. The benefits in code modularization also make it easier to develop component modules that are both flexible and powerful, reducing the amount of code rewriting and making it easier in general to organize transformation code for reuse.

Moreover, development of XSLT 1.0 processors has been slowing for a number of years as they have reached adequate levels of maturity, and it is unlikely that performance gains coming from better understanding of architecture, faster machines and more sophisticated development practices will necessarily be implemented in 1.0 processors, meaning that the momentum in development moving forward definitely favors the newer 2.0 implementations.

Now is a good time to put together pilot projects with XSLT 2.0 in order to get a better gauge of the benefits that such processors have for your own transformation needs. Moreover, most XML development tools, such as XML Spy and OxygenXML, provide (and have provided for some time) very good support for XSLT 2.0 development, making such development a low-cost alternative for use with existing toolsets.
