Transitioning from XSLT 1.0 to 2.0, Part 2

Learn which factors make XSLT 2.0 a technology worth exploring for your own development needs.

Beyond Saxon, a number of XSLT 2.0 engines have been produced within the last year, now that the XSLT 2.0 specification is a formal W3C Recommendation. IBM has produced an XSLT 2.0/XQuery engine for use with WebSphere, and it may also be made available as a standalone engine, though this has not yet been confirmed. Intel also recently released the Intel SOA Expressway XSLT 2.0 Processor, a standalone engine for Windows systems. Additionally, the open source eXist-db XML database will include a database-aware, native XSLT 2.0 transformer in its next major release, probably early in 2011.

Significantly, in most cases these XSLT 2.0 engines are quite capable of handling XSLT 1.0 transformations. While the data model is somewhat different, it was designed to be backwards compatible with most pure 1.0 transformations. Moreover, because these engines typically make use of significant optimizations in the way that nodes are processed, indexed, and otherwise managed, 1.0 stylesheets run on them out of the box are likely to show a marginal (5-10%) improvement in performance. The larger gains, however, come from rewriting. As indicated above, many of the improvements in XSLT 2.0 involve converting unnecessary recursive processes (which can frequently be very memory-intensive) into iterative processes that typically are not, freeing up memory and improving performance with a fairly modest rewrite of the more problematic routines. Similarly, built-in tokenizing, sequence, and regular expression functions can dramatically reduce the amount of processing that takes place via recursive template calls.
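To make the recursion-versus-iteration point concrete, here is a small Python sketch (Python is used purely as a neutral illustration language; this is not XSLT). The first function mirrors the classic XSLT 1.0 idiom of a named template that calls itself on substring-after() once per token; the second mirrors XPath 2.0's fn:tokenize(), which splits on a regular expression in a single call. The function names are hypothetical.

```python
import re
import sys

def tokenize_recursive(text, delim):
    """XSLT 1.0 style: take substring-before(), then recurse on
    substring-after(). One nested call per token."""
    if delim not in text:
        return [text]
    before, _, after = text.partition(delim)
    return [before] + tokenize_recursive(after, delim)

def tokenize_regex(text, pattern):
    """XPath 2.0 style, analogous to fn:tokenize($input, $pattern):
    a single regex-driven split with no recursion."""
    return re.split(pattern, text)

if __name__ == "__main__":
    print(tokenize_recursive("alpha,beta,gamma", ","))
    print(tokenize_regex("alpha, beta,gamma", r",\s*"))

    # Enough tokens to exceed Python's recursion limit, standing in for
    # stack exhaustion in a deeply recursive 1.0 template.
    long_input = ",".join(["x"] * (sys.getrecursionlimit() + 10))
    try:
        tokenize_recursive(long_input, ",")
    except RecursionError:
        print("recursive version: RecursionError")
    print(len(tokenize_regex(long_input, ",")))
```

fn:tokenize and re.split differ in some edge cases (for example, on an empty input string), so treat this as an analogy rather than an exact equivalent.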

Data-type specification offers a second area of savings and introduces a modicum of type safety. Because types can be declared, processors can compile functions into more efficient code, even when those functions are written in XSLT 2.0 itself. The modular approach to XPath 2.0 function design also has an impact, because it means that computation-intensive extension functions (such as those associated with geospatial data) can be written in a much more hospitable language such as Java, C++, or C#.

The primary caveat to conversion comes with XSLT 1.0 stylesheets that made heavy use of extensions, such as the node-set() extension functions. If the extensions involved came from the EXSLT extension set, conversion is likely to be very straightforward, since XSLT 2.0 wrapper functions for EXSLT exist (many involving a one-to-one conversion, as the XPath 2.0 function set was in most cases derived from EXSLT). Again, such "converted" stylesheets will likely not show anywhere near the improvement that would come from taking advantage of the XSLT 2.0 feature set from scratch. Especially with large XSLT 1.0 libraries, the best approach is to start with the incremental gains that come from the newer processors, then rebuild those stylesheets that form the primary bottlenecks in the system.
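The simplest EXSLT conversions amount to rewriting function calls. The Python sketch below applies a deliberately small, illustrative mapping (an assumption for this example, not an official conversion table): it drops exsl:node-set() around a variable reference, since XSLT 2.0 lets you navigate a temporary tree directly, and renames a few EXSLT functions to their closest XPath 2.0 counterparts. A real converter would need a proper XPath parser and would have to account for differences in arguments and return types (for instance, current-dateTime() returns a typed xs:dateTime, while date:date-time() returns a string).

```python
import re

# Illustrative (assumed) renames from EXSLT to XPath 2.0. These are close
# equivalents, not exact ones; edge cases such as empty inputs differ.
RENAMES = {
    "math:max": "max",
    "math:min": "min",
    "date:date-time": "current-dateTime",
}

# exsl:node-set($var) is unnecessary in XSLT 2.0. This pattern only handles
# a bare variable reference as the argument, not nested expressions.
NODE_SET = re.compile(r"exsl:node-set\(\s*(\$[\w.-]+)\s*\)")

def convert_xpath(expr):
    """Rewrite a 1.0 XPath expression that uses a few EXSLT calls."""
    expr = NODE_SET.sub(r"\1", expr)
    for old, new in RENAMES.items():
        # Rewrite only whole function names that are followed by "(".
        pattern = r"(?<![\w:.-])" + re.escape(old) + r"(?=\s*\()"
        expr = re.sub(pattern, new, expr)
    return expr

if __name__ == "__main__":
    for e in ["exsl:node-set($rows)/row[1]", "math:max(//price)", "date:date-time()"]:
        print(e, "=>", convert_xpath(e))
```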

Such dedicated XSLT 2.0 development usually results in much smaller scripts (often 10-15% of the size of a corresponding 1.0 script, if such a script could even be written at all). Code modularization, on-the-fly compilation, and the creation of "transformlets" that are effectively highly optimized processors can also improve throughput dramatically, as the time-consuming compilation step need only be performed once. This does not even count general improvements in technology, which are most likely now appearing first, and faster, in 2.0 processors before being ported to 1.0 processors, if they are ported at all: in many cases the 1.0 processors have effectively stagnated as their developers have moved on to other projects.

XSLT2 vs. XQuery

XSLT2 and XQuery share a common foundation in XPath 2.0, and people familiar with XQuery may well wonder why they should bother with XSLT2 if so much of what it offers can just as readily be done in XQuery. To a certain extent the difference is a matter of programming style (those familiar with transformations may prefer them over queries, and vice versa), but a few qualitative differences make the issue less an either/or choice and more a matter of picking the appropriate tool.

The template-based approach to working with XML documents recognizes that certain operations are best done by "walking the tree": recursing from the document's root node down and to the right until the whole tree is covered. Template matching works remarkably well in this situation, which typically means that document-centric XML content is still better handled by XSLT, especially when its underlying structure is complex and largely unpredictable.
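The Python sketch below imitates that tree walk using the standard library's ElementTree. A dictionary of handler functions plays the role of xsl:template match rules, and apply_to_children() plays the role of xsl:apply-templates, doubling as the built-in rule: it copies text and recurses into children in document order. The element names (title, emph) and the HTML-like output are hypothetical, and the result is built as escaped strings rather than a proper result tree.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical template rules keyed by element name, loosely analogous
# to <xsl:template match="..."> declarations.
TEMPLATES = {}

def template(tag):
    """Decorator that registers a handler as the rule for `tag`."""
    def register(fn):
        TEMPLATES[tag] = fn
        return fn
    return register

def apply_templates(elem):
    """Use the matching rule if one exists, else the built-in rule."""
    rule = TEMPLATES.get(elem.tag, apply_to_children)
    return rule(elem)

def apply_to_children(elem):
    """Like <xsl:apply-templates/> and XSLT's built-in element rule:
    copy text and recurse into children, down then left to right."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(apply_templates(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)

@template("title")
def title_rule(elem):
    return "<h1>" + apply_to_children(elem) + "</h1>"

@template("emph")
def emph_rule(elem):
    return "<em>" + apply_to_children(elem) + "</em>"

if __name__ == "__main__":
    doc = ET.fromstring(
        "<article><title>XSLT <emph>2.0</emph></title>"
        "<para>Walk the <emph>tree</emph>.</para></article>"
    )
    print(apply_templates(doc))
```

Elements with no rule (article, para) simply pass through, which is what lets template matching cope with unpredictable document structures.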

XSLT is also generally better for working with template content, that is, merging data content into an underlying template document through pattern matches. XQuery can be used for this if the template is very consistent, but past a certain level of complexity such queries tend to be both slower to run and far more effort to write than the equivalent XSLT. (I saw this firsthand on a recent project in which the XQuery code ended up running nearly 30% slower on average than an XSLT stylesheet that generated the same content.)
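As a rough illustration of merging data into a template document, the Python sketch below copies a template tree and replaces placeholder elements with data values, much as an identity transform plus one extra match rule would in XSLT. The &lt;slot name="..."/&gt; placeholder vocabulary and the function name fill() are hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET

def fill(node, data):
    """Return a filled copy of `node`: every element is copied as-is
    (an identity transform), except hypothetical <slot name="..."/>
    placeholders, which are replaced by data[name] as plain text."""
    out = ET.Element(node.tag, dict(node.attrib))
    out.text = node.text
    for child in node:
        if child.tag == "slot":
            text = data.get(child.get("name"), "") + (child.tail or "")
            if len(out):
                # Text following an element belongs in that element's tail.
                out[-1].tail = (out[-1].tail or "") + text
            else:
                out.text = (out.text or "") + text
        else:
            copied = fill(child, data)
            copied.tail = child.tail
            out.append(copied)
    return out

if __name__ == "__main__":
    page_template = ET.fromstring(
        '<html><head><title><slot name="title"/></title></head>'
        '<body><p>Hello, <slot name="user"/>!</p></body></html>'
    )
    page = fill(page_template, {"title": "Report", "user": "Ada"})
    print(ET.tostring(page, encoding="unicode"))
```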

On the other hand, XQuery has been optimized for working with large-scale distributed data stores, and in that regard it is far better at retrieving and filtering content than XSLT. What's more, for data-centric applications where structures are relatively well known and the need for intricate handling of exceptional cases (which XSLT excels at) is nearly nonexistent, XQuery can also do a passable job of building presentation content directly from the original data sources.
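For comparison, the shape of a typical XQuery FLWOR retrieval (for, where, order by, return) is sketched below in Python over a tiny, hypothetical in-memory collection. This shows only the structure of such a query; a real XQuery database would typically consult its indexes rather than scan every document.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a database collection of small documents.
COLLECTION = [
    ET.fromstring("<book><title>XSLT Basics</title><price>25</price></book>"),
    ET.fromstring("<book><title>Advanced XQuery</title><price>45</price></book>"),
    ET.fromstring("<book><title>Pipelines</title><price>19</price></book>"),
]

def cheap_titles(collection, max_price):
    """Roughly equivalent to the XQuery:
         for $b in collection("books") where $b/price < $max
         order by $b/title return string($b/title)
    """
    matches = (b for b in collection
               if float(b.findtext("price")) < max_price)  # for ... where
    return sorted(b.findtext("title") for b in matches)    # order by ... return

if __name__ == "__main__":
    print(cheap_titles(COLLECTION, 30))
```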

In the long term, this implies that rather than seeing the two as diametrically opposed technologies, developers are beginning to see them as complementary ones: XQuery handles the initial retrieval of XML content from collections, then hands it off to XSLT functions responsible for transforming it, either directly or via templates, into presentation formats. Nor does the output need to be XML; secondary processors can turn the transformed content into formats such as PDF, Google Earth's KML, ZIP files, and binary resources, as well as service formats such as Atom, RSS, JSON, and YAML, among many others.
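The hand-off described here can be sketched in Python: a hypothetical list of items stands in for what a query step might return, and separate render functions turn that same content into JSON and into an RSS 2.0 feed. The item data, the example.com URLs, and the function names are all assumptions for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical query result handed off for presentation.
ITEMS = [
    {"title": "Pipelines", "link": "http://example.com/pipelines"},
    {"title": "XSLT Basics", "link": "http://example.com/xslt"},
]

def to_json(items):
    return json.dumps({"items": items})

def to_rss(items):
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = "Catalog"
    ET.SubElement(channel, "link").text = "http://example.com/"
    ET.SubElement(channel, "description").text = "Hypothetical catalog feed"
    for item in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")

# One renderer per output format; supporting a new format means adding an entry.
RENDERERS = {"json": to_json, "rss": to_rss}

def render(items, fmt):
    return RENDERERS[fmt](items)

if __name__ == "__main__":
    for fmt in RENDERERS:
        print(render(ITEMS, fmt))
```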

The longer-term future of XSLT and XQuery will likely lie in conjunction with the XML Pipeline Language (XProc), as well as with more generalized pipeline architectures such as those used by MarkLogic. Transformation and query, along with validation, have been the tripod on which most of the rest of the W3C's XML application stack has rested from the beginning. As applications become more distributed, thinking of these operations as atomic makes it easier to break complex sequences down into far more manageable (and functional) components within a pipeline of operations.
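Treating each operation as an atomic step makes composition straightforward. The Python sketch below chains parse, validate, transform, and serialize steps into one pipeline; in XProc the corresponding steps would be things like p:validate-with-xml-schema and p:xslt. The validation rule (requiring a title element) and the transformation are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from functools import reduce

def pipeline(*steps):
    """Compose atomic steps left to right: each step's output is the
    next step's input."""
    return lambda doc: reduce(lambda acc, step: step(acc), steps, doc)

def parse(text):
    return ET.fromstring(text)

def validate(root):
    # Hypothetical stand-in for schema validation.
    if root.find("title") is None:
        raise ValueError("validation failed: missing <title>")
    return root

def transform(root):
    # Hypothetical transformation: render the title as an <h1>.
    h1 = ET.Element("h1")
    h1.text = root.findtext("title")
    return h1

def serialize(root):
    return ET.tostring(root, encoding="unicode")

publish = pipeline(parse, validate, transform, serialize)

if __name__ == "__main__":
    print(publish("<doc><title>Hello</title></doc>"))
```

Because each step only agrees on its input and output, any one of them (say, the transformation) can be swapped out without touching the others.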


XSLT 2.0 provides significant benefits over XSLT 1.0, from improved performance to easier development to more sophisticated capabilities. Deployment options for XSLT2, both as a stand-alone application and as part of a more comprehensive pipeline strategy, are increasing as vendors and open source projects release their own XSLT2 engines. Its support for code modularization also makes it easier to develop component modules that are both flexible and powerful, reducing the amount of code rewriting and making transformation code easier to organize for reuse.

Moreover, development of XSLT 1.0 processors has been slowing for a number of years as they have reached adequate levels of maturity. Performance gains that come from a better understanding of architecture, faster machines, and more sophisticated development practices are unlikely to be implemented in 1.0 processors, so the momentum in development moving forward clearly favors the newer 2.0 implementations.

Now is a good time to put together XSLT 2.0 pilot projects to gauge the benefits such processors offer for your own transformation needs. Moreover, most XML development tools, such as XMLSpy and oXygen XML, provide (and have provided for some time) very good support for XSLT 2.0 development, making it a low-cost option to adopt with existing toolsets.
