
Transition from XSLT 1.0 to XSLT 2.0, Part 1 : Page 2

Not surprisingly, adoption of XSLT has remained very limited among traditional developers.

You can also use the string-join() function, which takes a sequence and concatenates its items using a specified delimiter. For instance, a comma-delimited list could be generated from a sequence as follows:

<xsl:value-of select="string-join($colors,', ')"/>

=> "red, orange, yellow, green, blue, violet"
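As a minimal sketch, assuming an XSLT 2.0 processor such as Saxon and a $colors variable defined inline purely for illustration, the stylesheet below builds the same list two ways: with string-join(), and with the separator attribute that XSLT 2.0 adds to <xsl:value-of>. It is meant to be run from an initial named template (for example, Saxon's -it:main option).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <!-- Illustrative sequence of strings; not read from any input document -->
  <xsl:variable name="colors" as="xs:string*"
      select="('red', 'orange', 'yellow', 'green', 'blue', 'violet')"/>

  <!-- Entry point, invoked as an initial named template -->
  <xsl:template name="main">
    <!-- string-join() returns one xs:string, usable anywhere in XPath -->
    <xsl:value-of select="string-join($colors, ', ')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- xsl:value-of can also join a sequence itself through @separator -->
    <xsl:value-of select="$colors" separator=", "/>
  </xsl:template>
</xsl:stylesheet>
```

Both lines should print the same comma-delimited list. string-join() is the better fit when the joined string feeds into another expression (inside concat() or a predicate, say), while @separator is a convenient shorthand when the result goes straight to output.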

User-Defined Functions

XSLT 1.0 is built around the template model, which includes both matching templates (which use the @match attribute) and named templates (which use the @name attribute). Named templates definitely have utility, but their central problem is that they have to be invoked from within an <xsl:call-template> statement. This adds a lot of verbosity, especially when the result is an atomic value (a string, number or date), and it means a named template cannot be invoked from within an XPath expression, which is where it is most often needed.

This need, along with the ambiguous state of extension functions in XSLT 1.0, led to an intriguing solution: a way for XSLT2 (and XQuery) vendors to extend the language through a consistent interface. One impact of this interface was that you could consistently extend XSLT 2.0 with Java, .NET, PHP, or whatever other environment was hosting the processor. The other, even more significant impact was that you could define XPath extension functions directly in XSLT 2.0.
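For comparison, here is a sketch of the XSLT 1.0 named-template pattern that stylesheet functions replace. The template name upper-first is hypothetical; it capitalizes the first character of a string. Even this one-line computation needs an <xsl:variable> wrapped around an <xsl:call-template> and an <xsl:with-param> before its result can be used, and it can run against any input document.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical named template: upper-cases the first character of $text.
       XSLT 1.0 has no upper-case(), so translate() maps a-z to A-Z. -->
  <xsl:template name="upper-first">
    <xsl:param name="text"/>
    <xsl:value-of select="concat(
        translate(substring($text, 1, 1),
                  'abcdefghijklmnopqrstuvwxyz',
                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
        substring($text, 2))"/>
  </xsl:template>

  <xsl:template match="/*">
    <!-- Capturing a single string requires a variable wrapping a call -->
    <xsl:variable name="label">
      <xsl:call-template name="upper-first">
        <xsl:with-param name="text" select="local-name()"/>
      </xsl:call-template>
    </xsl:variable>
    <!-- There is no way to write upper-first(local-name()) inside @select -->
    <xsl:value-of select="$label"/>
  </xsl:template>
</xsl:stylesheet>
```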

A common problem, particularly in web development, is creating a human-legible label for an XML element and its associated text content. XML element names are usually fairly descriptive, but aren't necessarily in a form that you'd like to see in a user interface: they contain underscores or dashes, or use "camelCase" notation. This can be a complex problem to solve in XSLT 1.0, but with user-defined functions and regular expressions (another useful feature of XSLT2) it becomes considerably easier, without resorting to complex recursion. For instance, suppose that you have a (rather ungainly) XML structure that looks like the following:

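<lines>
    <!-- the root and wrapper element names here are hypothetical -->
    <firstLine>This is the first line</firstLine>
    <second-Line>This is the second line</second-Line>
    <third_Line>
        <part_1>This is part 1 of line 3</part_1>
        <part-2>This is part 2 of line 3</part-2>
    </third_Line>
</lines>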

An XSLT 2.0 stylesheet to transform it might then look like the following:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:str="http://www.xmltoday.org/xmlns/string-functions"
    exclude-result-prefixes="xs str">

    <xsl:template match="/">
        <xsl:apply-templates select="*"/>
    </xsl:template>

    <!-- Copy each element, adding a human-readable label attribute -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:attribute name="label" select="str:elt-human-case(.)"/>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="*|text()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="text()"><xsl:copy/></xsl:template>

    <!-- Turns an element name such as "second-Line" into "Second Line" -->
    <xsl:function name="str:elt-human-case" as="xs:string">
        <xsl:param name="elt" as="element()"/>
        <xsl:variable name="elt-name" select="local-name($elt)"/>
        <xsl:variable name="expanded-caps" select="replace($elt-name,'([A-Z])','_$1')"/>
        <xsl:variable name="spaced-name" select="normalize-space(translate($expanded-caps,'-_','  '))"/>
        <xsl:variable name="tokenized-seq" select="tokenize($spaced-name,' ')"/>
        <xsl:variable name="title-name-seq"
            select="for $token in $tokenized-seq return concat(upper-case(substring($token,1,1)),lower-case(substring($token,2)))"/>
        <xsl:variable name="final-name" select="string-join($title-name-seq,' ')"/>
        <xsl:value-of select="$final-name"/>
    </xsl:function>

</xsl:stylesheet>


The <xsl:function> element defines a user-defined function, in this case str:elt-human-case(). This example breaks the work into separate steps to make it easier to follow, but a production transformation might combine most of the variables into a single nested expression. There are several points to note here. First, each function must be defined in a namespace, in this case the user-defined "str:" namespace. This makes it easier to modularize code (this function could be one of several in an imported stylesheet, all in that namespace library), and it ensures that user-defined functions never collide with the built-in XPath functions or with functions from other libraries.
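As a sketch of what that combined form could look like (one possible way to nest the steps, not the only one), the same logic collapses into a single XPath expression. This version also uses <xsl:sequence>, which returns the xs:string itself instead of building a text node that is then converted back to a string. The main template and its inline sample element are hypothetical, for a quick check via an initial named template such as Saxon's -it:main.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:str="http://www.xmltoday.org/xmlns/string-functions"
    exclude-result-prefixes="xs str">
  <xsl:output method="text"/>

  <!-- Same steps as the multi-variable version, nested into one expression:
       split before capitals, map - and _ to spaces, tokenize, title-case, rejoin -->
  <xsl:function name="str:elt-human-case" as="xs:string">
    <xsl:param name="elt" as="element()"/>
    <xsl:sequence select="string-join(
        for $token in tokenize(
            normalize-space(
                translate(replace(local-name($elt), '([A-Z])', '_$1'), '-_', '  ')),
            ' ')
        return concat(upper-case(substring($token, 1, 1)),
                      lower-case(substring($token, 2))),
        ' ')"/>
  </xsl:function>

  <!-- Hypothetical entry point; the sample element is built inline -->
  <xsl:template name="main">
    <xsl:variable name="sample" as="element()"><second-Line/></xsl:variable>
    <xsl:value-of select="str:elt-human-case($sample)"/>
  </xsl:template>
</xsl:stylesheet>
```

With the inline <second-Line/> sample, the main template should print "Second Line". The nested form is more compact, but the step-by-step version is easier to debug.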

The parameterization is similar to the way named templates are parameterized, though both the function declaration itself and its parameters can also carry an @as attribute declaring the expected type, such as xs:string or element(), using essentially the same sequence-type notation as XQuery.
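To illustrate typed parameters, here is a sketch with a hypothetical helper, str:truncate(), that declares an @as type on both parameters and on its return value. The processor checks arguments against those types, so calling it with a string where the integer is expected (for example, '10') raises a type error instead of silently converting.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:str="http://www.xmltoday.org/xmlns/string-functions"
    exclude-result-prefixes="xs str">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: shortens $text to at most $max characters,
       appending "..." when it had to cut -->
  <xsl:function name="str:truncate" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:param name="max" as="xs:integer"/>
    <xsl:sequence select="if (string-length($text) le $max)
                          then $text
                          else concat(substring($text, 1, $max), '...')"/>
  </xsl:function>

  <!-- Entry point, e.g. via Saxon's -it:main -->
  <xsl:template name="main">
    <xsl:value-of select="str:truncate('This is the first line', 10)"/>
  </xsl:template>
</xsl:stylesheet>
```

Run this way, the main template should print "This is th...".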

The variable definitions are worth examining as well:
    1. The $elt-name variable extracts the local name of the element.

    2. The $expanded-caps variable uses a simple regular expression to replace each upper-case character with an underscore followed by that character (e.g., "ATest" becomes "_A_Test"). The replacement is global, applying to every match, and an optional flags argument to replace() can make matching case-insensitive ("i") or change how line breaks are treated ("s" and "m").

    3. Following that, the $spaced-name variable uses translate() to map underscores and dashes to spaces, and then applies normalize-space() to remove leading and trailing spaces and collapse runs of contiguous spaces within the string into a single space.

    4. In $tokenized-seq, the result is split on spaces into a sequence of words; $title-name-seq then converts each word to title case (upper-case first letter, lower-case remainder), and $final-name rejoins the words with spaces. The function returns only what the <xsl:value-of> instruction produces; whitespace-only text between the instructions in the function body is stripped from the stylesheet and does not leak into the result.
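The steps above can be checked one at a time. The sketch below hard-codes the element name "second-Line" from the sample document purely for illustration and prints each intermediate value on its own line; the comments show what each step should produce. It assumes an initial named template, for example Saxon's -it:main.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <xsl:variable name="elt-name" select="'second-Line'"/>
    <!-- second-_Line -->
    <xsl:variable name="expanded-caps" select="replace($elt-name, '([A-Z])', '_$1')"/>
    <!-- second Line (translate() leaves two spaces; normalize-space() collapses them) -->
    <xsl:variable name="spaced-name" select="normalize-space(translate($expanded-caps, '-_', '  '))"/>
    <!-- ('second', 'Line') -->
    <xsl:variable name="tokenized-seq" select="tokenize($spaced-name, ' ')"/>
    <!-- ('Second', 'Line') -->
    <xsl:variable name="title-name-seq"
        select="for $token in $tokenized-seq
                return concat(upper-case(substring($token, 1, 1)), lower-case(substring($token, 2)))"/>
    <!-- One line per stage; the tokens are shown joined with | -->
    <xsl:value-of select="$expanded-caps, $spaced-name,
                          string-join($tokenized-seq, '|'),
                          string-join($title-name-seq, ' ')" separator="&#10;"/>
  </xsl:template>
</xsl:stylesheet>
```

The expected output is "second-_Line", "second Line", "second|Line" and "Second Line", one per line.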

One of the major advantages of this approach is that processors typically compile the functions the first time they are encountered, so they run considerably faster thereafter. Note that such functions can also return node content, such as elements and sequences of elements. To declare a sequence, add an occurrence indicator after the type name: * for zero or more items (or + for one or more). For example, a function returning a sequence of nodes would be declared as:

 <xsl:function name="local:node-set-fn" as="node()*">
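Here is a fuller sketch of a node-returning function, using a hypothetical str:words() that wraps each word of a string in a <word> element. Because the result is an ordinary sequence, it can be counted, indexed with a predicate, or copied, all from inside XPath. It assumes an initial named template such as Saxon's -it:main.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:str="http://www.xmltoday.org/xmlns/string-functions"
    exclude-result-prefixes="xs str">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical function: one <word> element per word of $text.
       The trailing * in @as permits zero or more elements. -->
  <xsl:function name="str:words" as="element(word)*">
    <xsl:param name="text" as="xs:string"/>
    <xsl:for-each select="tokenize(normalize-space($text), ' ')">
      <word><xsl:value-of select="."/></word>
    </xsl:for-each>
  </xsl:function>

  <xsl:template name="main">
    <xsl:variable name="words" select="str:words('This is the first line')"/>
    <!-- count() and [2] operate directly on the returned sequence -->
    <words count="{count($words)}" second="{$words[2]}">
      <xsl:copy-of select="$words"/>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

The result should be a <words> element with count="5" and second="is", containing five <word> children.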

The impact that such functions have on code legibility and performance is significant. Named-template calls could often seem very cryptic, especially when they were used to produce strings or attributes, and many of them made subordinate recursive calls that could test the patience of any XSLT developer. While there are still cases where recursion is the better approach, the ability to declare such operations as functions, together with the richer function library and sequence support, removes much of the deep recursion (and the stack-depth problems that come with it) that tends to afflict XSLT 1.0 code.
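A sketch of the difference, using a hypothetical recursive template, split-items, that turns a comma-separated string into <item> elements the XSLT 1.0 way, next to the XSLT 2.0 equivalent built on tokenize(). Each recursive call typically adds a stack frame unless the processor optimizes tail calls; the 2.0 version needs no recursion at all. It assumes an initial named template such as Saxon's -it:main.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- 1.0 style: peel one item off the front of the list per call -->
  <xsl:template name="split-items">
    <xsl:param name="list"/>
    <xsl:choose>
      <xsl:when test="contains($list, ',')">
        <item><xsl:value-of select="substring-before($list, ',')"/></item>
        <!-- Recurse on the remainder of the list -->
        <xsl:call-template name="split-items">
          <xsl:with-param name="list" select="substring-after($list, ',')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:when test="string-length($list) &gt; 0">
        <item><xsl:value-of select="$list"/></item>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

  <xsl:template name="main">
    <xsl:variable name="list" select="'red,orange,yellow'"/>
    <result>
      <recursive>
        <xsl:call-template name="split-items">
          <xsl:with-param name="list" select="$list"/>
        </xsl:call-template>
      </recursive>
      <!-- 2.0 style: tokenize() yields the whole sequence at once -->
      <sequence>
        <xsl:for-each select="tokenize($list, ',')">
          <item><xsl:value-of select="."/></item>
        </xsl:for-each>
      </sequence>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

Both branches should produce the same three <item> elements for this input.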

There's another effect of such functions. Because functionality can be defined in XSLT itself, it's possible to write semi-functional placeholder functions in XSLT2 and then, once the core behavior of a transformation is worked out, replace them with better-optimized Java or other host-language extensions.

Controlling Multiple Output Streams

Once you make the transition from a node-set to a sequence-based data model (and get rid of the singularly useless result tree fragment as a consequence), one of the more interesting side effects is the ability to use XSLT2 to create intermediate XML (and non-XML) content. Once that ability existed, the developers of the XSLT specification (most notably Michael Kay, the editor of the specification and the creator of the Saxon XSLT2 processor, discussed below) took advantage of it to create a new element called <xsl:result-document>, which makes it possible to send content to an external URL, depending on what the processor supports: a file in a local directory, a URL expecting POSTed content, a SOAP service endpoint that accepts POSTed messages, and so forth.
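As a minimal sketch of <xsl:result-document> at work, the stylesheet below splits one file into one document per record. The input shape (<orders> containing <order id="..."> elements) and the output file names are hypothetical; relative href values are resolved against the base output URI chosen by the processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:output name="part" method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Assumes hypothetical input: <orders><order id="1001">...</order>...</orders> -->
  <xsl:template match="/orders">
    <xsl:for-each select="order">
      <!-- One file per order. Assumes @id values are unique, since writing
           two result documents to the same URI is an error. -->
      <xsl:result-document href="order-{@id}.xml" format="part">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
    <!-- The principal output is just an index of the files written -->
    <index count="{count(order)}">
      <xsl:for-each select="order">
        <file href="order-{@id}.xml"/>
      </xsl:for-each>
    </index>
  </xsl:template>
</xsl:stylesheet>
```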

This capability opens up all kinds of possibilities for XSLT2 processors. One of the more useful applications of <xsl:result-document> is to load a large XML file and split it into multiple smaller documents saved locally. Another is to load a syndication feed in RSS 2.0 or Atom, read each entry in the feed, extract the links to the resources, and then save those resources into a local directory. Yet another is to process a message queue, sending each message in turn to a web API for additional processing elsewhere. To illustrate the feed case, suppose that you had an XML feed that looked like the following:

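<feed path="http://www.resources.com/">
    <!-- hypothetical entry and link elements, matching what the stylesheet below reads -->
    <entry>
        <title>Resource 1</title>
        <link>resource1.xml</link>
    </entry>
    <entry>
        <title>Resource 2</title>
        <link>resource2.xml</link>
    </entry>
    <entry>
        <title>Resource 3</title>
        <link>resource3.xml</link>
    </entry>
</feed>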
The following transformation takes the feed, retrieves the entry resources and stores them on the local file system:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" media-type="text/xml" indent="yes"/>
    <xsl:output name="xml-out" method="xml" media-type="text/xml" indent="yes" encoding="UTF-8"/>

    <xsl:template match="/">
        <xsl:apply-templates select="feed"/>
    </xsl:template>

    <xsl:template match="feed">
        <xsl:variable name="path" select="string(@path)"/>
        <xsl:for-each select="entry">
            <!-- Fetch the remote resource named by this entry's link -->
            <xsl:variable name="doc" select="doc(concat($path,link))"/>
            <!-- Save it locally under the same name -->
            <xsl:result-document href="file:///home/xmltoday/{link}" format="xml-out">
                <xsl:copy-of select="$doc"/>
            </xsl:result-document>
        </xsl:for-each>
        <count><xsl:value-of select="count(entry)"/></count>
    </xsl:template>

</xsl:stylesheet>

The <xsl:result-document> element within the "feed" match template illustrates the principle. The @href attribute points to a local file with the same name as the external resource. The @format attribute references a named <xsl:output> element (named output definitions are also new in 2.0), so the result document can be serialized differently from the main output of the transformation. The content in this case is the document retrieved for each entry, though it could be more complex code (or xsl:apply-templates elements referencing other templates). Note that <xsl:result-document> always contributes an empty sequence to the main result: the principal output in this case is just the <count> element at the end of the feed template.
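As a further sketch of @format, assuming the same feed/entry/title/link structure that the stylesheet above reads, the stylesheet below writes a hypothetical index.csv side file through a text-method output definition, while the principal output stays XML.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- A second, named output definition: plain text rather than XML -->
  <xsl:output name="csv-out" method="text" encoding="UTF-8"/>

  <xsl:template match="feed">
    <!-- Hypothetical side file: one "title,link" line per entry -->
    <xsl:result-document href="index.csv" format="csv-out">
      <xsl:for-each select="entry">
        <xsl:value-of select="concat(title, ',', link, '&#10;')"/>
      </xsl:for-each>
    </xsl:result-document>
    <!-- result-document adds nothing here; this is the principal output -->
    <count><xsl:value-of select="count(entry)"/></count>
  </xsl:template>
</xsl:stylesheet>
```

This simple CSV does no quoting, so titles containing commas would need extra handling.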

There are a few limitations to be careful with for this particular element. <xsl:result-document> cannot be evaluated while the processor is building a temporary result, such as the value of an <xsl:variable> or <xsl:param> or the body of an <xsl:function>; attempting it raises a dynamic error. Nor can a transformation write two result documents to the same URI. In practice, this means that content needed in more than one place is usually generated once into a variable, and that variable is then passed into both the subordinate output documents and the primary output document.
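A sketch of that variable-first pattern, assuming the feed structure that the stylesheet above reads and a hypothetical summary.xml side file: the shared content is built once, then copied into both the side document and the principal output.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="feed">
    <!-- Build the shared content once. A variable body is evaluated as a
         temporary result, so no xsl:result-document may appear inside it. -->
    <xsl:variable name="summary" as="element(summary)">
      <summary count="{count(entry)}">
        <xsl:copy-of select="entry/title"/>
      </summary>
    </xsl:variable>
    <!-- Hypothetical side file holding a copy of the summary -->
    <xsl:result-document href="summary.xml">
      <xsl:copy-of select="$summary"/>
    </xsl:result-document>
    <!-- The same summary also goes into the principal output -->
    <report>
      <xsl:copy-of select="$summary"/>
    </report>
  </xsl:template>
</xsl:stylesheet>
```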

Kurt Cagle is the managing editor for XMLToday.org and a contributing editor for O'Reilly Media. He is currently working on a book about XBRL. Follow him on Twitter at twitter.com/kurt_cagle.