Transition from XSLT 1.0 to XSLT 2.0, Part 1

Transition from XSLT 1.0 to XSLT 2.0, Part 1

The year 2007 saw the release of what was, for all intents and purposes, the second generation of the XML core technologies. The first generation appeared over the course of about three years, from 1998 for the XML standard itself and 1999 for XPath and XSLT 1.0 to 2001 for the release of XML Schema. XSLT 1.0 in particular was a game changer for the technology, as it took a radically different approach to programming — creating a language written using nothing but markup that attempted to match XPath patterns and then, passing the XML nodes in question to a template to create new content.

This approach was extraordinarily powerful — because of the recursive nature of the templates, an XSLT stylesheet could transform anything from nearly flat database records to very deep documents with equal ease, could use wildcard matches to create generalized templates, and could additionally invoke functions on the XPath for more specialized processing. This meant that XSLT stylesheets began to take on the status of a secret weapon for XML developers, programs that could, in a few hundred lines of code, outdo imperative code double or triple its size.

However, XSLT 1.0’s power was also its achilles heel. As one pundit put it, the process of learning how to work with XSLT was very much akin to sawing off the top of your head, pulling out your brain, rotating it 90 degrees, then inserting it back in your head. For people steeped in C-like languages, XSLT was extraordinarily non-intuitive, and there were many operations, such as iterating over a sequence of numbers, that could only be accomplished by using recursion rather than iterator loops. Not surprisingly, especially when coupled with the sometimes cumbersome XML notation, adoption of XSLT remained very limited among traditional developers.

Yet even among XML developers who recognized the value of XSLT, the language had a reputation for being cumbersome, and a number of efforts emerged to improve the standard via a somewhat vague and amorphous extension mechanism, culminating in 2002 with the creation of a somewhat ad hoc group of functions defined under the banner of Extended XSLT (EXSLT). While these functions helped to identify (and standardize) a number of key functions (including additional math, string and regular expression handling, as well as a significant hole in date processing), it also identified a more serious problem, one that actually went to the core of the underlying XPath language.

It’s useful to understand the relationship between XPath and XSLT. The purpose of XPath is very much analogous to the role played by a SELECT statement in the SQL — it identified, relative to a given context in an XML document, a group of related nodes – elements, attributes, text and so forth — for further processing. The XPath language also had a secondary role in being able to performs some limited calculations on those nodes, though this capability existed primarily to support the underlying selection mechanism.

XSLT in turn made use of both the primary and secondary roles of XPath by providing a constructive language capability over the top of the XPath language. There are, in face a number of similarities between the XPath language and the Regular Expression (RegEx) language — one identifies nodes in a tree, while the other identifies text fragments in a text expression, but neither of them can actually do anything with this information by themselves — they aren’t constructive. Languages such as Perl or java-script could take the resulting pattern matches and use them to build other things in the text by way of the matches. XSLT works in much the same way, something that tends to get lost in translation when people see it as an XML language.

This focus though on XPath nodes in a nodeset, however, was proving to be too limited. There was no clean way of iterating over sets of words, for instance, or iterating over a sequence of numbers, or — and this was perhaps the most troublesome aspect — iterating across constructed nodes, because XPath assumed that there was effectively one and only one tree that it could pull information from, the one tree provided as the input (or passed as a parameter). The biggest consequence of this was that there was no way with this underlying data model to create a temporary tree and then process that tree in memory, short of outputting that tree and passing it onto a new XSLT process. You couldn’t combine two or more external inputs then process that combined set, something that is especially critical for working with external look-up tables. In short, the model itself was broken, badly.

This led to a decision by the W3C to go back to the drawing board, especially as there had been an increasing call by XML database vendors to provide a more comprehensive standardized query language that would do for XML what SQL did for relational databases – create a unified query language that could be used to query large datasets, return results, and manipulate them in different ways to generate new output. In 2002, the XPath working group was given a mandate to produce a 2.0 version of the language, one that could be used by both a revised XSLT language and by the proposed XQuery language … and thus began a major rethinking about data models.

From Nodesets to Sequences

The new data model (which began emerging around 2004) no longer revolved around nodesets, which were seen as being too restrictive. Instead, primacy shifted to sequences — lists of things, with comparatively little restriction on what those things could be. A sequence could consist of a linear set of nodes, for instance, but those nodes no longer were required to be all a part of the same document. The sequence could also consist of text strings, numbers, dates and times (which were now their own data type), or even more generic items — or could be combinations of all of these things.

Yet sequences were also, always, fundamentally linear. If you put a sequence inside of another sequence such as:

(a,b,(c,d,e))

that was equivalent to

(a,b,c,d,e)

Thus, you couldn’t build structures with nested sequences, because the whole role of a sequence is to be a, well, sequential list of items.

This seemingly mundane change would end up having huge implications. You could create operators such as the to operator, which lets you create iterations:

    

It meant that you could take a comma delimited string and split it into individual strings in a sequence:

          

Finally, and perhaps most significantly, you could pull in elements (or even documents) from multiple document sources and combine them:

          
  • Most of these operations are familiar to people working with XQuery (because sequences are so intrinsic to XQuery development), and once you gain some mastery in working with sequences in this manner you can do a lot of things that are difficult and tedious to do in XSLT 1.0 in a couple of lines of code. Such sequences also share a fair amount of similarity with node-sets in terms of their interface -- $mysequence[2] will retrieve the second item in the sequence, $mysequence[last()] will retrieve the last element, and $mysequence/count() or count($mysequence) (both forms are valid) will retrieve the number of items in the sequence.

    However, there are also capabilities that sequences offer that can't be readily handled by nodesets in XSLT 1. For instance, you can slice sequences using the subsequence command -- subsequence($mysequence, 1,4) will return the first four items in the sequence, subsequence($mysequence, 3, 4) will return four sequential items starting from item 3, and subsequence($mysequence,2) will retrieve all items from the second item on.

    Similarly, you can use the index-of function to match a given sequence item with its position or positions. For instance,

    will return the value 5. If more than one match exists, then a sequence such as (3,4) would be returned, making it possible to iterate on those subelements:

              
  • which would generate a sequence of elements:

  • green
  • ,

  • blue
  • .You can also use the string-join() function which will take a sequence and concatenate it together, using a particular specified delimiter. For instace, a comma-delimited list could be generated from a sequence as

    => "red, orange, yellow, green, blue, violet"

    User-Defined Functions

    XSLT 1.0 is built around the template model, which includes both matching templates (that use the @match attribute) and named templates (that use the @name attribute). Named templates definitely have utility, but the central problem with such named templates is that they have to be invoked from within an statement, which both adds a lot of verbosity, especially when the resulting content is an atomic value (string, number or date), and cannot be invoked from within an XPath expression where it is most likely needed.

    This need, along with the ambiguous state of extension functions within XSLT 1.0, led to an intriguing solution -- a way for XSLT2 (and XQuery) vendors to extend the language through a consistent interface. One impact of this interface was that with it you could consistently extend XSLT 2.0 with Java, .NET, PHP, or whatever other environment the processor was being hosted in at the time. However, the other, even more significant impact was that you could also define XPath extension functions directly using XSLT 2.0.

    A common problem in web development in particular is creating a human-legible label for an XML element and it's associated text content. Frequently, XML elements are usually pretty descriptive, but aren't necessarily in a form that you'd like to see in a user interface - they contain underscores, dashes or use "camelCase" type notation. This can be a complex problem to solve in XSLT 1.0, but, with user defined functions and regular expressions (another useful feature of XSLT2), it can be made considerably easier - without resorting to complex recursion. For instance, suppose that you have a (rather ungainly) XML structure that looks like the following:

        This is the first line    This is the second line            This is part 1 of line 3        This is part 2 of line 3    

    Then an XSLT 2.0 stylesheet to transform it would look like the following:

                                                                                                                            

    The element defines a user defined function, in this case elt-human-case(). This example breaks down the steps to make it a little easier to follow, but a production transformation might actually combine most of the variables into a single staggered definition. There are several points to note here. First, each function is defined in a namespace, in this case the user-defined "str:" namespace. This both makes it easier to modularize code blocks (this function could be one of several defined in an imported xslt, for instance, all in that namespace library) and insures that there's no overlap in functionality.

    The parameterization is similar to the way that named templates are parameterized, though it's worth noting that both the function declaration itself and the parameters can also include an @as attribute which contains the simple-type declaration of the element, with a notation that is basically the same as used for XQuery.

    The variable definitions are worth examining as well:
      1. The $elt-name variable extracts the local name of the element.

      2. The $expanded-caps variable uses a simple regular expression to replace any upper case character with an underscore and that character (e.g., "ATest") becomes "_A_Test". Regular expression support is global, but can also be extended to both ignore case and work across line breaks.

      3. Following that, the $spaced-name variable uses translate() to map underscores and dashes to spaces, and then applies normalize-space() to remove leading and trailing spaces and convert multiple contiguous spaces within the expression into a single space.

      4. In $tokenized-seq, the results are then tokenized to convert them into a sequence, and the sequence in turn is broken into title text expressions before being rejoined with spaces in $final-name. The output of the function ignores any white space outside of the statement (and typically distinguishes between contained and external white space).

    One of the major advantages that this approach offers is that the functions are compiled the first time they are encountered and consequently run considerably faster thereafter. Note that such functions can also return node content - elements and sequences of elements, though to return a sequence you need to include a * after the type-name; that is, a sequence of nodes would be given as:

     

    The impact that such functions have on code legibility and performance is significant -- named-template calls could often seem very cryptic, especially when these were called to produce either strings or attributes, and many of these tended to make subordinate recursive calls that could test the patience of any XSLT developer. While there are still cases where such recursion is the better approach, the ability to functionally declare such invocations and the ability opened up by better function sets and sequencing capabilities removes a lot of the recursion (and hence threading issues) that tend to afflict XSLT 1.0 code.

    There's another effect of such functions. By providing an option to define functionality in XSLT it's possible to create semi-functional placeholder functions with XSLT2, then once the core behavior of a transformation is worked out, replace these XSLT functions with better optimized Java or other host language extensions.Controlling Multiple Output Streams

    Once you make the transition from a node-set to a sequential data model (and get rid of the singularly useless XML Fragment object as a consequence) one of the more interesting side effects is the ability to use XSLT2 to create intermediate XML (and non-XML) content. Once that ability existed, the developers of the XSLT specification (and most pointedly Michael Kay, the editor of the specification and the creator of the Saxon XSLT2 processor, discussed below) took advantage of this capability to create a new element called , which made it possible to send the content to an external URL -- whether a file on the local directory, a URL expecting POSTed content, a SOAP service capable of accepting a POST endpoint and so forth.

    This capability opens up all kinds of interesting potential for XSLT2 processors. For instance, one of the more useful applications for is the ability to load in a large XML file and split it up into multiple smaller documents that can be saved locally. Another use is to load in a syndication feed in RSS2 or Atom, read each of the entries in the feed, extract the links to the resources then load the resources into a local directory. Yet another use is as a way to process a message queue, with each message in turn being sent off to a web API for additional processing elsewhere. For instance, suppose that you had an XML feed that looked like the following:

                   Resource 1          resource1.xml                    Resource 2          resource2.xml                    Resource 3          resource3.xml     

    The following transformation takes the feed, retrieves the entry resources and stores them on the local file system:

                                                                                                                              

    The element within the "feed" match template illustrates the principle. The @href attribute points to a local file with the same name as the external resource. The @format attribute references a named element (also new in 2.0), making it possible for the result-document output to be formatted in a different manner than is used for output by the main XSLT. The content in this case is the document retrieved from each entry, though it could be more complex code (or xsl:apply-templates elements referencing other templates).It should be noted that the output of this result document is always an empty sequence with respect to the main document - the result of the document in this case is the element in the last line of the feed template.

    There are a few limitations that you have to be careful with for this particular element -- once you start generating output elements in the main thread, you can't include an element until the primary output stream terminates. This usually means that using XSLTs in this manner you often generate relevant code content that may need to be used in more places within variables, then pass the variables into both the subordinate output documents then into the primary output document.

    devx-admin

    devx-admin

    Share the Post:
    USA Companies

    Top Software Development Companies in USA

    Navigating the tech landscape to find the right partner is crucial yet challenging. This article offers a comparative glimpse into the top software development companies

    Software Development

    Top Software Development Companies

    Looking for the best in software development? Our list of Top Software Development Companies is your gateway to finding the right tech partner. Dive in

    India Web Development

    Top Web Development Companies in India

    In the digital race, the right web development partner is your winning edge. Dive into our curated list of top web development companies in India,

    USA Web Development

    Top Web Development Companies in USA

    Looking for the best web development companies in the USA? We’ve got you covered! Check out our top 10 picks to find the right partner

    Clean Energy Adoption

    Inside Michigan’s Clean Energy Revolution

    Democratic state legislators in Michigan continue to discuss and debate clean energy legislation in the hopes of establishing a comprehensive clean energy strategy for the

    Chips Act Revolution

    European Chips Act: What is it?

    In response to the intensifying worldwide technology competition, Europe has unveiled the long-awaited European Chips Act. This daring legislative proposal aims to fortify Europe’s semiconductor

    USA Companies

    Top Software Development Companies in USA

    Navigating the tech landscape to find the right partner is crucial yet challenging. This article offers a comparative glimpse into the top software development companies in the USA. Through a

    Software Development

    Top Software Development Companies

    Looking for the best in software development? Our list of Top Software Development Companies is your gateway to finding the right tech partner. Dive in and explore the leaders in

    India Web Development

    Top Web Development Companies in India

    In the digital race, the right web development partner is your winning edge. Dive into our curated list of top web development companies in India, and kickstart your journey to

    USA Web Development

    Top Web Development Companies in USA

    Looking for the best web development companies in the USA? We’ve got you covered! Check out our top 10 picks to find the right partner for your online project. Your

    Clean Energy Adoption

    Inside Michigan’s Clean Energy Revolution

    Democratic state legislators in Michigan continue to discuss and debate clean energy legislation in the hopes of establishing a comprehensive clean energy strategy for the state. A Senate committee meeting

    Chips Act Revolution

    European Chips Act: What is it?

    In response to the intensifying worldwide technology competition, Europe has unveiled the long-awaited European Chips Act. This daring legislative proposal aims to fortify Europe’s semiconductor supply chain and enhance its

    Revolutionized Low-Code

    You Should Use Low-Code Platforms for Apps

    As the demand for rapid software development increases, low-code platforms have emerged as a popular choice among developers for their ability to build applications with minimal coding. These platforms not

    Cybersecurity Strategy

    Five Powerful Strategies to Bolster Your Cybersecurity

    In today’s increasingly digital landscape, businesses of all sizes must prioritize cyber security measures to defend against potential dangers. Cyber security professionals suggest five simple technological strategies to help companies

    Global Layoffs

    Tech Layoffs Are Getting Worse Globally

    Since the start of 2023, the global technology sector has experienced a significant rise in layoffs, with over 236,000 workers being let go by 1,019 tech firms, as per data

    Huawei Electric Dazzle

    Huawei Dazzles with Electric Vehicles and Wireless Earbuds

    During a prominent unveiling event, Huawei, the Chinese telecommunications powerhouse, kept quiet about its enigmatic new 5G phone and alleged cutting-edge chip development. Instead, Huawei astounded the audience by presenting

    Cybersecurity Banking Revolution

    Digital Banking Needs Cybersecurity

    The banking, financial, and insurance (BFSI) sectors are pioneers in digital transformation, using web applications and application programming interfaces (APIs) to provide seamless services to customers around the world. Rising

    FinTech Leadership

    Terry Clune’s Fintech Empire

    Over the past 30 years, Terry Clune has built a remarkable business empire, with CluneTech at the helm. The CEO and Founder has successfully created eight fintech firms, attracting renowned

    The Role Of AI Within A Web Design Agency?

    In the digital age, the role of Artificial Intelligence (AI) in web design is rapidly evolving, transitioning from a futuristic concept to practical tools used in design, coding, content writing

    Generative AI Revolution

    Is Generative AI the Next Internet?

    The increasing demand for Generative AI models has led to a surge in its adoption across diverse sectors, with healthcare, automotive, and financial services being among the top beneficiaries. These

    Microsoft Laptop

    The New Surface Laptop Studio 2 Is Nuts

    The Surface Laptop Studio 2 is a dynamic and robust all-in-one laptop designed for creators and professionals alike. It features a 14.4″ touchscreen and a cutting-edge design that is over

    5G Innovations

    GPU-Accelerated 5G in Japan

    NTT DOCOMO, a global telecommunications giant, is set to break new ground in the industry as it prepares to launch a GPU-accelerated 5G network in Japan. This innovative approach will

    AI Ethics

    AI Journalism: Balancing Integrity and Innovation

    An op-ed, produced using Microsoft’s Bing Chat AI software, recently appeared in the St. Louis Post-Dispatch, discussing the potential concerns surrounding the employment of artificial intelligence (AI) in journalism. These

    Savings Extravaganza

    Big Deal Days Extravaganza

    The highly awaited Big Deal Days event for October 2023 is nearly here, scheduled for the 10th and 11th. Similar to the previous year, this autumn sale has already created

    Cisco Splunk Deal

    Cisco Splunk Deal Sparks Tech Acquisition Frenzy

    Cisco’s recent massive purchase of Splunk, an AI-powered cybersecurity firm, for $28 billion signals a potential boost in tech deals after a year of subdued mergers and acquisitions in the

    Iran Drone Expansion

    Iran’s Jet-Propelled Drone Reshapes Power Balance

    Iran has recently unveiled a jet-propelled variant of its Shahed series drone, marking a significant advancement in the nation’s drone technology. The new drone is poised to reshape the regional

    Solar Geoengineering

    Did the Overshoot Commission Shoot Down Geoengineering?

    The Overshoot Commission has recently released a comprehensive report that discusses the controversial topic of Solar Geoengineering, also known as Solar Radiation Modification (SRM). The Commission’s primary objective is to

    Remote Learning

    Revolutionizing Remote Learning for Success

    School districts are preparing to reveal a substantial technological upgrade designed to significantly improve remote learning experiences for both educators and students amid the ongoing pandemic. This major investment, which

    Revolutionary SABERS Transforming

    SABERS Batteries Transforming Industries

    Scientists John Connell and Yi Lin from NASA’s Solid-state Architecture Batteries for Enhanced Rechargeability and Safety (SABERS) project are working on experimental solid-state battery packs that could dramatically change the

    Build a Website

    How Much Does It Cost to Build a Website?

    Are you wondering how much it costs to build a website? The approximated cost is based on several factors, including which add-ons and platforms you choose. For example, a self-hosted