Extensible Stylesheet Language Transformations, or XSLT, allows you to transform XML documents into other XML documents using XSL. Using XSLT can be very trying, but it is, in fact, possible to do a great number of things with XSLT templates. However, to do so you have to understand that there is a fairly significant difference between recursive and non-recursive programming.
A programmer friend of mine once pointed out that any recursive problem, one where you call a routine using information generated by a previous iteration of that routine (and typically where the routine calls itself), can in fact be transformed into a non-recursive version of the same routine. I replied to that with the observation that this was like taking a novel, removing the binding, and then spreading the sheets end to end before reading them. You could do it, certainly, but you lose most of the benefits of having it in book form in the first place (at which point he threw the book at me).
XSLT is an uncomfortable language for a lot of people precisely because it almost forces you to think in a recursive manner. However, as an example, you can create a very simple XSLT template that effectively duplicates an entire document with surprisingly little code:
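A minimal sketch of such an identity stylesheet, assuming XSLT 1.0, might look like the following. It handles exactly the node types discussed here (elements, attributes, and text nodes); comments and processing instructions are not matched and so are dropped.

```xml
<?xml version="1.0"?>
<!-- identity.xsl: copies elements, attributes, and text unchanged -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Match any element, attribute, or text node -->
  <xsl:template match="*|@*|text()">
    <xsl:copy>
      <!-- Recurse: apply templates to this node's attributes and children -->
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```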
This routine is recursive: it essentially walks the tree of the XML document by moving to a node, copying that node, then applying the same template to any child element, attribute, or text node. Of course, a stylesheet that returns the same XML as was passed in would be pretty useless on its own, except that you can combine it with additional templates that catch the exceptions you don’t want to pass through unchanged.
For example, suppose that you had an XML document that contained a list of employees in your company, as shown in Listing 1. You can use a recursive approach to filter the XML so that only those employees who started in 1998 are listed. To do that, you could adapt the identity.xsl transform given above by creating a filter that will look at the date_started element of a given employee and compare the first four characters against ‘1998’. If this condition isn’t true, the node is simply not passed.
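One way such a filter could be sketched is shown below. The element names employee and date_started come from the description above; everything else about the document structure is an assumption, as is a date format that begins with a four-digit year.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies everything that is not caught below -->
  <xsl:template match="*|@*|text()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Filter: an employee whose date_started does not begin with 1998
       matches this empty template, so nothing is written for it -->
  <xsl:template match="employee[not(starts-with(date_started, '1998'))]"/>

</xsl:stylesheet>
```

The empty template wins over the identity template for those employees because a pattern with a predicate has a higher default priority than the wildcard pattern.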
Think of the second template as being a short circuit that filters out information. In a similar manner, you can create another stylesheet that builds on this one to provide a cost-of-living increase of 10% of salary for those employees who started in 1998.
While this would be a fairly simple script to write in procedural form, the approach would be different. There, the likely avenue of exploration would be to iterate through all the employees who match the date criteria, changing the salary when such a match is made, then copying the resulting node to an output node set. Ironically, to do this in XSLT requires a little more forethought, though not a lot:
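A sketch of how that might look in XSLT 1.0 follows. It assumes that each employee has a salary child holding a plain number (no currency symbol or thousands separators); the two-decimal output format is also an assumption.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template for everything not handled below -->
  <xsl:template match="*|@*|text()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>

  <!-- First employee template: matches every employee, writes nothing -->
  <xsl:template match="employee"/>

  <!-- Second employee template: matches only 1998 starters and
       copies them, processing their attributes and children -->
  <xsl:template match="employee[starts-with(date_started, '1998')]">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Only salaries of copied employees are ever processed;
       multiply by 110 and divide by 100 to apply the 10% increase -->
  <xsl:template match="salary">
    <xsl:copy>
      <xsl:value-of select="format-number(. * 110 div 100, '0.00')"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```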
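When more than one template matches a node, the XSLT processor applies only one of them: the one with the highest import precedence and, within that, the highest priority. A pattern with a predicate has a higher default priority (0.5) than a bare element name (0). If two matching templates are still tied, processors may recover by choosing the one that appears last in the stylesheet, which is why templates can seem to be evaluated in reverse order from the way they are presented. Thus, both 1998 employees and other employees will match the first template, but only 1998 employees will match the second, and once a template is chosen no other templates are applied to that node. Now, it is likely that when you get employees by year, you will need to be able to work with more than just 1998. You can generalize the stylesheet by using a parameter, which is easier to set from outside, whatever XSLT processor you’re using, than a date hard-coded inside a starts-with() function. This is shown in getEmployeesByDate.xsl: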
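A parameterized version could be sketched as follows. The parameter name year and its default are assumptions. Because XSLT 1.0 does not allow variable references inside a match pattern, this sketch moves the test into an xsl:if within the template body.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Stylesheet parameter (hypothetical name); the caller can override it -->
  <xsl:param name="year" select="'1998'"/>

  <!-- Identity template for everything not handled below -->
  <xsl:template match="*|@*|text()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy an employee only if date_started begins with the chosen year -->
  <xsl:template match="employee">
    <xsl:if test="starts-with(date_started, $year)">
      <xsl:copy>
        <xsl:apply-templates select="*|@*|text()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

How the parameter is supplied depends on the processor: most accept it through a command-line option or an API call.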
This actually turns this script into a fairly high-powered function: it will return an XML document that gives only employees for a specific year. Additionally, for these employees, you could add elements to, or remove elements from, the employee record.
Again, as an example, you may want to add a tag called
In this case, the current contents of each employee’s record is duplicated, then the review tag is added, which contains the conditional test described earlier. The
In this case, the context (the node that is currently being acted on) changes from that of the current template to each of the nodes chosen in the select attribute in turn, for as long as that ‘for-each’ is in effect. This is one way of “flattening the hierarchy” and making the XSLT code less recursive. Thus, the code that handles the review of employees could be changed into a for-each paradigm:
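A for-each version might be sketched along these lines. The root element name employees, the parameter name year, and the placeholder content of the review element are all assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="year" select="'1998'"/>

  <xsl:template match="/">
    <employees>
      <!-- Inside the loop, the context node is each selected employee -->
      <xsl:for-each select="//employee[starts-with(date_started, $year)]">
        <xsl:copy>
          <!-- Duplicate the existing attributes and child elements -->
          <xsl:copy-of select="@*|*"/>
          <!-- Placeholder review content (hypothetical) -->
          <review>pending</review>
        </xsl:copy>
      </xsl:for-each>
    </employees>
  </xsl:template>

</xsl:stylesheet>
```

Note that everything happens inside one template here, so a special case such as a salary change has to be written into the loop body rather than added as a separate template.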
This approach can make code that is more familiar to procedural programmers, but it often comes at the expense of flexibility in the code. To update the salary, for instance, you would actually need to set up an exception in the statement
‘For-each’ can also be a little misleading. Because it’s similar to the ‘for’ statement found in most procedural languages, ‘for-each’ can make people think that you can use it to create an indexed iterator, that is, a value that increases (or decreases) by a specific amount after each iteration. A classic example of this is the FOR statement in Basic:
for i=0 to 255 step 1
  print i,chr(i)
next i
Unfortunately, such a construct doesn’t exist in XSLT. Part of this has to do with the fact that XSLT is a declarative language: once a variable has been bound to a value, that value cannot be changed, and the same variable cannot be declared twice in the same scope. Another reason is that the classical template expects a match based upon an XPath context, which doesn’t work well in an indexed iterator model.
Suppose, however, that you could invoke a template by a specific name, rather than by a match condition. This actually has implications outside of just creating an iterator: there are any number of situations where the ability to create output based upon something other than context matching crops up.
For example, consider tables. A general routine to generate a table from a generic XML structure is a very common requirement, yet one of the problems that accompanies the creation of such tables is building column headers for the tables. You need to know the names of the columns ahead of time (making it difficult to generalize the routines). Or you need to content yourself with using XML element names for labels, which often means having odd cases and underscore characters in your output, a condition that is generally not acceptable either. However, a named template could simplify your code: by passing the element name as a parameter to such a template, the routine could convert the element name into a suitably formatted string.
Named templates are in fact a part of the XSLT specification, though sometimes they are not used as often as they should be. In a named template, the match attribute in a template is replaced with a name attribute instead, which contains the name of the template to be called. Named templates are parameterized: you can define one or more parameters for a named template in a similar manner to the way that you define parameters for a stylesheet.
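As an illustration of the indexed iterator mentioned earlier, a named template can call itself with an incremented parameter. The sketch below is an assumption about how that might be written; the template name count_to and the parameter names i and last are hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Entry point: count from 0 to 5 -->
  <xsl:template match="/">
    <xsl:call-template name="count_to">
      <xsl:with-param name="i" select="0"/>
      <xsl:with-param name="last" select="5"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Named template that emulates a FOR loop through recursion -->
  <xsl:template name="count_to">
    <xsl:param name="i" select="0"/>
    <xsl:param name="last" select="0"/>
    <xsl:if test="$i &lt;= $last">
      <!-- Loop body: write the current index on its own line -->
      <xsl:value-of select="$i"/>
      <xsl:text>&#10;</xsl:text>
      <!-- The next iteration is a new call with i + 1 -->
      <xsl:call-template name="count_to">
        <xsl:with-param name="i" select="$i + 1"/>
        <xsl:with-param name="last" select="$last"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

No variable is ever reassigned: each call receives its own binding of i, which is how the declarative restriction is respected.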
A fairly simple case converter might make all lower case characters upper case and replace underscore characters with spaces. This template, “to_upper_case,” accepts one parameter, the expression to be converted, and returns the string with all characters converted to upper case:
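A sketch of such a template, assuming the parameter is named expr and that only the unaccented Latin letters need converting:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Converts lower case letters to upper case and underscores to spaces -->
  <xsl:template name="to_upper_case">
    <!-- expr: the string to convert (parameter name is an assumption) -->
    <xsl:param name="expr"/>
    <xsl:value-of select="translate($expr,
        'abcdefghijklmnopqrstuvwxyz_',
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ ')"/>
  </xsl:template>

</xsl:stylesheet>
```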
The translate() function here looks at each character in the passed expression, and if it finds the character in the second string (in this case, the lower case characters), it replaces it with the character at the same position in the third string (in this case, the upper case characters). Similarly, an underscore character is converted into a space in the above expression.
This named template is invoked using the xsl:call-template element, with the expression to convert passed in an xsl:with-param child.
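A call to the template could look like the sketch below, which repeats the named template so that the stylesheet stands alone. The parameter name expr and the literal input string are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Invoke the named template with a literal string;
       for this input the result is DATE STARTED -->
  <xsl:template match="/">
    <xsl:call-template name="to_upper_case">
      <xsl:with-param name="expr" select="'date_started'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- The named template being called -->
  <xsl:template name="to_upper_case">
    <xsl:param name="expr"/>
    <xsl:value-of select="translate($expr,
        'abcdefghijklmnopqrstuvwxyz_',
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ ')"/>
  </xsl:template>

</xsl:stylesheet>
```

To build a column header from an element, the select expression could be name() instead of a literal string.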
In this case the conversion is fairly simple, but you can also create more generalized routines that are considerably more complex. The change_case named template, as an example, can be used to convert any string entity into an uppercased entity (“AbcDef” becomes “ABCDEF”), a lowercased entity (“AbcDef” becomes “abcdef”), or a mixed case entity. This last format relies on the use of a delimiter, such as an underscore character, to indicate which characters are converted into upper case. Thus, for the expression “line_item_number” (which may be the name of a given element in an invoice XML document), the mixed case result is “Line Item Number”. The change_case named template is shown in Listing 2.
Such a template can seem a little overwhelming, but it is also quite powerful. Instead of one parameter, change_case actually has five parameters: expr, to_case, replace_term, delimiter, and result. The parameter to_case can take any one of “upper”, “lower”, “mixed”, or “none” (the last of which just returns the string as originally presented). The character or characters in replace_term (defaulting to the underscore character) are replaced by the contents of delimiter (which defaults to the space character).
The final parameter, “result,” illustrates the recursive nature of XSLT again. XSLT doesn’t have an intrinsic “replace” function, but it does have the means to take the part of a string before or after a given delimiter, through the substring-before() and substring-after() functions. The change_case template uses this to walk from replace term to replace term in a string, accumulating the output of each operation. If the routine hasn’t reached the end of the string, the change_case template calls itself, passing the reduced string as the new expression parameter and the processed result as the result parameter. Thus, “result” is actually used to maintain state within the change_case template.
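The accumulating recursion can be sketched in isolation. The template below is not the change_case listing: it is a simplified, hypothetical template named replace_all that leaves out the case conversion and keeps only the replacement walk, reusing the parameter names expr, replace_term, delimiter, and result described above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Example call: line_item_number becomes "line item number" -->
  <xsl:template match="/">
    <xsl:call-template name="replace_all">
      <xsl:with-param name="expr" select="'line_item_number'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="replace_all">
    <xsl:param name="expr"/>
    <xsl:param name="replace_term" select="'_'"/>
    <xsl:param name="delimiter" select="' '"/>
    <!-- result carries the text processed so far between calls -->
    <xsl:param name="result" select="''"/>
    <xsl:choose>
      <!-- Guard against an empty term, which would recurse forever -->
      <xsl:when test="$replace_term != '' and contains($expr, $replace_term)">
        <xsl:call-template name="replace_all">
          <!-- Reduced string: everything after the first occurrence -->
          <xsl:with-param name="expr"
              select="substring-after($expr, $replace_term)"/>
          <xsl:with-param name="replace_term" select="$replace_term"/>
          <xsl:with-param name="delimiter" select="$delimiter"/>
          <!-- Accumulate: text before the occurrence, then the delimiter -->
          <xsl:with-param name="result"
              select="concat($result,
                             substring-before($expr, $replace_term),
                             $delimiter)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No more occurrences: emit the accumulated text plus the rest -->
        <xsl:value-of select="concat($result, $expr)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```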
Towards an XSLT Framework
There is no reason why either the replace term or the delimiter needs to be a single character, by the way. You could use exactly the same routine to replace a string in a text node with another string. Recognizing this, the same stylesheet defines a second function called “replace” that, in turn, calls change_case. The primary purpose for doing this is to create an alias that makes more sense for the replace operation. Thus,
will replace “You should use DOM to handle all of your XML programming needs” with “You should use XSLT to handle all of your XML programming needs”.
Note that the stylesheet contains no matches for the root node. These named templates are designed to be called from other templates, so they would more likely be imported and used indirectly. For example, the stylesheet in Listing 3 will create a table showing the employee columns, using the change_case routine as an imported template.
Named templates can prove addictive, by the way. For example, the table generation routines are fairly generic, and could themselves be adapted to create generic tables with mixed case headers. The create_table.xsl file, shown in Listing 4, contains a couple of templates that will turn any collection of records (such as the employees in the sample) into a table.
The create_table stylesheet takes a node set of records, creates the header for it, then iterates through each record matched to the “*” XPath expression. Note the use of the mode attribute here. Templates in an importing stylesheet take precedence over imported ones, and a generic “*” match could easily collide with the caller’s own templates, so there has to be a way to make sure that the “*” template is matched only when it is invoked by the table routines. The mode attribute lets you do just that: when you place a mode attribute with a given label on an apply-templates element, only those templates that have that same mode label will be matched.
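The role of the mode attribute can be sketched as follows. This is not the create_table listing: the parameter name records, its default, and the use of raw element names as headers are assumptions, and the mode label record_match is borrowed from the recordTable.xsl discussion.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:call-template name="create_table"/>
  </xsl:template>

  <!-- Named template: builds a table from a node set of records -->
  <xsl:template name="create_table">
    <!-- By default, treat the children of the root element as records -->
    <xsl:param name="records" select="/*/*"/>
    <table>
      <tr>
        <!-- Header row: one cell per child of the first record -->
        <xsl:for-each select="$records[1]/*">
          <th><xsl:value-of select="name()"/></th>
        </xsl:for-each>
      </tr>
      <!-- Only templates labeled with this mode are candidates here -->
      <xsl:apply-templates select="$records" mode="record_match"/>
    </table>
  </xsl:template>

  <!-- Matches any element, but only when applied in record_match mode -->
  <xsl:template match="*" mode="record_match">
    <tr>
      <xsl:for-each select="*">
        <td><xsl:value-of select="."/></td>
      </xsl:for-each>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```

Because the wildcard template carries a mode, an apply-templates element without that mode will never select it, whatever other templates the calling stylesheet defines.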
This is a powerful concept, because it means that you can abstract many of the operations that you normally do in XSLT into named templates, then import just the templates that you actually need. For example, the following XSLT will now convert the employees XML structure into a table with mixed case headers:
See Table 1.
One of the problems with such generic filters is customizing content. While you can go back to specifying all of the elements, you can also take a middle ground by creating a way to extend your model. In the recordTable.xsl file, for instance, the record_match mode template invokes a second mode, record_text_match, which defaults to just outputting the text of the current element. Since templates in the invoking stylesheet compete in the same modes as the imported ones, and take precedence over them, you can create a template that specifically matches the exception you’re trying to catch (here, it’s salary) and handle it as a special case.
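The extension point can be sketched in a single stylesheet. The two generic templates below stand in for what would normally live in the imported file, and the currency format applied to salary is an assumption; only the mode names record_match and record_text_match come from the description above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="/*/*" mode="record_match"/>
    </table>
  </xsl:template>

  <!-- Generic: one row per record, one cell per child element -->
  <xsl:template match="*" mode="record_match">
    <tr>
      <xsl:for-each select="*">
        <!-- Hand each cell to the second mode so it can be overridden -->
        <td><xsl:apply-templates select="." mode="record_text_match"/></td>
      </xsl:for-each>
    </tr>
  </xsl:template>

  <!-- Generic default: just output the element's text -->
  <xsl:template match="*" mode="record_text_match">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Special case: salary is formatted as currency (assumed format) -->
  <xsl:template match="salary" mode="record_text_match">
    <xsl:value-of select="format-number(., '$#,##0.00')"/>
  </xsl:template>

</xsl:stylesheet>
```

The salary template is chosen over the wildcard because a named element has a higher default priority than “*”. With the generic templates moved to an imported file, import precedence would give the same result.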
The upshot of all of this is that you can create much simpler XSLT templates that can still be customized for your needs. As one of the common complaints about XSLT is its complexity, this means that your XSLT programmer can concentrate on designing specialized components, making the process of building XML-oriented sites just that much easier.