“Document Your Code! Document Your Code!!”
This is one of those lessons hammered into most Computer Science majors about three days into their first code class, and one quickly forgotten by most within the first eight days. The benefits of documenting code are obvious, from being able to remember what you worked on several months before to making sure that someone who inherits your code can have a basic clue about what you were trying to do.
Problem Summary: Documentation is difficult to write, and even more difficult to maintain. Wouldn’t it be convenient to have self-documenting code to which you could add custom documentation, and have all of that display in a readable manner, on demand?
Solution Summary: In XSLT, you can take advantage of extensibility, a known document structure, and transforms to produce self-documenting XSLT templates.
Documentation, however, can be time-consuming to write, especially because there are really two types of documentation. User documentation describes the methods and parameters of an object, or how a given class works. Diagnostic documentation, on the other hand, explains why a given routine or variable is used or why something is commented out; it serves as working notes for a piece of code, something that would be useful for debugging an application if it doesn’t work properly (or upgrading it if it does) but which may not be appropriate for the final consumer of the product. User documentation for a VCR would tell you which buttons to press to make it work (and how to prevent the blasted clock from flashing “12:00” all the time), while diagnostic documentation would tell a technician how to replace a malfunctioning module.
Because the difference between user and diagnostic documentation isn’t always immediately obvious, developing documentation (whether using XSLT or any other programming language) often tends to be a somewhat haphazard process. XSLT has the advantage of a known document structure that makes it easy to both determine functionality and retrieve information without necessarily needing to write any explicit documentation. By augmenting this with a distinct documentation “namespace”, you can actually create fairly extensive documentation for all your XSLT routines.
The JavaDoc program included with most Java implementations is a model. JavaDoc examines the overall class structure and retrieves primary class, method, property and event names exposed by the class, as well as other relevant structural information, such as parameter names and data types. JavaDoc then maps the information into an HTML file (something which should be ringing bells in the head of any XSLT developer), and indicates where the class fits in the overall framework’s architecture. However, by following a specific format for comments in the source code, it is possible to create very rich sets of documentation that go beyond describing the core structure, and get into detailed explanations about how to use the class in practice.
XSLT Documentation Technique
A similar technique can easily be applied to XSLT documents, first by analyzing the document’s general structure and then by using a doc: namespace extension to annotate the transformation. Interestingly enough, the process to generate this document is itself an XSLT transformation, an example of a stylesheet being applied to another stylesheet.
In looking at such stylesheets, it’s worth examining the basic structures that would be useful in documentation. Table 1 lists the useful stylesheet pattern structures.
These elements make up the bulk of the “skeleton” upon which an XSLT document is based, and as such serve as the analogs to methods and properties that you expect to see in more traditional programming languages. Note that this set of tags differentiates between matched and named templates, that is, those that have a match attribute and those that have a name attribute. This distinction is important, especially in situations where you’re dealing with imported or included files, since such files together often can form a rudimentary framework of XSLT objects.
However, by themselves, these elements provide only a rudimentary form of documentation, much as JavaDoc working on a generic class can show the interrelationships but cannot describe the purpose of functions. Documentation in this regard is meta-information: it describes the code. Unfortunately, machines are notoriously bad at determining intent; they can tell you what things go where, but the question “why?” is considerably harder to code.
Thus the second aspect of documenting XSLT is to provide some kind of additional annotation to those elements that you want to be publicly exposed. In this article, I recommend creating a distinct namespace (here I’m using the prefix doc:, but of course that can be changed if it collides with a prefix you already use). The namespace then contains elements and attributes that can both describe the “why” of a given XSLT document and determine what specifically should be exposed. Table 2 lists the various doc: elements and attributes that I’ve used in this specific implementation of a documentation class.
As a simple example, the following XML code provides a document summary node that could be picked up by the documentation.xsl stylesheet:
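<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxml="urn:schemas-microsoft-com:xslt" xmlns:doc="urn:schemas-cagle-com:document" exclude-result-prefixes="msxml doc" version="1.0">
  <!-- doc:summary and doc:description are named in the text; the other child element names are assumed -->
  <doc:summary>
    <doc:title>Text To Node Conversion</doc:title>
    <doc:filename>textToNodes.xsl</doc:filename>
    <doc:version>1.0</doc:version>
    <doc:created>2000-12-01</doc:created>
    <doc:modified>2001-01-02</doc:modified>
    <doc:description>The Text To Node Conversion stylesheet converts a delimited text file, passed as a string in the parse_text parameter, into a series of XML records.</doc:description>
  </doc:summary>
</xsl:stylesheet>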
The stylesheet first needs to define the doc namespace by including the xmlns:doc attribute on the xsl:stylesheet element. To ensure that the doc namespace (along with the msxml namespace, used for some support functions) isn’t also sent along to the output stream, add the exclude-result-prefixes attribute, which takes a whitespace-separated list of all of the prefixes that you don’t want to have end up in the final output.
The doc:summary block serves as the nexus for providing documentation information about the stylesheet. Strictly speaking, the doc:summary block is not necessary (the documentation.xsl stylesheet can provide a fairly exhaustive listing without it), but it serves as a place to provide detailed user information, such as a description of what the stylesheet does, the filename of the stylesheet (which can’t be retrieved from within XSLT without using some external functions, so it should generally be included internally), a title that clearly identifies the stylesheet, date information, and finally, a version number.
The version number is actually quite useful. One common problem that I encounter when working with XSLT is that I may have multiple versions of the same stylesheet. I can’t necessarily rely on date alone to ensure that I’m dealing with the most recent version. By incorporating a version stamp for the stylesheet (as opposed to the version of XSLT, which is what the version attribute of the xsl:stylesheet element is for) you can track revisions and provide a consistent identifier that more complex document management systems can use to keep your code up to date.
As an aside, the doc:description element can contain either XHTML text or CDATA sections containing HTML. The doc:summary/doc:description element is the primary description for the document, and is intended to give the reader a clear picture of what the stylesheet does. When the description is absent, it is still possible to get syntactical information, but semantic meaning will be more difficult to ascertain. As such, you should use this element extensively as the primary user documentation in your code.
The doc:description element can also appear as an attribute to templates, parameters, variables and attribute sets. These work in conjunction with the doc:public attribute, which defines whether or not a given element should be visible to the documentation. In general, doc:public defaults to “yes” for named templates (templates with the name attribute), global parameters, and the parameters of named templates, and defaults to “no” for matched templates (templates with the match attribute), global variables, and internal variables. Additionally, even if a local parameter is made public, if its containing template is private (i.e., doc:public=”no”) then it won’t be displayed.
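The defaulting rules above can be expressed as XPath predicates in the documentation stylesheet. The following is only a sketch of how that selection might look, not the actual documentation.xsl code; it assumes the doc: namespace URI used earlier and lists only templates, leaving parameters and variables aside.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="urn:schemas-cagle-com:document"
    exclude-result-prefixes="doc">
  <xsl:output method="html"/>

  <!-- The input document is itself a stylesheet, read as ordinary XML. -->
  <xsl:template match="/xsl:stylesheet | /xsl:transform">
    <ul>
      <!-- Named templates are public unless marked doc:public="no". -->
      <xsl:for-each select="xsl:template[@name][not(@doc:public = 'no')]">
        <li>
          <xsl:value-of select="@name"/>
          <xsl:if test="@doc:description">
            <xsl:text>: </xsl:text>
            <xsl:value-of select="@doc:description"/>
          </xsl:if>
        </li>
      </xsl:for-each>
      <!-- Matched templates are private unless marked doc:public="yes". -->
      <xsl:for-each select="xsl:template[@match][not(@name)][@doc:public = 'yes']">
        <li><xsl:value-of select="@match"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

Because a private template is never selected, its parameters are never visited either, which gives the "private container hides public parameter" behavior for free.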
For example, the following named template, convert_text_to_xml, is explicitly made public, which makes its parameters public implicitly. Note that the variable lines.tf is implicitly private:
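A minimal sketch of what such an annotated template might look like is given here. The template name, the parse_text parameter and the lines.tf variable come from the text; the description strings and the template body are placeholders invented for illustration, not the original conversion logic.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="urn:schemas-cagle-com:document"
    exclude-result-prefixes="doc">

  <!-- Explicitly public; its parameters become public implicitly. -->
  <xsl:template name="convert_text_to_xml"
      doc:public="yes"
      doc:description="Converts delimited text into a series of XML records.">
    <xsl:param name="parse_text"
        doc:description="The delimited text, passed in as a string."/>
    <!-- Internal variable: no doc:public attribute, so it stays private. -->
    <xsl:variable name="lines.tf" select="normalize-space($parse_text)"/>
    <!-- Placeholder body (assumption): emits the text as a single record. -->
    <records>
      <record><xsl:value-of select="$lines.tf"/></record>
    </records>
  </xsl:template>
</xsl:stylesheet>
```

XSLT 1.0 permits attributes from a non-XSLT namespace on XSLT elements, so a conforming processor ignores the doc: attributes when the stylesheet is actually run.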
Taken together, the summary and the public descriptions can provide a detailed user interface. However, as I mentioned previously, the structure of the XSLT document itself can also provide a great deal of information about what it is supposed to do. One of the most useful of these nodes is the xsl:output element.
The documentation stylesheet queries the xsl:output element and reports its settings, along with two derived properties, for example:
- Indent: no (default)
- Version: 1.0 (default)
- XML Declaration Omitted: yes
- Stand Alone: Either
- Encoding: UTF-16
- Is Identity Template: no
- Is Stylesheet Executable: yes
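One way such a report could be produced is sketched here. This is an illustration under stated assumptions rather than the article's documentation.xsl: the helper template name setting and its parameters are hypothetical, only the first xsl:output element is inspected, and the fallback values shown are the XSLT 1.0 defaults for the xml output method.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Runs against another stylesheet, treated as ordinary XML input. -->
  <xsl:template match="/xsl:stylesheet | /xsl:transform">
    <xsl:variable name="out" select="xsl:output[1]"/>
    <ul>
      <xsl:call-template name="setting">
        <xsl:with-param name="label" select="'Indent'"/>
        <xsl:with-param name="value" select="$out/@indent"/>
        <xsl:with-param name="default" select="'no'"/>
      </xsl:call-template>
      <xsl:call-template name="setting">
        <xsl:with-param name="label" select="'Version'"/>
        <xsl:with-param name="value" select="$out/@version"/>
        <xsl:with-param name="default" select="'1.0'"/>
      </xsl:call-template>
      <xsl:call-template name="setting">
        <xsl:with-param name="label" select="'XML Declaration Omitted'"/>
        <xsl:with-param name="value" select="$out/@omit-xml-declaration"/>
        <xsl:with-param name="default" select="'no'"/>
      </xsl:call-template>
    </ul>
  </xsl:template>

  <!-- Hypothetical helper: prints the attribute value, or the default if it is missing. -->
  <xsl:template name="setting">
    <xsl:param name="label"/>
    <xsl:param name="value"/>
    <xsl:param name="default"/>
    <li>
      <xsl:value-of select="$label"/>
      <xsl:text>: </xsl:text>
      <xsl:choose>
        <xsl:when test="string($value)"><xsl:value-of select="$value"/></xsl:when>
        <xsl:otherwise><xsl:value-of select="$default"/> (default)</xsl:otherwise>
      </xsl:choose>
    </li>
  </xsl:template>
</xsl:stylesheet>
```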
The last two elements need to be explained in a little more detail. One template pattern that commonly crops up is the identity template. Such a template is also known as a treewalker, since it works by walking over each node and copying it to the output stream unless there is a specific template that overrides the default match. Identity templates are useful for expanding custom tags in XHTML elements and can change the behavior of the code considerably. The item Is Identity Template will be set to yes either if the pattern for an identity exists (e.g., “*|@*|text()” or “@*|node()”) or if a
Is Stylesheet Executable catches the other primary type of stylesheet. If a template matches the root node (i.e., match=”/”), then when the stylesheet is applied to an XML document it will convert the document. However, without that root match, there is nothing to “start” the process, and the stylesheet won’t work. In some cases, such as libraries of specialized templates, there is no real reason to include a root node template, because these stylesheets are meant to be called only through the
Finally, note that if the document contains more than one
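The two derived flags described above, Is Identity Template and Is Stylesheet Executable, could be computed along the following lines. This is a simplified sketch, not the article's implementation: it recognizes only the two identity patterns quoted in the text and only an exact root match, so a union pattern that merely includes the root would go undetected.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/xsl:stylesheet | /xsl:transform">
    <!-- Spaces are removed first, so "@* | node()" is caught as well. -->
    <xsl:variable name="identity"
        select="xsl:template[translate(@match, ' ', '') = '@*|node()'
                or translate(@match, ' ', '') = '*|@*|text()']"/>
    <!-- A template whose whole match pattern is the root node. -->
    <xsl:variable name="root"
        select="xsl:template[normalize-space(@match) = '/']"/>
    <ul>
      <li>
        <xsl:text>Is Identity Template: </xsl:text>
        <xsl:choose>
          <xsl:when test="$identity">yes</xsl:when>
          <xsl:otherwise>no</xsl:otherwise>
        </xsl:choose>
      </li>
      <li>
        <xsl:text>Is Stylesheet Executable: </xsl:text>
        <xsl:choose>
          <xsl:when test="$root">yes</xsl:when>
          <xsl:otherwise>no</xsl:otherwise>
        </xsl:choose>
      </li>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```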
Imports, Includes and Examples
Imports and includes pose a particular challenge for documentation, because an imported document could include another imported document, making for very deep structures. The key to handling this came when I realized that the display of a page could be handled by converting a matched root template into a named template, then passing the stylesheet as an argument to this named template. By doing this, you can use the XSLT document() function to read the stylesheet given in the href attribute of an xsl:import or xsl:include element.
One consequence of this is that you turn a linear process into a recursive one: a document will pass its own imports and includes into the same named template, and as a consequence the document that draws the initial page will also be used to draw any subordinate pages. Because the initial calling page may lack some filename context (i.e., unless explicitly given in a
If a document contains an imported or included file, initially only the name of the file will be displayed as a header element. If you click on this header, though, you can expand the documentation for the subordinate document (and can likewise expand any subsidiary imports in the first imported document).
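The recursive pattern described in this section can be sketched as follows. The template name draw_stylesheet, its parameter names and the placeholder title are assumptions made for illustration; the real documentation.xsl renders far more detail, and the click-to-expand behavior is omitted here.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- The root match only hands the documented stylesheet to the named template. -->
  <xsl:template match="/">
    <xsl:call-template name="draw_stylesheet">
      <xsl:with-param name="sheet" select="xsl:stylesheet | xsl:transform"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical named template: draws one stylesheet, then recurses. -->
  <xsl:template name="draw_stylesheet">
    <xsl:param name="sheet"/>
    <xsl:param name="filename" select="'(top-level stylesheet)'"/>
    <div>
      <h2><xsl:value-of select="$filename"/></h2>
      <p>
        <xsl:text>Templates: </xsl:text>
        <xsl:value-of select="count($sheet/xsl:template)"/>
      </p>
      <!-- Each import or include is loaded and passed to the same template. -->
      <xsl:for-each select="$sheet/xsl:import | $sheet/xsl:include">
        <xsl:call-template name="draw_stylesheet">
          <xsl:with-param name="sheet" select="document(@href)/*"/>
          <xsl:with-param name="filename" select="@href"/>
        </xsl:call-template>
      </xsl:for-each>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

Because document() is given the href attribute node itself, a relative URI is resolved against the stylesheet that contains the import, which is what keeps nested imports working at any depth.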
Sometimes explicit examples help clarify documentation. The
The Documentation XSLT
The actual code for the documentation.xsl file is fairly complex, consisting of about 10 distinct templates over about 300 lines of code. However, it takes no external parameters, so you can apply the documentation.xsl file to any XSLT document by including the xml-stylesheet processing instruction (this applies only to IE5 and above). Note that you must have the MSXML3 parser installed; this will not work with the older MSXML2.5 version.
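As a sketch, the processing instruction goes at the top of the stylesheet you want documented, before the root element. The href value here assumes that documentation.xsl sits in the same directory as that stylesheet; adjust the path otherwise.

```xml
<?xml-stylesheet type="text/xsl" href="documentation.xsl"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- The stylesheet's own templates follow as usual. -->
</xsl:stylesheet>
```

Opening the file in the browser then shows the generated documentation, while a processor that runs the file as a stylesheet simply ignores the instruction.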
Note that the documentation.xsl file can even be applied to itself, and the file contains a
Stripping Documentation Information
Documented XSLT stylesheets can get to be very large, and in a server environment this overhead can prove a barrier to scalability. Fortunately, the structure of the documentation namespace makes it very easy to remove from the stylesheet through another XSLT transformation. This one, stripDocumentation.xsl, is another identity transform: it drops any node in the doc: namespace, along with its content, and copies all other nodes to the output stream.
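A minimal sketch of such a stripper is shown here. It is not the article's stripDocumentation.xsl, only the standard identity-plus-override pattern applied to the doc: namespace URI used earlier.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="urn:schemas-cagle-com:document">

  <!-- Identity rule: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty rule: doc: elements (with their content) and doc: attributes are dropped. -->
  <xsl:template match="doc:*|@doc:*"/>
</xsl:stylesheet>
```

The empty rule wins because its patterns have a higher default priority than the identity rule. The xmlns:doc declaration itself is copied through to the output, which is harmless and keeps an existing exclude-result-prefixes attribute that names doc valid.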
The result of the documentation stripper stylesheet should not be saved over the original documented stylesheet, unless you want to lose your documentation. Instead, you should think about keeping two versions of your stylesheets, a documented version that you update with the appropriate code and the stripped undocumented version that you deploy in your production version (and that’s created from the documented version).
Part of the process of creating a cohesive framework for XSLT development lies in providing some level of documentation for the transformation stylesheets with which you’re dealing. The techniques and stylesheets given here provide some basic examples of how XSLT can be both documented and deployed in a production environment, although certainly there is a lot more that can be done. The current documentation namespace is largely an ad hoc one. The next logical course would be to model a documentation namespace using the Resource Description Framework (RDF) schema, making it compliant with other XML specifications. Moreover, the documentation should be considered as part of a larger documentation management system that does such things as track revisions and versioning, manage imports and includes, and work in conjunction with XML data sources. Still, even without spending the time to create the full user documentation, the documentation.xsl stylesheet shows how you can easily query an XSLT document for all types of useful information.