
XmlTransform—A General-Purpose XSLT Pre-Processor

The general-purpose XML transformer and validator discussed here, named “XmlTransform,” operates on an arbitrarily deep directory tree containing the files you want to transform. As output, it can optionally generate multi-level indices and can even add navigational links.

XmlTransform’s validation capability is reasonably straightforward; it lets you ensure that the set of XML files used for a transformation is valid according to a specified XML schema. You may elect to validate input files, output files (after transformation), or both.

The program’s transformation capability is more interesting. One common application of a transformation engine is as a pre-processor, a very handy thing indeed when designing web pages.

Pre-Processors and HTML
Modern, well-formed HTML is a flavor of XML called “XHTML,” which has subtle but important differences from plain HTML. XmlTransform can transform XML files to web pages quite effectively. In fact, it can generate any arbitrary XML output that you tell it to, but because the process is easier to visualize with a concrete file type such as HTML that you probably already know, the discussion here focuses on HTML generation.

So what is a pre-processor and why is it useful? Maintenance is the primary resource consumer on any software project, often requiring more resources than the original design and implementation. A pre-processor can help reduce maintenance costs. For example, assume that you maintain a corporate web site that displays a logo of a certain size on every one of its 509 plain-vanilla HTML pages. If the corporation suddenly decides to get a new look (new letterhead, new typefaces, and oh, yes, a new, differently sized logo), you would need to edit each of the 500+ pages to display the new logo image. While the amount of work involved in such a change might vary, depending on how the pages are set up, the point is that changes do occur that require updating elements that appear on many pages.

That’s where pre-processors come in. Just as a style sheet lets you define something once and reuse it over and over, a pre-processor does the same thing for anything in your web pages, not just styles. You could, for example, create your own XML element, and then generate the pages, replacing the element marker with this XHTML code (including the comments):
      
XYZ logo
XYZ Corporation -- © 2008 All rights reserved

The replacement process is the key. To render an instance of such an element in your source file as the above expanded code fragment in your target XHTML file, you must transform the source, specifying precisely how to render it in XSLT. Here’s the XSLT code that would output the HTML shown above:
   

Though you could get some mileage out of the XSLT file included with the downloadable sample project, familiarity with XSLT is crucial to use XmlTransform effectively. The bulk of your efforts revolve around creating the XSLT code that drives the transformations. Designing clean, useful XSLT does take some effort, but it can be quite useful in the long run.

Setting up XmlTransform involves defining a custom XML dialect for your source files that encapsulates all the common elements you use, and then creating an XSLT mapping file that specifies how your particular dialect maps to XHTML (or whatever target XML dialect you wish). The initial setup takes some time, but after you’ve set it up, changing a logo (or a copyright notice, or any other common element throughout your site) is a simple matter of making a single change in the XSLT file, and then regenerating your output files.
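As a rough illustration of that mapping idea, the sketch below uses the XSLT engine built into the JDK rather than XmlTransform itself. The `logo` element name and the `logo.png` file name are hypothetical stand-ins (the article's own element name is not reproduced here); only the "XYZ logo" text comes from the example above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LogoTransformSketch {
    // Hypothetical mapping file: copy everything unchanged, but expand
    // the custom <logo/> element into an XHTML image tag.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='logo'>"
      + "<img src='logo.png' alt='XYZ logo'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the mapping above to one source document and returns the result. */
    static String transform(String sourceXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<page><logo/><p>Welcome</p></page>"));
    }
}
```

Changing the logo then means editing the one `match='logo'` template and regenerating, which is the maintenance benefit described above.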

 
Figure 1. The XmlTransform Engine At Work: The primary XSLT operation is shown in the center, but you could also invoke schema validation on source or destination trees; the various engine options give you substantial control over how your tree is manipulated.

Figure 1 illustrates how XmlTransform operates. The diagram packs a lot of information, but considering it piece by piece will make it more digestible.

The goal is to map a source tree (upper left) to a target tree (lower right). Look for the three yellow boxes: two labeled “Schema validator” and one labeled “XSLT engine.” Those are the active components, and Boolean command-line switch options (denoted by the valve switches in the figure) turn the various paths on or off. Specifically, these active component options are: xslTransform, validateInputToSchema, and validateOutputToSchema. Figure 1 shows all options with a dotted outline. Some of the other options shown, such as inExtension and outExtension, let you specify different file extensions as your source and target file extensions, respectively.

For example, to transform XML to HTML, you’d use .xml as the inExtension and .html as the outExtension. Finally, you need to provide a source tree location (sourcePath), a list of directories that should be transformed (dirList), and the location for the output files (targetPath). In Figure 1, the items included in dirList are indicated with red rectangles in the source tree. The non-selected directories are shown crossed out in the target tree, since those were not included. There are a number of other parameters that you need to consider, though most default to some reasonable value.
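To make the relationship between sourcePath, targetPath, and the two extensions concrete, here is a minimal sketch of how one source file could be mapped to its target location. The class and method names are invented for illustration, and this is not XmlTransform's actual implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class TargetPathSketch {
    // Accept extensions written either as "xml" or ".xml".
    static String stripDot(String extension) {
        return extension.startsWith(".") ? extension.substring(1) : extension;
    }

    /** Maps a file under sourcePath to the same relative spot under targetPath, swapping extensions. */
    static Path toTarget(Path sourcePath, Path targetPath, Path sourceFile,
                         String inExtension, String outExtension) {
        Path relative = sourcePath.relativize(sourceFile);
        String name = relative.getFileName().toString();
        String suffix = "." + stripDot(inExtension);
        if (!name.endsWith(suffix)) {
            throw new IllegalArgumentException(name + " does not end with " + suffix);
        }
        String base = name.substring(0, name.length() - suffix.length());
        // Keep the directory structure, replace only the file name.
        return targetPath.resolve(relative).resolveSibling(base + "." + stripDot(outExtension));
    }

    public static void main(String[] args) {
        Path target = toTarget(Paths.get("src"), Paths.get("out"),
                               Paths.get("src", "webRules", "browser.xml"), ".xml", ".html");
        System.out.println(target);
    }
}
```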

XmlTransform provides two other, powerful benefits to simplify maintenance: generating summary or contents pages and adding navigational connections. These are discussed in the next two sections.

No File Is an Island
Figure 2 illustrates XmlTransform’s capability to automatically populate a contents page for a web section using elements from each sibling or child web page. The figure shows how XmlTransform can extract a portion of the element and a portion of a specific element from each page in a relevant set of pages and combine those into a single contents page. That way, if you add or remove pages from the set, you need only rerun XmlTransform to update the generated contents page.

 
Figure 2. Automatic Contents Generation: XmlTransform can automatically generate contents pages (shown at the bottom) from selected details it extracts from source files.

If you look at Figure 1 you can see that XmlTransform not only generates files in your target tree but it also generates a few intermediate “contents” files in your source tree (highlighted in magenta in the figure). You create a template for a contents file, which must be named _index.xml and stored in the directory containing the files it will summarize, and XmlTransform populates that template with the summary information. For example, the _index.xml file shown below was used to generate the output in Figure 2:

                  CleanCode::Web Guidelines       $Id: _index.xml 20 2006-12-29 00:03:27Z dellxp $       $Revision: 20 $                        

Web Guidelines

Design Considerations for Web Sites


Unlike other files in each directory, this _index.xml template file must undergo an extra processing step before being transformed along with the rest of the files in your source tree. This extra step fleshes out the template with the files under its jurisdiction; you control the operation with the element in the template code above. XmlTransform fills this template out and writes it to an intermediate file named web info.xml in the source tree as shown in Figure 1, before performing the rest of the transformations. Here’s the relevant portion of this intermediate file (web info.xml); if you compare it to the template shown above, you’ll see how XmlTransform has filled out the element with information about each of the other files in the directory:

                  CleanCode::Web Guidelines       $Id: _index.xml 20 2006-12-29 00:03:27Z dellxp $       $Revision: 20 $                         

Web Guidelines

Design Considerations for Web Sites

webRules/accessibility.html /usr/doc/webRules/accessibility.xml webRules/antispam.html /usr/doc/webRules/antispam.xml webRules/browser.html /usr/doc/webRules/browser.xml webRules/cssConformance.html /usr/doc/webRules/cssConformance.xml . . .
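The template-filling step described above can be pictured as a small DOM operation: find the placeholder element and append one entry per file. The sketch below assumes a placeholder tag passed in by the caller (mirroring the groupPlaceHolder option) and a made-up `file` entry element with `source` and `target` attributes; the real intermediate-file vocabulary may differ.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ContentsFillSketch {
    /** Returns the template with one <file> entry per base name added inside the placeholder. */
    static String fill(String templateXml, String placeholderTag, String dir, List<String> baseNames) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(templateXml)));
            Node placeholder = doc.getElementsByTagName(placeholderTag).item(0);
            if (placeholder == null) {
                throw new IllegalArgumentException("template has no <" + placeholderTag + "> element");
            }
            for (String base : baseNames) {
                Element file = doc.createElement("file");               // hypothetical entry element
                file.setAttribute("target", dir + "/" + base + ".html"); // page the entry links to
                file.setAttribute("source", dir + "/" + base + ".xml");  // file the details come from
                placeholder.appendChild(file);
            }
            // Serialize the modified DOM back to text.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fill("<page><h1>Web Guidelines</h1><files/></page>", "files",
                                "webRules", Arrays.asList("accessibility", "antispam")));
    }
}
```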

The contents page template may include any other HTML that you wish. Only the marker element is significant to XmlTransform, letting it know where to insert the generated contents. The preceding example shows a very basic page that adds only two header elements. The intermediate file is now ready to be processed, along with all the referenced pages, by the XSLT transformations specified in your mapping file. Here’s the reference code from the supplied translate.xsl that turns the intermediate-file XML shown above into XHTML:
         

If you study the components of this short XSLT fragment and compare it to the output shown in Figure 2, you’ll see how it targets only the relevant information. (The trim function invoked in the preceding XSLT code, but not shown, returns the final segment of a colon-separated string; it has no particular significance other than being the convention I chose for my coding style.) For completeness, here is the relevant fragment of the final XHTML page rendered by the XSLT transformation:

      . . .   

Web Guidelines

Design Considerations for Web Sites

. . .

The example above generates an ungrouped contents file; in other words, it places the items into a single list on the contents page, but you may refine this by creating an arbitrary set of smaller, named groups. You could, for example, have an introductory paragraph, and then some generated contents entries, another introductory paragraph, some more contents entries, and so forth. Furthermore, you can specify a contents template for each directory, so you may tailor them as your content dictates.

Author’s Note: As a further example, this web page example shows a contents page with 19 files arranged in six sections.

The contents or summary page shown in this section provides a mechanism for drilling or navigating down one hierarchy level. The next section completes the picture by demonstrating how to navigate up as well as side to side, which also raises the interesting question of where to place the generated contents file.

Connecting the Dots
Assume, for example, that you have a series of web pages that have a natural order, perhaps a set of pages that make up a tutorial. Each web page has navigational buttons to advance to the next page, return to the previous page, go up to the contents page, and so forth. When you want to add a new page, remove a page, or re-order the material, using static pages, you would typically have to manually edit all the linkages, a time-consuming and error-prone process. Figure 3 shows a representation of four such related pages. Each page has buttons that navigate to the first and last pages, the previous and next pages, and the parent page. By using an appropriate XSLT transformation mapping, XmlTransform can generate such navigational connections automatically.

 
Figure 3. Connecting Neighboring Pages: XmlTransform generates a web info.html contents parent file, reachable via the middle (“go-up”) navigational button on each page. The other buttons provide side-to-side and end-to-end navigational control for the child pages.

If you look closely at Figure 3, you’ll notice that every page displays the go-to-first-page and go-to-last-page buttons, but the three buttons in the middle—previous, parent, and next—are conditional. For example, you don’t want the previous-page button to display on the first page or the next-page button to appear on the last page. Also, in some cases, two buttons may point to the same page. For example, on page two the go-to-first-page button and the previous-page button both point to page one.
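The button rules just described reduce to a few index comparisons. The sketch below is only a model of that logic with invented page names; in XmlTransform the equivalent decisions are made in XSLT. A `null` link stands for a button that should be shown as a spacer.

```java
import java.util.Arrays;
import java.util.List;

class NavLinksSketch {
    final String first, previous, parent, next, last;

    /** Computes the five navigation targets for the page at the given position. */
    NavLinksSketch(List<String> pages, int index, String parent) {
        this.first = pages.get(0);                       // always shown
        this.last = pages.get(pages.size() - 1);         // always shown
        this.previous = index > 0 ? pages.get(index - 1) : null;              // none on the first page
        this.next = index < pages.size() - 1 ? pages.get(index + 1) : null;   // none on the last page
        this.parent = parent;                            // null when there is no contents page above
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("page1.html", "page2.html", "page3.html", "page4.html");
        for (int i = 0; i < pages.size(); i++) {
            NavLinksSketch nav = new NavLinksSketch(pages, i, "section.html");
            System.out.println(pages.get(i) + ": prev=" + nav.previous + " next=" + nav.next);
        }
    }
}
```

On the second page, `first` and `previous` resolve to the same file, matching the case noted above where two buttons point to the same page.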

The parent button is quite useful, particularly if you have a deep directory structure. Consider the automatic contents generation discussed in the previous section. A widely used convention is that the index.html file for a directory resides within that same directory. Using that convention, the generated navigation buttons treat index.html as just another page to be reached with the previous and next buttons, and not by the parent button. In order to have the contents as a “proper” parent, it needs to be physically located one directory above its children. What if you have several directories that you want to generate contents for? You obviously cannot use multiple different files all named index.html. The solution used by XmlTransform is to instead name the contents file the same as the directory. Thus a directory called stuff spawns a contents file called stuff.html.

Given the conflicting requirements of these two approaches, XmlTransform allows you to generate the contents page either in the same directory as index.html, if you prefer that widely used convention, or in the directory above, named after the directory (stuff.html in the example above), if you want to use the more natural hierarchical model and take advantage of the parent navigation button.

Listing 1 shows the fragment of XSLT code used to generate the navigational buttons on each web page. This XSLT fragment—along with a lot more code to display a search box, a logo, menus, etc.—is incorporated into each web page automatically when you use the element rather than just the regular element. Here’s a portion of the XHTML output for the navigational controls from one page. Careful observation reveals that this code is from the first page in Figure 3 (access.html) because the previous-page link shows a spacer instead of an arrow and contains no active link.

   NAVIGATION:                     first page in section                                        up one level                            next page                            final page in section          

At this point, you’ve seen the basics of how XmlTransform works, how to create contents pages, and how to automate navigation between files. Those latter two features are very useful when transforming XML to HTML, but you may not need them to target other XML dialects.

Running XmlTransform
To run XmlTransform, you need to load a few components:

  • Load Java (version 1.5 or later). You need only the Java run-time engine, which you probably already have. If not, download it from Sun.
  • Load the Xerces library for XML parsing.
  • Load the Xalan library for XSLT transformations.
  • Load the cleancode-java open source library.
  • Add the following JAR files to your Java classpath: the CleanCode library (cleancode.jar), the Xerces library (xml-apis.jar and xercesImpl.jar), and the Xalan library (serializer.jar and xalan.jar).

To test the installation, invoke the usage message option:

   > java com.cleancode.xml.XmlTransform --help

That command displays the complete list of command-line options available, with a one-line description of each.

Because there are a large number of options, XmlTransform allows you to put all your options in a parameter file and reference that file from the command line by preceding the file name with an “at sign” (@) prefix, as in:

   > java com.cleancode.xml.XmlTransform @myParams.dat

XmlTransform treats options in a parameter file just as if you had typed them on the command line, except that they are not exposed to shell interpolation. This can avoid conflicts with special characters that your shell may want to process first (quotes, redirection, etc.) if they are provided directly on the command line. A second benefit to a parameter file is that you may use it to create a default set of options and then override or augment that set on the command line with either direct options or with another parameter file.

As an example, suppose you have a parameter file called inline.conf containing these three parameters (for simplicity the example options below are not real option names):

   --x1=2   --x2=4.3   --x3=true

You could then specify options on the command line as:

   > java class_name --x1=4 @inline.conf --x4=0.5

The parameter file is interpolated just as if you had written:

   > java class_name --x1=4 --x1=2 --x2=4.3 --x3=true --x4=0.5

You can see the x1 option is effectively given twice; the last one encountered is the one that gets used, so the value of x1 will be two rather than four. An important point to note, then, is that if you wish to use a default set of options from a file and override them on the command line as needed, the parameter file specification must precede all the direct options on the line (unlike in the contrived example above where it is in the middle).

Yet another variation of this is simply to create a default set of options in one parameter file and a few customized options in another.

   > java class_name @default.dat @variation-one.dat
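The interpolation and last-one-wins behavior can be sketched in a few lines. This is an illustration only: it assumes a parameter file is simply whitespace-separated `--name=value` tokens, which may be stricter than what XmlTransform really accepts, and the class and method names are invented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ParamFileSketch {
    /** Replaces each @file argument, in place, with the tokens found in that file. */
    static List<String> expand(String[] args) {
        List<String> tokens = new ArrayList<>();
        for (String arg : args) {
            if (!arg.startsWith("@")) {
                tokens.add(arg);
                continue;
            }
            try {
                byte[] bytes = Files.readAllBytes(Paths.get(arg.substring(1)));
                for (String token : new String(bytes, StandardCharsets.UTF_8).split("\\s+")) {
                    if (!token.isEmpty()) tokens.add(token);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return tokens;
    }

    /** Parses --name=value tokens; a later occurrence of a name overrides an earlier one. */
    static Map<String, String> parse(List<String> tokens) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String token : tokens) {
            if (!token.startsWith("--")) continue;       // not an option; ignored in this sketch
            int eq = token.indexOf('=');
            if (eq < 0) options.put(token.substring(2), "true");
            else options.put(token.substring(2, eq), token.substring(eq + 1));
        }
        return options;
    }

    public static void main(String[] args) {
        System.out.println(parse(expand(args)));
    }
}
```

Feeding `parse` the expanded command line from the earlier example gives x1 the value 2, because the file's setting comes after the direct `--x1=4`.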

Option Summary
Below is the list of all the available options for XmlTransform. The following steps describe the details of these XmlTransform options in logical groupings. Also see the XmlTransform API for further details.

     contentsBaseName       -- base name for contents file (e.g. 'index')
     contentsToParent       -- boolean indicating to place contents in parent or same dir
     debug                  -- enable all diags if true (including libraries)
     diagList               -- string of single-character diags to activate
     dirList                -- comma-separated list of dirs to process
     enable                 -- do processing if true; just report if false
     generateContents       -- boolean switch to generate contents
     generatorNode          -- tag name of node to replace with generator info
     groupIdXpath           -- simple Xpath pointing to group id node in each file
     groupPlaceHolder       -- tag name in contents file to put file list
     help                   -- show this list
     inExtension            -- extensions of files to process
     inputSchemaSource      -- global Schema file for input validation
     outExtension           -- extensions for translated files
     outputSchemaSource     -- global Schema file for output validation
     processAll             -- process without checking date stamps if true
     sourcePath             -- root of source XML tree
     startDepth             -- number of directories from the top of your tree back to your own relative root
     targetPath             -- root of target XML tree
     validateInputToSchema  -- boolean switch to validate input
     validateOutputToSchema -- boolean switch to validate output
     validateXslBySchema    -- boolean switch to validate needed XSL files
     xslName                -- primary XSL file in each directory
     xslParmList            -- comma-separated list of XSL parameters
     xslSchema              -- name of Schema file for XSL files
     xslTransform           -- boolean switch to do XSL translation

These remaining options are available in XmlTransform but are not specific to this application. Any application that uses the CleanCode diagnostic system would have these same options. Complete details of the Diagnostic API are available here.

     .*_DIAG                -- diagnostic level for any class
     CREATE_DIAG            -- diagnostic level for object creations
     DIAG_LEVEL             -- diagnostic mask
     ENV_DIAG               -- diagnostic level for system environment
     FORMATTER              -- class name for web channel formatting
     LOG_DIAG_NAME          -- base file name for regular messages
     LOG_DIR                -- directory in which to create log files
     LOG_ERR_NAME           -- base file name for warning/error messages
     OUTPUT_DIAG            -- output channels for regular messages
     OUTPUT_ERR             -- output channels for warning/error messages
     SHOW_THREAD            -- show/hide switch for thread information
     TRACE_DIAG             -- diagnostic level for method enter/exit
     TRACE_INDENT           -- indentation string for nested output
     VERSION_DIAG           -- diagnostic level for module versioning
     WARNINGS_ON            -- show/hide switch for warning messages

Listing 2 illustrates a complete parameter file from a sample project. The steps that follow explain in further detail how to determine the appropriate values.

Step 1—What To Do
XmlTransform supports several Boolean switches to enable or disable the supported functions:

  • xslTransform specifies to transform input to output using your XSLT specification (as opposed to simply validating a set of files).
  • generateContents specifies to generate contents files in the input tree (using contents template files as discussed earlier).
  • validateInputToSchema specifies to validate the input (using your XML Schema definition).
  • validateOutputToSchema specifies to validate the output (using your XML Schema definition).
  • validateXslBySchema is intended for advanced uses. It causes XmlTransform to validate the XSL file itself (not commonly needed because any errors will be evident when xslTransform is turned on).
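For readers unfamiliar with schema validation in Java, the sketch below shows what a check like validateInputToSchema or validateOutputToSchema amounts to, using the standard `javax.xml.validation` API. It illustrates the general technique rather than XmlTransform's own code; the tiny schema and documents are invented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaCheckSketch {
    // Invented schema: a <page> element containing exactly one <title>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='page'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if the document conforms to the schema, false if validation reports an error. */
    static boolean isValid(String xml, String xsd) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema itself is not usable", e);
        }
        try {
            // With no custom error handler, the first validation error is thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<page><title>Web Guidelines</title></page>", XSD));
        System.out.println(isValid("<page><heading>Oops</heading></page>", XSD));
    }
}
```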

Step 2—Where To Do It
You specify where your input tree resides (sourcePath) and where your output tree should be generated (targetPath). Both default to the current directory if not specified. Next, you specify the input file extension (inExtension) and the output file extension (outExtension). These default to xml and html, respectively, if not specified.

Step 3—What To Do It With
To transform each file, you must provide an XSL file that specifies the transform information (xslName). If you specify an absolute path, XmlTransform will use that file at any subdirectory depth. If, however, you specify just a base file name (such as stuff.xsl), then XmlTransform will look for a file with that name within each processed subdirectory. If it doesn’t find the file there, it looks for a file of that name in your root directory (sourcePath). This provides flexibility, because you can specify one global XSL specification but override it in specific instances as required. Alternatively, if you don’t provide a root XSL file, then XmlTransform processes only those subdirectories that contain a local XSL file.
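The lookup order just described (absolute path, then local file, then the root file) can be expressed as the following sketch. The class, the method names, and the throwaway demo tree are all invented for illustration; a `null` result stands for "skip this subdirectory".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class XslLookupSketch {
    /** Picks the XSL file for one subdirectory, or returns null if none applies. */
    static Path resolveXsl(Path sourcePath, Path subdirectory, String xslName) {
        Path given = Paths.get(xslName);
        if (given.isAbsolute()) return given;                 // same file at every depth
        Path local = subdirectory.resolve(xslName);
        if (Files.isRegularFile(local)) return local;         // local override wins
        Path global = sourcePath.resolve(xslName);
        return Files.isRegularFile(global) ? global : null;   // fall back to the root file
    }

    /** Builds a temporary tree: root/stuff.xsl, root/sub1/stuff.xsl, and an empty root/sub2. */
    static Path demoTree() {
        try {
            Path root = Files.createTempDirectory("xsl-lookup");
            Files.createDirectories(root.resolve("sub1"));
            Files.createDirectories(root.resolve("sub2"));
            Files.write(root.resolve("stuff.xsl"), new byte[0]);
            Files.write(root.resolve("sub1").resolve("stuff.xsl"), new byte[0]);
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = demoTree();
        System.out.println(resolveXsl(root, root.resolve("sub1"), "stuff.xsl")); // the sub1 copy
        System.out.println(resolveXsl(root, root.resolve("sub2"), "stuff.xsl")); // the root copy
    }
}
```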

To generate a table of contents for any given subdirectory, you must provide an XML file template—this is just an XML file like any other among your files, with the exception that it has one or more placeholders for referencing lists of other files, as discussed earlier in the article. You must name the template file according to the following convention: an underscore, then the contentsBaseName parameter, a dot, and the inExtension parameter (for example, _myDir.xml). XmlTransform fills in the template and stores it in your input tree as an intermediate file. The intermediate file name depends on whether you elect to store the contents file in the same directory as the contents or in the parent (contentsToParent); if in the same directory, the name will be the same base name-dot-inExtension, less the underscore. You further need to specify the group placeholder (groupPlaceHolder) indicating where in your contents template to insert content items.
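The naming rules can be summarized in code. The same-directory case follows the description above directly; the parent-directory case assumes that the intermediate file is named after the directory with the inExtension appended, which is inferred from the earlier stuff.html example rather than stated outright. All names in the sketch are illustrative.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ContentsNameSketch {
    /** Template name: underscore, contentsBaseName, dot, inExtension. */
    static String templateName(String contentsBaseName, String inExtension) {
        return "_" + contentsBaseName + "." + inExtension;
    }

    /** Location of the filled-in intermediate file for one directory. */
    static Path intermediateFile(Path directory, String contentsBaseName,
                                 String inExtension, boolean contentsToParent) {
        if (contentsToParent) {
            // Assumption: one level up, named after the directory itself.
            return directory.resolveSibling(directory.getFileName() + "." + inExtension);
        }
        // Same directory: the template name without its leading underscore.
        return directory.resolve(contentsBaseName + "." + inExtension);
    }

    public static void main(String[] args) {
        Path dir = Paths.get("docs", "stuff");
        System.out.println(templateName("index", "xml"));
        System.out.println(intermediateFile(dir, "index", "xml", false));
        System.out.println(intermediateFile(dir, "index", "xml", true));
    }
}
```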

Because XmlTransform writes intermediate files in the source tree, it needs to be sure that it’s not inadvertently overwriting one of your content files. On the other hand, it can’t just check for the existence of the file, because XmlTransform may itself have created such a file during an earlier run. Therefore, it writes a generator identification string to a specified node in each generated intermediate file; the presence of that node indicates that the file can be overwritten. You specify what element in your XML should receive this generator identification string via the generatorNode option. The final option needed for contents file generation is groupIdXpath, which specifies an XPath expression used to find the group identifier in each file.
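A safe-to-overwrite test along these lines might look like the sketch below. It assumes that "generated" means the generatorNode element is present with non-empty text; XmlTransform's actual criterion may be different, and the `generator` tag and sample documents are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GeneratorCheckSketch {
    /** Returns true if the document carries a filled-in generator node, so it may be overwritten. */
    static boolean isGenerated(String xml, String generatorNode) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(generatorNode);
            for (int i = 0; i < nodes.getLength(); i++) {
                // Assumption: an empty node is an unfilled template, not a generated file.
                if (!nodes.item(i).getTextContent().trim().isEmpty()) return true;
            }
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse existing file", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isGenerated("<page><generator>XmlTransform</generator></page>", "generator"));
        System.out.println(isGenerated("<page><title>Hand-written page</title></page>", "generator"));
    }
}
```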

Step 4—How To Do It
Finally, you need to provide a few details on how the program should operate.

  • Which Subdirectories To Process— Using the dirList option, you must explicitly specify which subdirectories under sourcePath XmlTransform should process. This string should be a comma- or semicolon-separated list of subdirectory names (relative to sourcePath) such as sub1, sub1/subsub1, sub2, sub3. If omitted, XmlTransform processes only sourcePath itself (with no subdirectories).
  • Preview Mode—Until you are comfortable with the program, or if you want to check new configuration options you have made, you may see what the program would do without actually doing it. You control this with the enable flag. If omitted, the default is true. When you set enable to false, XmlTransform does no actual work—but it does report what it would do.
  • Stingy Mode—In the spirit of economy, XmlTransform does only what is necessary. That is, it keeps track of what it has done on previous invocations, and only validates or transforms files that have changed. You may override this and force it to process all files using the processAll flag. If omitted, the default is false.
  • Tracking Subdirectory Depth—This option is intended for advanced use. XmlTransform keeps track of the subdirectory depth it’s processing, allowing you to define location-relative actions and paths. If for example, you are creating HTML as output, and you want to specify a relative path to an included file, you could use the depth to correctly generate a prefix such as “../../..” to prepend to a file name. Here’s a simple XSL template that can generate the appropriate path:
    

The startDepth configuration option (default=0) allows you to specify an offset between your root (sourcePath) and the location of any referenced include files. For example, if your include files are in the directory above your HTML files, you could specify a startDepth of 1, which would direct the above XSL routine to add an extra “..” in the path. Note that XmlTransform passes the depth of the subdirectory it is processing to the XSL transformation, as an offset to this starting depth using the parameter name level. Therefore, you would use a call-template element in XSL, passing the $level parameter as its argument to create your path string in the preceding example. This is just one example; the $level parameter has other uses as well. For example, you can invoke different templates within your XSL depending on your current level.
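As a stand-in for that technique, here is a self-contained sketch of a recursive XSL template driven by a `level` parameter, run with the JDK's built-in XSLT engine. It is not the article's original template: the `prefix` template name and the `include/site.css` path are invented, and the level is passed as a string and converted with `number()` inside the stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DepthPrefixSketch {
    // The named template emits "../" once per level by calling itself with depth - 1.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='level' select='0'/>"
      + "<xsl:template name='prefix'>"
      + "<xsl:param name='depth'/>"
      + "<xsl:if test='number($depth) &gt; 0'>"
      + "<xsl:text>../</xsl:text>"
      + "<xsl:call-template name='prefix'>"
      + "<xsl:with-param name='depth' select='number($depth) - 1'/>"
      + "</xsl:call-template>"
      + "</xsl:if>"
      + "</xsl:template>"
      + "<xsl:template match='/'>"
      + "<xsl:call-template name='prefix'>"
      + "<xsl:with-param name='depth' select='$level'/>"
      + "</xsl:call-template>"
      + "<xsl:text>include/site.css</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Returns the relative path to the (hypothetical) include file from a page at the given depth. */
    static String relativePath(int level) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setParameter("level", String.valueOf(level));   // supplied from outside the stylesheet
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<page/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(relativePath(2));
    }
}
```

The same `setParameter` call is how any other named value, such as a copyright date, can be handed to a stylesheet that declares a matching top-level `xsl:param`.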

  • Providing Custom XSL Parameters—This option is intended for advanced use. Just as the current subdirectory level is passed in to your XSL as a parameter, you may also provide user-defined values to pass using the xslParmList configuration option. This string should be a comma-separated or semicolon-separated list of parameter settings. Each parameter setting must have the form name:value, and the values may contain neither commas nor semicolons. This is useful for passing in such values as a copyright date or a release version number, for example: xslParmList=copyright:2006,relVersion:v1.2. In addition, XmlTransform makes the generator identification string (discussed earlier in the context of contents files) available automatically via this mechanism. To access the parameter, you first include this line in your XSL:
  •    

    Then to use it, you might use something like this, if you are creating HTML or XHTML:

       

Overall, because of the number of options and their effects on the output, XmlTransform does have a fairly steep learning curve, but if you have a problem to tackle that it can handle, it can be quite a time saver. In the real world, XmlTransform originally served to generate static pages on my open source web site. Rather than write in HTML, I can write pages in a shorthand custom XML dialect and let XmlTransform automatically take care of the fancy headers, footers, page linkages, copyright date, and so forth. But XmlTransform is useful in other situations as well. For example, it can act as a SQL documentation generator akin to NDoc (for C#) or Javadoc (for Java). The article “Add Custom XML Documentation Capability To Your SQL Code” provides a detailed explanation of how to use XmlTransform to accomplish SQL documentation. That article shares the same set of sample source files that you can find attached to this article, and you can use those to experiment with other uses for XmlTransform.
