The replacement process is the key. To render an instance of such an element in your source file as the expanded code fragment shown above in your target XHTML file, you must transform the source, specifying precisely in XSLT how to render it. Here’s the XSLT code that would output the HTML shown above:
BEGIN footer XYZ Corporation -- © 2008 All rights reserved
END footer
Though you could get some mileage out of the XSLT file included with the downloadable sample project, familiarity with XSLT is crucial to using XmlTransform effectively. The bulk of your effort goes into creating the XSLT code that drives the transformations. Designing clean, useful XSLT takes some work, but it pays off in the long run.
Setting up XmlTransform involves defining a custom XML dialect for your source files that encapsulates all the common elements you use, and then creating an XSLT mapping file that specifies how your particular dialect maps to XHTML (or whatever target XML dialect you wish). The initial setup takes some time, but after you’ve set it up, changing a logo (or a copyright notice, or any other common element throughout your site) is a simple matter of making a single change in the XSLT file, and then regenerating your output files.
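To make the mapping idea concrete, here is a minimal sketch that uses the standard JAXP API (javax.xml.transform) directly rather than XmlTransform itself. The source element name footer, the class name, and the stylesheet are all hypothetical stand-ins and are not taken from the sample project; the point is only that one template in one XSLT file controls how every instance of a custom element is expanded.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FooterMappingSketch {
    // Hypothetical mapping file: copies everything through unchanged,
    // except a <footer/> shorthand, which it expands into XHTML markup.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='footer'>"
      + "<div class='footer'>XYZ Corporation -- &#169; 2008 All rights reserved</div>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the mapping above to one source document and returns the result. */
    static String transform(String sourceXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<page><p>Hello</p><footer/></page>"));
    }
}
```

Changing the text inside the footer template and rerunning the transformation updates every page that uses the shorthand, which is the maintenance benefit described above.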
Figure 1. The XmlTransform Engine At Work: The primary XSLT operation is shown in the center, but you could also invoke schema validation on source or destination trees; the various engine options give you substantial control over how your tree is manipulated.
Figure 1 illustrates how XmlTransform operates. The diagram packs a lot of information, but considering it piece by piece will make it more digestible.
The goal is to map a source tree (upper left) to a target tree (lower right). Look for the three yellow boxes: two labeled “Schema validator” and one labeled “XSLT engine.” Those are the active components, each controlled by a Boolean command-line switch (denoted by the valve switches in the figure) that turns its path on or off. Specifically, these active component options are xslTransform, validateInputToSchema, and validateOutputToSchema. Figure 1 shows all options with a dotted outline. Some of the other options shown, such as inExtension and outExtension, let you specify the file extensions of your source and target files, respectively.
For example, to transform XML to HTML, you’d use .xml as the inExtension and .html as the outExtension. Finally, you need to provide a source tree location (sourcePath), a list of directories that should be transformed (dirList), and the place to put the output files (targetPath). In Figure 1, the items included in dirList are marked with red rectangles in the source tree. The non-selected directories are shown crossed out in the target tree, since those were not included. There are a number of other parameters to consider, though most default to some reasonable value.
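The relationship among these options can be sketched in a few lines of Java. This is an illustration of the mapping only, not XmlTransform's actual code: the class and method names are hypothetical, the extensions are assumed to be given without a leading dot, and the sketch considers only files that sit inside a directory named in dirList.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class TreeMappingSketch {
    /** Returns the target file for a source file, or null if the file is not selected. */
    static Path targetFor(Path sourcePath, Path targetPath, List<String> dirList,
                          String inExt, String outExt, Path sourceFile) {
        Path rel = sourcePath.relativize(sourceFile);
        Path relDir = rel.getParent();                 // null for files directly under sourcePath
        String dir = relDir == null ? "" : relDir.toString().replace('\\', '/');
        String name = rel.getFileName().toString();
        if (!dirList.contains(dir) || !name.endsWith("." + inExt)) {
            return null;                               // directory or extension not selected
        }
        // Keep the base name and its dot, then swap the extension.
        String base = name.substring(0, name.length() - inExt.length());
        return targetPath.resolve(dir).resolve(base + outExt);
    }

    public static void main(String[] args) {
        List<String> dirList = Arrays.asList("webRules");
        System.out.println(targetFor(Paths.get("/src"), Paths.get("/out"), dirList,
                "xml", "html", Paths.get("/src/webRules/browser.xml")));
        System.out.println(targetFor(Paths.get("/src"), Paths.get("/out"), dirList,
                "xml", "html", Paths.get("/src/private/notes.xml")));
    }
}
```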
XmlTransform provides two other, powerful benefits to simplify maintenance: generating summary or contents pages and adding navigational connections. These are discussed in the next two sections.
No File Is an Island
Figure 2 illustrates XmlTransform’s capability to automatically populate a contents page for a web section using elements from each sibling or child web page. The figure shows how XmlTransform can extract a portion of the element and a portion of a specific element from each page in a relevant set of pages and combine those into a single contents page. That way, if you add or remove pages from the set, you need only rerun XmlTransform to update the generated contents page.
Figure 2. Automatic Contents Generation: XmlTransform can automatically generate contents pages (shown at the bottom) from selected details it extracts from source files.
If you look at Figure 1 you can see that XmlTransform not only generates files in your target tree but it also generates a few intermediate “contents” files in your source tree (highlighted in magenta in the figure). You create a template for a contents file, which must be named _index.xml and stored in the directory containing the files it will summarize, and XmlTransform populates that template with the summary information. For example, the _index.xml file shown below was used to generate the output in Figure 2 :
CleanCode::Web Guidelines $Id: _index.xml 20 2006-12-29 00:03:27Z dellxp $ $Revision: 20 $ Web Guidelines Design Considerations for Web Sites
Unlike other files in each directory, this _index.xml template file must undergo an extra processing step before being transformed along with the rest of the files in your source tree. This extra step fleshes out the template with the files under its jurisdiction; you control the operation with the element in the template code above. XmlTransform fills this template out and writes it to an intermediate file named web info.xml in the source tree as shown in Figure 1 , before performing the rest of the transformations. Here’s the relevant portion of this intermediate file (web info.xml ); if you compare it to the template shown above, you’ll see how XmlTransform has filled out the element with information about each of the other files in the directory:
CleanCode::Web Guidelines $Id: _index.xml 20 2006-12-29 00:03:27Z dellxp $ $Revision: 20 $ Web Guidelines Design Considerations for Web Sites webRules/accessibility.html /usr/doc/webRules/accessibility.xml webRules/antispam.html /usr/doc/webRules/antispam.xml webRules/browser.html /usr/doc/webRules/browser.xml webRules/cssConformance.html /usr/doc/webRules/cssConformance.xml . . .
The contents page template may include any other HTML that you wish. Only the marker element (the tag named by the groupPlaceHolder option) is significant to XmlTransform, letting it know where to insert the generated contents. The preceding example shows a very basic page that adds only two header elements. The intermediate file is now ready to be processed, along with all the referenced pages, by the XSLT transformations specified in your mapping file. Here’s the reference code from the supplied translate.xsl that turns the intermediate-file XML shown above into XHTML:
If you study the components of this short XSLT fragment and compare it to the output shown in Figure 2, you’ll see how it targets only the relevant information. (The trim function invoked in the preceding XSLT code, but not shown, returns the final segment of a colon-separated string; it has no particular significance other than being the convention I chose for my coding style.) For completeness, here is the relevant fragment of the final XHTML page rendered by the XSLT transformation:
. . . Web Guidelines Design Considerations for Web Sites Accessibility : Don't discriminate on physical ability when you design web pages.
Anti-Spam : Design defensively so you do not make it easy for spammers to enlist you to help them.
CSS Conformance : Use CSS to improve your design, reduce duplication, and simplify maintenance, but getting it right can be a challenge.
. . .
The example above generates an ungrouped contents file; in other words, it places the items into a single list on the contents page, but you may refine this by creating an arbitrary set of smaller, named groups. You could, for example, have an introductory paragraph, and then some generated contents entries, another introductory paragraph, some more contents entries, and so forth. Furthermore, you can specify a contents template for each directory, so you may tailor them as your content dictates.
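The template-filling step described in this section can be sketched with the standard DOM API. This is only an approximation under stated assumptions: the placeholder tag name files, the generated file element, and its target and source attributes are hypothetical, and the sketch records just the paths, whereas XmlTransform also extracts details such as a title and description from each page.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ContentsFillSketch {
    /** Adds one entry per page under the placeholder element and returns the filled XML. */
    static String fill(String templateXml, String placeHolder, String dir,
                       List<String> baseNames) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(templateXml)));
        Element marker = (Element) doc.getElementsByTagName(placeHolder).item(0);
        for (String base : baseNames) {
            Element file = doc.createElement("file");            // hypothetical element name
            file.setAttribute("target", dir + "/" + base + ".html");
            file.setAttribute("source", dir + "/" + base + ".xml");
            marker.appendChild(file);
        }
        // An identity transformer serializes the modified DOM back to text.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String template = "<page><h1>Web Guidelines</h1><files/></page>";
        System.out.println(fill(template, "files", "webRules",
                Arrays.asList("accessibility", "antispam")));
    }
}
```

Because the list is rebuilt from the directory contents on every run, adding or removing a page requires no manual edit to the contents page.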
Author’s Note: As a further example, this web page shows a contents page with 19 files arranged in six sections.
The contents or summary page shown in this section provides a mechanism for drilling or navigating down one hierarchy level. The next section completes the picture by demonstrating how to navigate up as well as side to side, which also raises the interesting question of where to place the generated contents file.
Connecting the Dots
Assume, for example, that you have a series of web pages that have a natural order, perhaps a set of pages that make up a tutorial. Each web page has navigational buttons to advance to the next page, return to the previous page, go up to the contents page, and so forth. With static pages, whenever you add a page, remove a page, or reorder the material, you typically have to edit all the linkages manually, a time-consuming and error-prone process. Figure 3 shows a representation of four such related pages. Each page has buttons that navigate to the first and last pages, the previous and next pages, and the parent page. By using an appropriate XSLT transformation mapping, XmlTransform can generate such navigational connections automatically.
Figure 3. Connecting Neighboring Pages: XmlTransform generates a web info.html contents parent file, reachable via the middle (“go-up”) navigational button on each page. The other buttons provide side-to-side and end-to-end navigational control for the child pages.
If you look closely at Figure 3 , you’ll notice that every page displays the go-to-first-page and go-to-last-page buttons, but the three buttons in the middle—previous, parent, and next—are conditional. For example, you don’t want the previous-page button to display on the first page or the next-page button to appear on the last page. Also, in some cases, two buttons may point to the same page. For example, on page two the go-to-first-page button and the previous-page button both point to page one.
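The button rules just described reduce to a little index arithmetic. The following sketch is an illustration only (XmlTransform does this work in XSLT, and the class, method, and page names here are hypothetical): first and last are always present, while previous, parent, and next appear only when they have somewhere to point.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NavLinksSketch {
    /** Computes the navigation links for the page at the given position in an ordered set. */
    static Map<String, String> linksFor(List<String> pages, int index, String parent) {
        Map<String, String> links = new LinkedHashMap<>();
        links.put("first", pages.get(0));                       // always shown
        if (index > 0) {
            links.put("previous", pages.get(index - 1));        // omitted on the first page
        }
        if (parent != null) {
            links.put("parent", parent);                        // omitted when there is no contents page
        }
        if (index < pages.size() - 1) {
            links.put("next", pages.get(index + 1));            // omitted on the last page
        }
        links.put("last", pages.get(pages.size() - 1));         // always shown
        return links;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("page1.html", "page2.html", "page3.html", "page4.html");
        System.out.println(linksFor(pages, 0, "../tutorial.html"));
        System.out.println(linksFor(pages, 1, "../tutorial.html"));
    }
}
```

On the second page, the first and previous entries both resolve to page1.html, matching the overlap noted above.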
The parent button is quite useful, particularly if you have a deep directory structure. Consider the automatic contents generation discussed in the previous section. A widely used convention is that the index.html file for a directory resides within that same directory. Using that convention, the generated navigation buttons treat index.html as just another page to be reached with the previous and next buttons, and not by the parent button. In order to have the contents as a “proper” parent, it needs to be physically located one directory above its children. What if you have several directories that you want to generate contents for? You obviously cannot use multiple different files all named index.html . The solution used by XmlTransform is to instead name the contents file the same as the directory. Thus a directory called stuff spawns a contents file called stuff.html .
Given the conflicting requirements of these two approaches, XmlTransform allows you to generate the contents page either in the same directory as index.html, if you prefer that widely used convention, or in the directory above under the directory’s own name (stuff.html for a directory called stuff), if you want to use the more natural hierarchical model and take advantage of the parent navigation button.
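The two placements can be summarized in a small helper. This is a sketch of the naming rule only, with hypothetical class and method names; the parameters mirror the contentsToParent, contentsBaseName, and outExtension options, and the directory is given relative to the target root.

```java
class ContentsLocationSketch {
    /** Returns the path of the generated contents page for the given directory. */
    static String contentsTarget(String dir, boolean contentsToParent,
                                 String contentsBaseName, String outExtension) {
        if (contentsToParent) {
            // One level up, named after the directory itself: stuff -> stuff.html
            return dir + "." + outExtension;
        }
        // Inside the directory, using the conventional base name: stuff -> stuff/index.html
        return dir + "/" + contentsBaseName + "." + outExtension;
    }

    public static void main(String[] args) {
        System.out.println(contentsTarget("docs/stuff", true, "index", "html"));
        System.out.println(contentsTarget("docs/stuff", false, "index", "html"));
    }
}
```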
Listing 1 shows the fragment of XSLT code used to generate the navigational buttons on each web page. This XSLT fragment—along with a lot more code to display a search box, a logo, menus, etc.—is incorporated into each web page automatically when you use the element rather than just the regular element. Here’s a portion of the XHTML output for the navigational controls from one page. Careful observation reveals that this code is from the first page in Figure 3 (access.html ) because the previous-page link shows a spacer instead of an arrow and contains no active link.
At this point, you’ve seen the basics of how XmlTransform works, how to create contents pages, and how to automate navigation between files. Those latter two features are very useful when transforming XML to HTML, but you may not need them to target other XML dialects.
Running XmlTransform
To run XmlTransform, you need to load a few components:
Load Java (version 1.5 or later). You need only the Java run-time engine, which you probably already have. If not, download it from Sun .
Load the Xerces library for XML parsing.
Load the Xalan library for XSLT transformations.
Load the cleancode-java open source library.
Add the following JAR files to your Java classpath: the CleanCode library (cleancode.jar), the Xerces library (xml-apis.jar and xercesImpl.jar), and the Xalan library (serializer.jar and xalan.jar).
To test the installation, invoke the usage message option:
> java com.cleancode.xml.XmlTransform --help
That command displays the complete list of command-line options available, with a one-line description of each.
Because there are a large number of options, XmlTransform allows you to put all your options in a parameter file and reference that file from the command line by preceding the file name with an “at sign” (@) prefix, as in:
> java com.cleancode.xml.XmlTransform @myParams.dat
XmlTransform treats options in a parameter file just as if you had typed them on the command line, except that they are not exposed to shell interpolation. This can avoid conflicts with special characters that your shell may want to process first (quotes, redirection, etc.) if they are provided directly on the command line. A second benefit to a parameter file is that you may use it to create a default set of options and then override or augment that set on the command line with either direct options or with another parameter file.
As an example, suppose you have a parameter file called inline.conf containing these three parameters (for simplicity the example options below are not real option names):
--x1=2 --x2=4.3 --x3=true
You could then specify options on the command line as:
> java class_name --x1=4 @inline.conf --x4=0.5
The parameter file is interpolated just as if you had written:
> java class_name --x1=4 --x1=2 --x2=4.3 --x3=true --x4=0.5
You can see the x1 option is effectively given twice; the last one encountered is the one that gets used, so the value of x1 will be 2 rather than 4. An important point to note, then, is that if you wish to use a default set of options from a file and override them on the command line as needed, the parameter file specification must precede all the direct options on the line (unlike in the contrived example above, where it is in the middle).
Yet another variation of this is simply to create a default set of options in one parameter file and a few customized options in another.
> java class_name @default.dat @variation-one.dat
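The expansion and last-one-wins behavior can be modeled in a few lines. This is a sketch of the rule as described above, not the CleanCode library's implementation: the class and method names are hypothetical, parameter file contents are supplied in an in-memory map instead of being read from disk, and options inside a file are assumed to be separated by whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ParamFileSketch {
    /** Replaces each @name argument, in place, with the options held in that parameter file. */
    static List<String> expand(List<String> args, Map<String, String> files) {
        List<String> out = new ArrayList<>();
        for (String arg : args) {
            if (arg.startsWith("@")) {
                String contents = files.get(arg.substring(1));
                out.addAll(Arrays.asList(contents.trim().split("\\s+")));
            } else {
                out.add(arg);
            }
        }
        return out;
    }

    /** Reads --name=value options left to right, so a later setting overrides an earlier one. */
    static Map<String, String> parse(List<String> options) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String opt : options) {
            int eq = opt.indexOf('=');
            values.put(opt.substring(2, eq), opt.substring(eq + 1));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("inline.conf", "--x1=2 --x2=4.3 --x3=true");
        List<String> expanded = expand(Arrays.asList("--x1=4", "@inline.conf", "--x4=0.5"), files);
        System.out.println(expanded);
        System.out.println(parse(expanded));
    }
}
```

Moving @inline.conf to the front of the argument list would let the direct --x1=4 setting win instead, which is why a defaults file belongs before the direct options.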
Option Summary
Below is the list of all the available options for XmlTransform. The steps that follow describe the details of these options in logical groupings. Also see the XmlTransform API for further details.
contentsBaseName -- base name for contents file (e.g. 'index')
contentsToParent -- boolean indicating to place contents in parent or same dir
debug -- enable all diags if true (including libraries)
diagList -- string of single-character diags to activate
dirList -- comma-separated list of dirs to process
enable -- do processing if true; just report if false
generateContents -- boolean switch to generate contents
generatorNode -- tag name of node to replace with generator info
groupIdXpath -- simple Xpath pointing to group id node in each file
groupPlaceHolder -- tag name in contents file to put file list
help -- show this list
inExtension -- extensions of files to process
inputSchemaSource -- global Schema file for input validation
outExtension -- extensions for translated files
outputSchemaSource -- global Schema file for output validation
processAll -- process without checking date stamps if true
sourcePath -- root of source XML tree
startDepth -- number of directories from the top of your tree back to your own relative root
targetPath -- root of target XML tree
validateInputToSchema -- boolean switch to validate input
validateOutputToSchema -- boolean switch to validate output
validateXslBySchema -- boolean switch to validate needed XSL files
xslName -- primary XSL file in each directory
xslParmList -- comma-separated list of XSL parameters
xslSchema -- name of Schema file for XSL files
xslTransform -- boolean switch to do XSL translation
These remaining options are available in XmlTransform but are not specific to this application. Any application that uses the CleanCode diagnostic system would have these same options. Complete details of the Diagnostic API are available here .
.*_DIAG -- diagnostic level for any class
CREATE_DIAG -- diagnostic level for object creations
DIAG_LEVEL -- diagnostic mask
ENV_DIAG -- diagnostic level for system environment
FORMATTER -- class name for web channel formatting
LOG_DIAG_NAME -- base file name for regular messages
LOG_DIR -- directory in which to create log files
LOG_ERR_NAME -- base file name for warning/error messages
OUTPUT_DIAG -- output channels for regular messages
OUTPUT_ERR -- output channels for warning/error messages
SHOW_THREAD -- show/hide switch for thread information
TRACE_DIAG -- diagnostic level for method enter/exit
TRACE_INDENT -- indentation string for nested output
VERSION_DIAG -- diagnostic level for module versioning
WARNINGS_ON -- show/hide switch for warning messages
Listing 2 illustrates a complete parameter file from a sample project. The steps that follow explain in further detail how to determine the appropriate values.
Step 1: What To Do
XmlTransform supports several Boolean switches to enable or disable the supported functions:
xslTransform specifies to transform input to output using your XSLT specification (as opposed to simply validating a set of files).
generateContents specifies to generate contents files in the input tree (using contents template files as discussed earlier).
validateInputToSchema specifies to validate the input (using your XML Schema definition).
validateOutputToSchema specifies to validate the output (using your XML Schema definition).
validateXslBySchema is intended for advanced uses. It causes XmlTransform to validate the XSL file itself (not commonly needed because any errors will be evident when xslTransform is turned on).
Step 2: Where To Do It
You specify where your input tree resides (sourcePath) and where your output tree should be generated (targetPath). Both default to the current directory if not specified. Next, you specify the input file extension (inExtension) and the output file extension (outExtension). These default to xml and html, respectively, if not specified.
Step 3: What To Do It With
To transform each file, you must provide an XSL file that specifies the transform information (xslName). If you specify an absolute path, XmlTransform will use that file at any subdirectory depth. If, however, you specify just a base file name (such as stuff.xsl), then XmlTransform will look for a file with that name within each processed subdirectory. If it doesn’t find the file there, it looks for a file of that name in your root directory (sourcePath). This provides flexibility, because you can specify one global XSL specification but override it in specific instances as required. Alternatively, if you don’t provide a root XSL file, then XmlTransform processes only those subdirectories that contain a local XSL file.
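The lookup order just described can be sketched as follows. The class and method names are hypothetical and this is not XmlTransform's code; it simply mirrors the three cases in the paragraph above: an absolute path is used everywhere, a local file overrides the global one, and a directory with neither is skipped.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class XslLookupSketch {
    /** Returns the XSL file to use for subDir, or null if that directory should be skipped. */
    static Path findXsl(Path sourcePath, Path subDir, String xslName) {
        Path given = Paths.get(xslName);
        if (given.isAbsolute()) {
            return given;                              // one file for every subdirectory depth
        }
        Path local = subDir.resolve(xslName);
        if (Files.isRegularFile(local)) {
            return local;                              // per-directory override
        }
        Path global = sourcePath.resolve(xslName);
        return Files.isRegularFile(global) ? global : null;
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("xmltransform-sketch");
        Path sub = Files.createDirectory(root.resolve("sub1"));
        Files.createFile(root.resolve("stuff.xsl"));
        // sub1 has no local stuff.xsl, so the global one in the root is chosen.
        System.out.println(findXsl(root, sub, "stuff.xsl"));
    }
}
```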
To generate a table of contents for any given subdirectory, you must provide an XML file template. This is just an XML file like any other among your files, except that it has one or more placeholders for referencing lists of other files, as discussed earlier in the article. You must name the template file according to the following convention: an underscore, then the contentsBaseName parameter, a dot, and the inExtension parameter (for example, _myDir.xml). XmlTransform fills in the template and stores it in your input tree as an intermediate file. The intermediate file name depends on whether you elect to store the contents file in the same directory as the contents or in the parent (contentsToParent); if in the same directory, the name is the same base name, a dot, and inExtension, without the leading underscore. You further need to specify the group placeholder (groupPlaceHolder) indicating where in your contents template to insert content items.
Because XmlTransform writes intermediate files in the source tree, it needs to be sure that it’s not inadvertently overwriting one of your content files. On the other hand, it can’t just check for the existence of the file, because XmlTransform may itself have created such a file during an earlier run. Therefore, it writes a generator identification string to a specified node in each generated intermediate file; the presence of that node indicates that the file can be overwritten. You specify which element in your XML should receive this generator identification string via the generatorNode option. The final option needed for contents file generation is groupIdXpath, which specifies an XPath expression used to find the group identifier in each file.
Step 4: How To Do It
Finally, you need to provide a few details on how the program should operate.
Which Subdirectories To Process: Using the dirList option, you must explicitly specify which subdirectories under sourcePath XmlTransform should process. This string should be a comma- or semicolon-separated list of subdirectory names (relative to sourcePath) such as sub1, sub1/subsub1, sub2, sub3. If omitted, XmlTransform processes only sourcePath itself (with no subdirectories).
Preview Mode: Until you are comfortable with the program, or if you want to check configuration changes you have made, you can see what the program would do without actually doing it. You control this with the enable flag. If omitted, the default is true. When you set enable to false, XmlTransform does no actual work, but it does report what it would do.
Stingy Mode: In the spirit of economy, XmlTransform does only what is necessary. That is, it keeps track of what it has done on previous invocations, and only validates or transforms files that have changed. You may override this and force it to process all files using the processAll flag. If omitted, the default is false.
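One plausible way to make that decision is a date-stamp comparison, which the processAll description in the option summary hints at. The sketch below is an assumption about the general approach, not XmlTransform's actual logic, and the class and method names are hypothetical.

```java
import java.io.File;

class StingyModeSketch {
    /** Decides whether a source file needs to be transformed again. */
    static boolean needsProcessing(File source, File target, boolean processAll) {
        if (processAll) {
            return true;                               // forced: skip the date-stamp check
        }
        if (!target.exists()) {
            return true;                               // never generated before
        }
        return source.lastModified() > target.lastModified();  // source changed since last run
    }

    public static void main(String[] args) throws Exception {
        File source = File.createTempFile("page", ".xml");
        File target = File.createTempFile("page", ".html");
        source.setLastModified(2000000L);
        target.setLastModified(1000000L);
        System.out.println(needsProcessing(source, target, false));
    }
}
```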
Tracking Subdirectory Depth: This option is intended for advanced use. XmlTransform keeps track of the subdirectory depth it’s processing, allowing you to define location-relative actions and paths. If, for example, you are creating HTML as output and you want to specify a relative path to an included file, you could use the depth to correctly generate a prefix such as “../../..” to prepend to a file name. Here’s a simple XSL template that can generate the appropriate path:
../
The startDepth configuration option (default=0) allows you to specify an offset between your root (sourcePath ) and the location of any referenced include files. For example, if your include files are in the directory above your HTML files, you could specify a startDepth of 1 , which would direct the above XSL routine to add an extra “..” in the path. Note that XmlTransform passes the depth of the subdirectory it is processing to the XSL transformation, as an offset to this starting depth using the parameter name level . Therefore, you would use a call-template element in XSL, passing the $level parameter as its argument to create your path string in the preceding example. This is just one example; the $level parameter has other uses as well. For example, you can invoke different templates within your XSL depending on your current level.
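As an illustration of this technique, the following sketch runs a small recursive named template through the standard JAXP API, passing level the way the text describes. It is a reconstruction under assumptions rather than the article's own template: the template name pathPrefix, its depth parameter, and the file name include/site.css are hypothetical; only the level parameter name comes from the text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DepthPrefixSketch {
    // The named template emits "../" once per level by calling itself with depth - 1.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='level' select='0'/>"
      + "<xsl:template name='pathPrefix'>"
      + "<xsl:param name='depth'/>"
      + "<xsl:if test='$depth &gt; 0'>"
      + "<xsl:text>../</xsl:text>"
      + "<xsl:call-template name='pathPrefix'>"
      + "<xsl:with-param name='depth' select='$depth - 1'/>"
      + "</xsl:call-template>"
      + "</xsl:if>"
      + "</xsl:template>"
      + "<xsl:template match='/'>"
      + "<xsl:call-template name='pathPrefix'>"
      + "<xsl:with-param name='depth' select='$level'/>"
      + "</xsl:call-template>"
      + "<xsl:text>include/site.css</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Returns the relative path to the include file as seen from the given depth. */
    static String relativeInclude(int level) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setParameter("level", Integer.valueOf(level));   // supplied from outside the stylesheet
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<page/>")), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(relativeInclude(3));
    }
}
```

XSLT 1.0 has no loop construct with a mutable counter, so recursion through call-template is the usual way to repeat a string a computed number of times.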
Providing Custom XSL Parameters: This option is intended for advanced use. Just as the current subdirectory level is passed in to your XSL as a parameter, you may also provide user-defined values to pass using the xslParmList configuration option. This string should be a comma-separated or semicolon-separated list of parameter settings. Each parameter setting must have the form name:value, and the values may contain neither commas nor semicolons. This is useful for passing in such values as a copyright date or a release version number, for example: xslParmList=copyright:2006,relVersion:v1.2. In addition, XmlTransform makes the generator identification string (discussed earlier in the context of contents files) available automatically via this mechanism. To access the parameter, you first include this line in your XSL:
Then to use it, you might use something like this, if you are creating HTML or XHTML:
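As a hedged illustration of the general mechanism, the sketch below parses an xslParmList string and hands each setting to a stylesheet through the standard JAXP API. It uses the copyright and relVersion names from the example above; the class, the method names, and the stylesheet are hypothetical, and the name under which XmlTransform exposes the generator identification string is not shown because the text does not give it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslParmListSketch {
    // Each external value is declared with a top-level xsl:param and read back as $name.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:param name='copyright'/>"
      + "<xsl:param name='relVersion'/>"
      + "<xsl:template match='/'>"
      + "<p>Copyright <xsl:value-of select='$copyright'/>, release "
      + "<xsl:value-of select='$relVersion'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Splits "name:value,name:value" (commas or semicolons) into an ordered map. */
    static Map<String, String> parse(String xslParmList) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String setting : xslParmList.split("[,;]")) {
            int colon = setting.indexOf(':');
            params.put(setting.substring(0, colon).trim(), setting.substring(colon + 1).trim());
        }
        return params;
    }

    /** Runs the stylesheet with the given parameters and returns the rendered fragment. */
    static String render(Map<String, String> params) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        for (Map.Entry<String, String> e : params.entrySet()) {
            t.setParameter(e.getKey(), e.getValue());
        }
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<page/>")), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(parse("copyright:2006,relVersion:v1.2")));
    }
}
```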
Overall, because of the number of options and their effects on the output, XmlTransform does have a fairly steep learning curve, but if you have a problem to tackle that it can handle, it can be quite a time saver. In the real world, XmlTransform originally served to generate static pages on my open source web site. Rather than write in HTML, I can write pages in a shorthand custom XML dialect and let XmlTransform automatically take care of the fancy headers, footers, page linkages, copyright date, and so forth. But XmlTransform is useful in other situations as well. For example, it can act as a SQL documentation generator akin to NDoc (for C#) or Javadoc (for Java). The article “Add Custom XML Documentation Capability To Your SQL Code” explains in detail how to use XmlTransform to document SQL code. That article shares the same set of sample source files that you can find attached to this article, and you can use those to experiment with other uses for XmlTransform.