Webster’s Dictionary defines a fop as being synonymous with a “dandy,” a person (usually male) who spends an inordinate amount of time and effort on dress and appearance, sometimes to ludicrous extremes. Think of the gold-chain-festooned, white-polyester-clad lounge lizard of the 1970s and you’ll get the basic idea. However, as with so many other terms, FOP has resurfaced with a different meaning, this time as an acronym for the Formatting Objects Processor, part of the open source Apache Project. The FOP processor performs an interesting stunt: it converts an XSL-FO file into an Adobe Portable Document Format (PDF) file.
XSLT and XSL focus on data-centric and document-centric transformations, respectively, but they are related: you need both to create efficient document-to-print transformations. How do XSLT, XSL, and XSL Formatting Objects (XSL-FO) fit together?
A Little XSL History
PDF files encode detailed layout information, fonts, and graphics within a single file, and have become a standard for document display, just as PostScript has become integral to document printing.
The XSL-FO part, on the other hand, requires a little more explanation, as it is a format that many people think they know but really don’t. In the late 1990s, as the XML standard was coming together, the W3C stylesheet working group wanted to create a generalized page description language that could turn an XML document into a presentation. Unfortunately, HTML, even as XHTML, isn’t well suited to detailed presentations, because the very features that make web browser displays convenient, such as long scrolling panes of information, don’t fare especially well when split into individual pages.
For example, even simple elements such as headers and footers are problematic. Columns are difficult to format effectively. Specifying print dimensions can be frustrating, because the concept of “width” in a web page is very different from the same concept in most printed output. Finally, HTML is imprecise even with CSS positioning, so you must add proprietary extensions if you want the output to do more than vaguely resemble the quality of print-only media.
Therefore, the stylesheet group recommended an Extensible Stylesheet Language (XSL) with two components: a descriptive language for formatting specific content, and a transformation language for converting XML into that descriptive language. As it turned out, the transformation language was the simpler of the two, and because it was originally deemed the less essential task, it was given the name XSL Transformations, or XSLT. However, as XML has become more data-centric, XSLT’s role as a mechanism for general transformations has become far more prominent, while the rest of the XSL specification was relegated to the background.
Eventually, though, the XSL Recommendation was released in October 2001, nearly two years after the XSLT Recommendation. Although the Recommendation is titled Extensible Stylesheet Language (XSL), XSLT has become so prominent that the page-description portion of XSL is commonly called XSL-FO, where FO stands for Formatting Objects (hence FOP).
A Short XSL-FO Primer
XSL-FO is a page description language, specifically designed for working with fairly sophisticated page content; consequently, it can be surprisingly difficult to master. You won’t be throwing away your copy of QuarkXPress or PageMaker any time soon, but don’t be surprised to see PageMaker (also an Adobe product) generating XSL-FO eventually. Adobe is perhaps the prime mover behind XSL-FO, though IBM, Sun, Xerox, and other companies also helped author the XSL Recommendation.
XSL-FO elements belong to the namespace http://www.w3.org/1999/XSL/Format, which is conventionally bound to the fo: prefix with the declaration xmlns:fo="http://www.w3.org/1999/XSL/Format".
I created a very simplified (ad hoc) XML schema to describe the sample document included with this article (although you could easily use something like DocBook to do much the same thing). The schema isn’t the formatting code; it’s just a simple “logical” breakdown of the document (see Listing 1).
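Listing 1 isn’t reproduced in this text. To give a rough idea of what such a logical breakdown looks like, here is a hypothetical fragment; the element names (article, title, subtitle, author, section, heading, p) are assumptions chosen for illustration, not the names Listing 1 actually uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical logical markup: element names are illustrative only,
     not the ones used in the article's Listing 1. It says what each
     piece of text IS, and nothing about how it should look. -->
<article>
  <title>Getting Fancy With FOP</title>
  <subtitle>Creating Adobe Acrobat Files from XSLT and XSL-FO</subtitle>
  <author>Kurt Cagle</author>
  <section>
    <heading>A Little XSL History</heading>
    <p>The FOP processor converts an XSL-FO file into a PDF file.</p>
  </section>
</article>
```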
Author Note: The sample document contains an early version of this article, so there may be minor differences between the sample document and the final version.
The XSL-FO markup for this document can look a little intimidating, but it’s actually pretty straightforward. All XSL-FO documents begin with a <fo:root> element, which contains everything else in the document.
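A minimal sketch of that overall shape, following the XSL 1.0 Recommendation: the root declares the fo: namespace and holds a layout master set followed by a page sequence whose flow carries the content. Only the master name “mainPage” comes from the article; the page size and text are placeholders. Note that the final Recommendation uses master-reference on fo:page-sequence, whereas earlier drafts of the specification used master-name in that position.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal XSL-FO document: every formatting object lives in the fo: namespace. -->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Page geometry: the kinds of pages this document uses. -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="mainPage"
        page-height="11in" page-width="8.5in">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- Content: poured into pages built from the "mainPage" master. -->
  <fo:page-sequence master-reference="mainPage">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hello, FOP.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```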
The next element should be a layout master set (<fo:layout-master-set>), a collection of the page masters that the document requires. For the current article, the single simple page master is named (not surprisingly) “mainPage”, but the name could be pretty much anything; the master-name attribute just provides a value to refer to the page master.
The page master defines the height and width of the page, as well as the dimensions of the margins. Note that the units can be any standard CSS units: inches (in), centimeters (cm), millimeters (mm), points (pt), etc. These dimensions are printer page dimensions: if you wanted to print to an 11×17 broadside, for example, you’d specify a page-height of “17in” and a page-width of “11in”. The margins define the actual “printable” area on that page, each given as an offset from the corresponding edge of the page.
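As a worked example of that arithmetic, assume a US Letter page (8.5in × 11in) with 1in left and right margins and 0.5in top and bottom margins; these values are illustrative, not taken from the sample document:

```latex
\begin{aligned}
w_{\text{printable}} &= w_{\text{page}} - m_{\text{left}} - m_{\text{right}}
  = 8.5\,\text{in} - 1\,\text{in} - 1\,\text{in} = 6.5\,\text{in} \\
h_{\text{printable}} &= h_{\text{page}} - m_{\text{top}} - m_{\text{bottom}}
  = 11\,\text{in} - 0.5\,\text{in} - 0.5\,\text{in} = 10\,\text{in}
\end{aligned}
```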
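The page itself is then broken into three distinct areas: the region-before, used for header information (such as the title of the article); the region-after, which holds footer information such as page numbers; and the region-body, the active area where the processor inserts the body of the text. The region-body’s margins are measured relative to the margins defined by the page itself, that is, from the edges of the printable area. The extent of the region-before or region-after gives the depth of the header or footer band, measured inward from the top or bottom of the printable area, so the region-body’s top and bottom margins should be at least as large as those extents if the body text is not to overlap the header and footer.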
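A sketch of such a page master, assuming Letter-size paper and illustrative measurements: the header and footer bands are each 0.5in deep, and the body’s top and bottom margins are 0.75in, so the body clears both bands with a 0.25in gap. The namespace declaration on the outer element is only there so the fragment stands alone.

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Page size plus the page margins that bound the printable area. -->
  <fo:simple-page-master master-name="mainPage"
      page-height="11in" page-width="8.5in"
      margin-top="0.5in" margin-bottom="0.5in"
      margin-left="1in" margin-right="1in">
    <!-- Body margins are measured inside the printable area; they exceed
         the header/footer extents so body text cannot overlap them. -->
    <fo:region-body margin-top="0.75in" margin-bottom="0.75in"/>
    <!-- Header: a 0.5in band at the top of the printable area. -->
    <fo:region-before extent="0.5in"/>
    <!-- Footer: a 0.5in band at the bottom of the printable area. -->
    <fo:region-after extent="0.5in"/>
  </fo:simple-page-master>
</fo:layout-master-set>
```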
This defines one master, but it doesn’t specify the order in which masters appear. You do that in a page sequence master (<fo:page-sequence-master>), which can describe both single instances and repeating collections of pages. The “simpleDoc” sequence master used in this article consists of nothing but repeating page masters named “mainPage.”
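A sketch of that arrangement: the page sequence master contains a single fo:repeatable-page-master-reference, which (with no maximum-repeats attribute) reuses “mainPage” for as many pages as the content needs. The page size is a placeholder, and the namespace declaration is only there so the fragment stands alone.

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:simple-page-master master-name="mainPage"
      page-height="11in" page-width="8.5in">
    <fo:region-body/>
  </fo:simple-page-master>
  <!-- "simpleDoc" uses mainPage for the first page and every page after it. -->
  <fo:page-sequence-master master-name="simpleDoc">
    <fo:repeatable-page-master-reference master-reference="mainPage"/>
  </fo:page-sequence-master>
</fo:layout-master-set>
```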
After defining the page sequence master, you can begin adding content. For a given page sequence, adding content involves defining both static content (content such as headers or footers that either doesn’t change or changes predictably across multiple pages, like page numbers) and flow content, which consists of the main body of the article. You should declare the static content for the header first; in the sample, the header simply displays the text “XML 10 Minute Solution: Getting Fancy With FOP”.
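A sketch of a page sequence with such a header, assuming the “simpleDoc” sequence master is defined in the layout master set; the font settings are illustrative. Static content is bound to a region through its flow-name attribute, and xsl-region-before is the default name of the region-before area.

```xml
<fo:page-sequence xmlns:fo="http://www.w3.org/1999/XSL/Format"
    master-reference="simpleDoc">
  <!-- Header: repeated in the region-before of every page. -->
  <fo:static-content flow-name="xsl-region-before">
    <fo:block font-family="sans-serif" font-size="9pt" text-align="end">
      XML 10 Minute Solution: Getting Fancy With FOP
    </fo:block>
  </fo:static-content>
  <!-- Body text; static content must come before the single fo:flow. -->
  <fo:flow flow-name="xsl-region-body">
    <fo:block>Body text goes here.</fo:block>
  </fo:flow>
</fo:page-sequence>
```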
The master-name attribute in the
A note about attributes. Many of the attributes you’ll see within both
The footer demonstrates that static content isn’t really all that static:
Copyright 2001 Cagle Communications -- Page
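The footer content includes the <fo:page-number/> element, which the processor replaces with the number of the current page as it lays out each page.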
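A sketch of such a footer, assuming it lives in the region-after of the page master (whose default flow name is xsl-region-after); the font size and alignment are illustrative. Like the header, this fo:static-content element goes inside the fo:page-sequence, before the fo:flow.

```xml
<fo:static-content xmlns:fo="http://www.w3.org/1999/XSL/Format"
    flow-name="xsl-region-after">
  <!-- Fixed text plus a page number that changes on every page. -->
  <fo:block font-size="9pt" text-align="center">
    Copyright 2001 Cagle Communications -- Page <fo:page-number/>
  </fo:block>
</fo:static-content>
```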
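The final (and arguably most important) part of the document is the <fo:flow> element, which holds the flow content: the blocks that make up the main body of the article.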
For example, Listing 2 contains a flow object that shows the title, subtitle, author, and first paragraph of the article itself.
(Listing 2 renders the title “Getting Fancy With FOP”, the subtitle “Creating Adobe Acrobat Files from XSLT and XSL-FO”, the byline “by Kurt Cagle”, and the article’s opening paragraph.)
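Listing 2 itself isn’t reproduced here. A sketch of a flow along the same lines, with illustrative font sizes and spacing, might look like this; each fo:block becomes a paragraph-like area, and the processor breaks the flow across as many pages as it needs.

```xml
<fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format"
    flow-name="xsl-region-body">
  <!-- Title, subtitle, and byline: each is its own block. -->
  <fo:block font-size="24pt" font-weight="bold" space-after="6pt">
    Getting Fancy With FOP
  </fo:block>
  <fo:block font-size="16pt" font-style="italic" space-after="6pt">
    Creating Adobe Acrobat Files from XSLT and XSL-FO
  </fo:block>
  <fo:block font-size="10pt" space-after="12pt">by Kurt Cagle</fo:block>
  <!-- Body paragraph with a first-line indent. -->
  <fo:block font-size="11pt" text-indent="0.25in">
    First paragraph of the article text.
  </fo:block>
</fo:flow>
```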
Finally, the blocks may contain inline elements. An inline element, as mentioned earlier, is an element that is part of the flow of text. For example, in HTML the <b> element is an inline element that sets the font-weight of the enclosed text to “bold”. In the article, rather than creating <b> and <i> elements, which give no real clue as to why the text is bold or italic, I instead created three distinct inline elements named for what the text is rather than how it looks.
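Whatever the source elements are called, on the XSL-FO side they typically become fo:inline elements, which change the formatting of a run of text without breaking it out of its enclosing block. A small sketch with illustrative property values:

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format" font-size="11pt">
  <!-- Each fo:inline restyles a run of text; none starts a new line. -->
  The <fo:inline font-family="monospace">fo:inline</fo:inline> element
  formats a run of text <fo:inline font-style="italic">inside</fo:inline>
  a block, and can make it <fo:inline font-weight="bold">bold</fo:inline>
  without starting a new line.
</fo:block>
```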
Note that the example given here is very simple; the full XSL-FO specification is more than 300 printed pages long and can be extraordinarily complicated. That size, however, is what makes it robust enough to handle a wide variety of applications.
Creating XSL-FO Using XSLT
The full code to describe even a relatively simple XSL-FO document like this one can easily be overwhelming if you attempt to write it by hand, especially because there’s a great deal of repetition. Rather than coding an FO document manually (which you should do once, but only once), it’s far better to create an XSLT stylesheet that handles the transformation for you. Fortunately, such transformations are fairly easy to write.
For example, the file createFOP.xsl contains the transformation for this document. The
Additionally, set the attributes of the <xsl:output> element so the stylesheet generates XML output (method="xml"), indents the result (indent="yes"), and omits the XML declaration (omit-xml-declaration="yes").
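A sketch of how such a stylesheet can begin, assuming a hypothetical source root element named article (not necessarily what createFOP.xsl uses): it declares both the xsl: and fo: namespaces, configures xsl:output as described, and emits the fixed FO scaffolding once from the root template, leaving the content to other templates.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- XML output, indented, without the XML declaration. -->
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

  <!-- Emit the page layout once, then hand the content to other templates. -->
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="mainPage"
            page-height="11in" page-width="8.5in"
            margin-top="0.5in" margin-bottom="0.5in"
            margin-left="1in" margin-right="1in">
          <fo:region-body margin-top="0.75in" margin-bottom="0.75in"/>
          <fo:region-before extent="0.5in"/>
          <fo:region-after extent="0.5in"/>
        </fo:simple-page-master>
        <fo:page-sequence-master master-name="simpleDoc">
          <fo:repeatable-page-master-reference master-reference="mainPage"/>
        </fo:page-sequence-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="simpleDoc">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="article"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Placeholder so the flow always receives at least one block;
       fuller templates would replace this. -->
  <xsl:template match="article">
    <fo:block><xsl:value-of select="title"/></fo:block>
  </xsl:template>
</xsl:stylesheet>
```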
The next step to create any kind of document navigator is to figure out how to handle unspecified children. For instance, while the stylesheet uses both the
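One common way to handle unspecified children in XSLT is a low-priority catch-all template. The sketch below is an assumption about the approach, not an excerpt from createFOP.xsl: a template matching * has a default priority of -0.5, so any template that names a specific element wins over it, and the catch-all simply keeps processing the children of elements it doesn’t recognize. This mirrors XSLT’s built-in rule for elements, but stating it explicitly gives you one place to change the behavior, for example to flag unknown elements in the output.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Fallback for any element without a more specific template: drop the
       tag itself but keep processing its children, so the text inside an
       unrecognized element still reaches the output. -->
  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```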
Much of the remainder of the stylesheet generates the appropriate
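The remaining templates typically map each logical element to a formatting object. A hedged sketch, assuming hypothetical element names heading, p, and term (not necessarily those in createFOP.xsl): block-level elements become fo:block, inline elements become fo:inline, and xsl:apply-templates carries the processing down into their children.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Hypothetical section heading: a bold block with space around it. -->
  <xsl:template match="heading">
    <fo:block font-size="14pt" font-weight="bold"
        space-before="12pt" space-after="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
  <!-- Hypothetical paragraph: a plain body-text block. -->
  <xsl:template match="p">
    <fo:block font-size="11pt" space-after="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
  <!-- Hypothetical inline term: italicized without leaving its paragraph. -->
  <xsl:template match="term">
    <fo:inline font-style="italic"><xsl:apply-templates/></fo:inline>
  </xsl:template>
</xsl:stylesheet>
```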
The one contentious area in this stylesheet involved displaying XML code: XSL-FO does not have an element analogous to the HTML <pre> element.
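XSL-FO does, however, provide the whitespace-handling properties that a <pre>-like block needs. The following sketch assumes a hypothetical code element that holds an XML listing as escaped text or CDATA, and is not necessarily how createFOP.xsl solves the problem: it preserves spaces and line breaks, turns off line wrapping, and uses a monospaced font.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Hypothetical "code" element holding a listing as text. -->
  <xsl:template match="code">
    <fo:block font-family="monospace" font-size="9pt"
        white-space-collapse="false"
        white-space-treatment="preserve"
        linefeed-treatment="preserve"
        wrap-option="no-wrap"
        background-color="#EEEEEE" padding="4pt">
      <!-- value-of emits the listing as text, so its angle brackets are
           escaped in the FO output and rendered literally in the PDF. -->
      <xsl:value-of select="."/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
```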