
Publishing XML Documents in PDF and HTML with Cocoon

This article shows you how to publish XML documents in PDF using Apache Cocoon. Cocoon can create result documents in a variety of formats, including HTML and PDF. Cocoon’s popularity reflects that of Adobe’s PDF format, which most vendors prefer for e-mailing richly formatted printable documents and reports. Cocoon converts not only text content, but also graphs and images. This article focuses on using Cocoon to dynamically generate HTML and PDF.

Although there are alternatives for generating PDF, Cocoon is preferred because it provides various value-added features. For example, the Cocoon site says: “Cocoon interacts with many data sources, including file systems, RDBMS, LDAP, native XML databases, SAP systems, and network-based data sources. It adapts content delivery to the capabilities of different devices like HTML, WML, PDF, SVG, and RTF, to name just a few. You can run Cocoon as a Servlet as well as through a powerful, command line interface. The deliberate design of its abstract environment gives you the freedom to extend its functionality to meet your special needs in a highly modular fashion.” You can find a complete feature list at http://cocoon.apache.org/2.1/features.html.

What You Need
To gain the greatest benefit from this article, you should be somewhat familiar with XML/XSL and J2EE. You can use any Java Web application server; I used the JBoss 3.2.1 server to test the article code.

Follow these steps to get started:

  1. Download Cocoon 2.1.5
  2. Build and install Cocoon
  3. Create or download the source code files to build the examples in this article (the XML, XMAP, XSL, and JSP files)
  4. Test the application

The rest of this article describes how to perform each of the steps in more detail.

Download the Cocoon Source
To get Cocoon and configure it for your server, first download the latest version from http://Cocoon.apache.org/2.1/index.html.

Building Cocoon
The following procedure lists the steps to build and deploy Cocoon.

  1. Execute build (default target): type build at the command prompt in the directory where you installed Cocoon (by default, that’s C:\Java\Cocoon-2.1.5).
  2. Next, build the war file using the command build war.
Author’s Note: These are separate commands. Run both build and build war.

  3. Deploy the Cocoon Web application war file by copying the Cocoon.war file generated by the build process in step 2 from Cocoon-2.1.5\build\Cocoon-2.1.5\Cocoon.war to the JBoss default\deploy directory. When you complete these steps, Cocoon should be up and running on JBoss.
  4. Test your Cocoon installation by starting the JBoss server and browsing to http://localhost:8080/cocoon/. If the installation succeeded, you’ll see the screen shown in Figure 1.

Figure 1. The Cocoon Welcome Page: After building and installing Cocoon as described in this article, you should see this page by browsing to the default Welcome page (http://localhost:8080/cocoon/).
Author’s Note: The information in this article assumes that you’re running your local HTTP server on port 8080.

Create the Source Code Files
Navigate to the webapp directory in Cocoon and create an xmltopdf directory in it. Cocoon provides much more than dynamic PDF generation, but in this article you’re using it only to generate HTML and PDF from two sources: an XML file and a JSP page.

You should understand, though, that Cocoon’s own format for dynamic pages is XSP, which combines Java and XML to render output. You can avoid having to learn XSP by using the techniques shown in this article, as long as you take a little extra care to have the JSP emit XML that Cocoon can still parse. Using a JSP page in this way works fine for outputting HTML and PDF files; however, if you plan to use Cocoon to develop a complete Web site, you should bite the bullet and learn XSP.

Getting back to the example, create a file called pageOne.xml inside the xmltopdf directory that has the following content:

               This is the pageOne.xml example             

This is the text of section one
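Only the text of this listing survives here; its markup does not. As a minimal sketch, a pageOne.xml carrying that text might look like the following. The element names (page, title, content, para) are assumptions chosen for illustration and are not necessarily the ones the article used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical structure: the element names are assumed, only the text comes from the article -->
<page>
  <title>This is the pageOne.xml example</title>
  <content>
    <para>This is the text of section one</para>
  </content>
</page>
```

Whatever element names you pick, the stylesheets that transform this file have to match on the same names.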

You will also create a second source file, a JSP, separately (outside of Cocoon). This way, you can pass any JSP independently as a source to this framework, and Cocoon will serialize it into an HTML or a PDF file. For the purpose of this article, put the following file in a folder called jsptopdf. Next, create a war file from that folder and deploy it on JBoss.

               This is the pageTwo.jsp example          

This is the text of section two. The time now is
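The markup of this listing is also missing; only its text remains. One way to guarantee well-formed output is to write the page as a JSP document (the XML syntax of JSP 1.2). The sketch below assumes the servlet container supports that syntax and reuses hypothetical element names (page, title, content, para); the article’s original page may well have used classic JSP scriptlet syntax instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- pageTwo.jsp written as a JSP document; element names other than jsp:* are assumed -->
<jsp:root xmlns:jsp="http://java.sun.com/JSP/Page" version="1.2">
  <!-- Tell the client (here, Cocoon) that the response is XML -->
  <jsp:directive.page contentType="text/xml"/>
  <page>
    <title>This is the pageTwo.jsp example</title>
    <content>
      <para>This is the text of section two. The time now is
        <!-- Evaluated on each request, so the output is dynamic -->
        <jsp:expression>new java.util.Date()</jsp:expression>
      </para>
    </content>
  </page>
</jsp:root>
```

Because the source file itself is XML, a missing end tag shows up as a JSP translation error rather than as a parsing failure inside Cocoon.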

Now start the JBoss server and check that you can view this JSP. Because the file is independent of Cocoon, you should be able to see the page rendered.

Note that pageTwo.jsp must produce well-formed XML; otherwise Cocoon will complain. Keep this JSP page in its own Web application directory (jsptopdf in this example), not in the Cocoon directory, and deploy its war file the same way you deployed the Cocoon war file. The file stays outside the Cocoon war because Cocoon renders only the requests defined in the sitemap.xmap file (described below).

The XML in the first file you created serves as the data source. Data in the second file can be fetched from any Java object or through JDBC. To transform these files, you’ll now define XSL files. Make an XSL stylesheet file called doc2html.xsl; you’ll use this stylesheet to transform both of the files you’ve already created.
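As a rough sketch of what such a stylesheet might contain, the following XSLT 1.0 file turns a page document into HTML. It assumes both source files share a hypothetical structure of page, title, content, and para elements; adjust the match patterns to whatever element names your sources actually use.

```xml
<?xml version="1.0"?>
<!-- doc2html.xsl sketch; the page/title/content/para element names are assumptions -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The root element becomes the HTML skeleton -->
  <xsl:template match="/page">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="content/para"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each para becomes an HTML paragraph -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Because the XML file and the JSP output share one vocabulary in this sketch, a single stylesheet can serve both sources.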

                                           

Using Pipelines in Cocoon
Cocoon processes each request through a pipeline: a generator produces XML, transformers such as XSL stylesheets reshape it, and a serializer writes the final output. You have XML to publish and XSL to transform it; now you need to create the sitemap file that Cocoon requires. The sitemap selects a pipeline based on the browser request. Create the sample sitemap.xmap file that appears below. This file tells Cocoon how to map requests. For example, using this file, HTML requests would have an XML source and would be rendered as HTML. Similarly, PDF requests would have either an XML data source or a JSP data source, and Cocoon would render them as PDF.
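A sitemap for this example could be sketched as follows. It assumes the file lives in the xmltopdf directory as a sub-sitemap that inherits its components (the file generator, the XSLT transformer, and the html and fo2pdf serializers) from Cocoon 2.1’s default root sitemap, and that the JSP application is deployed under the context path /jsptopdf. Both are assumptions about your setup rather than details confirmed by the article.

```xml
<?xml version="1.0"?>
<!-- xmltopdf/sitemap.xmap sketch; relies on components declared in the root sitemap -->
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:pipelines>
    <map:pipeline>

      <!-- pageOne: static XML file stored next to this sitemap -->
      <map:match pattern="pageOne.html">
        <map:generate src="pageOne.xml"/>
        <map:transform src="doc2html.xsl"/>
        <map:serialize type="html"/>
      </map:match>
      <map:match pattern="pageOne.pdf">
        <map:generate src="pageOne.xml"/>
        <map:transform src="doc2pdf.xsl"/>
        <map:serialize type="fo2pdf"/>
      </map:match>

      <!-- pageTwo: XML fetched over HTTP from the separately deployed JSP (assumed URL) -->
      <map:match pattern="pageTwo.html">
        <map:generate src="http://localhost:8080/jsptopdf/pageTwo.jsp"/>
        <map:transform src="doc2html.xsl"/>
        <map:serialize type="html"/>
      </map:match>
      <map:match pattern="pageTwo.pdf">
        <map:generate src="http://localhost:8080/jsptopdf/pageTwo.jsp"/>
        <map:transform src="doc2pdf.xsl"/>
        <map:serialize type="fo2pdf"/>
      </map:match>

    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

Each match element is one pipeline: the generate step supplies the XML, the transform step applies a stylesheet, and the serialize step decides whether the browser receives HTML or PDF.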

                                                                                                                                                                                                                                                                                                                                                                                                                           

Another advantage of defining the request mappings is that if you don’t want users to be able to browse to an XML file directly, you just don’t define the mapping for it. To let users see the page either in PDF or HTML, you provide them with hyperlinks to the desired file type, and they would see HTML or PDF, regardless of whether the actual data source is JSP or XML.

Cocoon uses XSL-FO (Formatting Objects) to generate PDF. If you regenerate and deploy the war file again at this point, the HTML transformation would work, but the PDF transformation requires some more work. To accomplish that transformation, define a file called doc2pdf.xsl (see Listing 1).

The file in Listing 1 is similar to a normal XSL file. The heading section can pretty much be reused, and the section that defines how to display PDF is also very similar to the way you would normally define an XSL file. For more detail about how to write XSL-FO files, refer to the Apache FOP project. At this point you’re ready to regenerate the war file. Follow the same steps as before to build (use the command build war) and then deploy the war file.
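Listing 1 is not reproduced in this text. As a minimal sketch of what an XSL-FO stylesheet of this kind might look like, the following file maps a page document to a single A4 page sequence. The source element names (page, title, content, para) and all layout values are assumptions for illustration, not the contents of the article’s listing.

```xml
<?xml version="1.0"?>
<!-- doc2pdf.xsl sketch; source element names and layout values are assumed -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/page">
    <fo:root>
      <!-- Page geometry: one A4 page master with 2 cm margins -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="a4"
                               page-height="29.7cm" page-width="21cm"
                               margin-top="2cm" margin-bottom="2cm"
                               margin-left="2cm" margin-right="2cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <!-- Content flows into the body region of that page master -->
      <fo:page-sequence master-reference="a4">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-size="18pt" font-weight="bold" space-after="12pt">
            <xsl:value-of select="title"/>
          </fo:block>
          <xsl:apply-templates select="content/para"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Each para becomes a formatted text block -->
  <xsl:template match="para">
    <fo:block font-size="12pt" space-after="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

The structure mirrors an HTML stylesheet: the template matching logic is the same, and only the output vocabulary changes from HTML elements to fo: elements that the PDF serializer understands.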

Test the Application
To test the application, browse to these URLs:

   http://localhost:8080/cocoon/xmltopdf/pageOne.pdf
   http://localhost:8080/cocoon/xmltopdf/pageTwo.pdf
   http://localhost:8080/cocoon/xmltopdf/pageOne.html
   http://localhost:8080/cocoon/xmltopdf/pageTwo.html

As an example, the output for pageTwo.pdf will look something like Figure 2.

 
Figure 2. Cocoon PDF Output: The figure shows Cocoon’s PDF output for the pageTwo.pdf file.

Following the steps listed in this article should get you up and running quickly using Cocoon. Download all the source code for this article to get a quick example. For follow-up information, I suggest you visit Apache’s Web site. Cocoon collaborates with other Apache Projects and has a lot more to offer than the simple features described in this article.
