

Transform Legacy Data to XML Using JAXP: Page 2

By writing to the JAXP specification, Java developers can create extensible code routines to parse any type of data into XML. This tutorial shows you how to write your own JAXP-compliant parser that reads legacy data in a comma-separated value (CSV) format, outputs it to a DOM, and then transforms it into XML.




Writing the Parser
Before beginning, I should define exactly which file format I'll be using. Well before XML came along, developers used various ASCII data formats to exchange data, and often still do. One of the most common of these is comma-separated values (CSV), a straightforward way of representing tabular data. In a CSV file, each line represents a row of data, and each row is simply a list of column values delimited by commas. There are many other ASCII formats similar to CSV; the difference is often as minor as a different delimiter, such as the pipe character.
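
To make the format concrete, here is a minimal sketch of splitting one row into column values, with the delimiter passed in so the same code handles commas or pipes. The class and method names are hypothetical, and the sketch assumes that values never contain the delimiter or quotation marks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DelimitedLineDemo {
    // Splits one row into its column values.
    // Assumption: values never contain the delimiter or quotation marks.
    static List<String> splitRow(String row, char delimiter) {
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        String regex = Pattern.quote(String.valueOf(delimiter));
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        return Arrays.asList(row.split(regex, -1));
    }

    public static void main(String[] args) {
        System.out.println(splitRow("Smith,John,42", ','));  // prints [Smith, John, 42]
        System.out.println(splitRow("Smith|John|42", '|'));  // prints [Smith, John, 42]
    }
}
```

Real CSV files often quote values that contain commas, which this sketch deliberately ignores.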

There are many different approaches to parsing a CSV file. Because I want to end up with a DOM document, I'll use the Simple API for XML (SAX). SAX, as you may know, is event driven. It differs from a DOM parser in that a DOM parser loads the whole document into memory, whereas a SAX parser simply fires an event each time it encounters a tag in the XML document. I chose SAX because it lets me parse the CSV file line by line and column by column, firing SAX events for each element. Each of these elements is then added to the DOM tree, resulting in a DOM representation of my CSV file.
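
The idea of firing SAX events that end up as DOM nodes can be sketched as follows. This is an illustration only, not the article's implementation: DomBuildingHandler, rowToDom, and the element names row and col are hypothetical, and the events are fired by hand for a single row rather than by a real reader.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsToDom {
    // Hypothetical handler: appends a DOM node for each SAX event it receives.
    static class DomBuildingHandler extends DefaultHandler {
        final Document document;
        private Node current;  // node that the next event is appended to

        DomBuildingHandler() throws ParserConfigurationException {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            current = document;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Descend into the newly created element.
            current = current.appendChild(document.createElement(qName));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            current.appendChild(document.createTextNode(new String(ch, start, length)));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Climb back up to the parent.
            current = current.getParentNode();
        }
    }

    // Fires the events a CSV reader might fire for one row and returns the resulting DOM.
    static Document rowToDom(String[] columns) {
        try {
            DomBuildingHandler handler = new DomBuildingHandler();
            AttributesImpl noAttributes = new AttributesImpl();
            handler.startDocument();
            handler.startElement("", "row", "row", noAttributes);
            for (String column : columns) {
                handler.startElement("", "col", "col", noAttributes);
                handler.characters(column.toCharArray(), 0, column.length());
                handler.endElement("", "col", "col");
            }
            handler.endElement("", "row", "row");
            handler.endDocument();
            return handler.document;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = rowToDom(new String[] {"Smith", "John"});
        System.out.println(doc.getElementsByTagName("col").getLength());  // prints 2
    }
}
```

Because the handler only reacts to events, it does not care whether they come from an XML parser or from code that reads a CSV file.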

To write a JAXP parser, you need to implement the XMLReader interface. Because I am going to write a few different parsers, I decided to put the common XMLReader code in an abstract class named AbstractXMLReader. To implement the XMLReader interface, I first determined which imports and properties I would need for the required methods. Below is the AbstractXMLReader class with just the imports and properties.

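   import java.io.*;
   import java.util.*;
   import org.xml.sax.*;

   public abstract class AbstractXMLReader 
      implements org.xml.sax.XMLReader {
      // Registered SAX handlers, keyed by handler type name
      private Hashtable handlers = new Hashtable();
      // Property values, keyed by property name
      private Hashtable properties = new Hashtable();
      // Feature flags, keyed by feature name
      private Hashtable features = new Hashtable();

      // The XMLReader methods shown in the rest of this section go here.
   }
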
With the shell of my class ready, I can start implementing the required methods of the XMLReader interface, starting with the basic getters and setters.

   public void setContentHandler(ContentHandler handler) { 
      this.handlers.put("ContentHandler", handler); 
   }
   public void setDTDHandler(DTDHandler handler) { 
      this.handlers.put("DTDHandler", handler); 
   }
   public void setEntityResolver(EntityResolver handler) { 
      this.handlers.put("EntityResolver", handler); 
   }
   public void setErrorHandler(ErrorHandler handler) {
      this.handlers.put("ErrorHandler", handler); 
   }
   public ContentHandler getContentHandler() { 
      return (ContentHandler) this.handlers.get("ContentHandler"); 
   }
   public DTDHandler getDTDHandler() { 
      return (DTDHandler) this.handlers.get("DTDHandler"); 
   }
   public EntityResolver getEntityResolver() { 
      return (EntityResolver) this.handlers.get("EntityResolver"); 
   }
   public ErrorHandler getErrorHandler() { 
      return (ErrorHandler) this.handlers.get("ErrorHandler"); 
   }

As you can see, the XMLReader interface requires implementers to have methods to get and set four different handlers: ContentHandler, DTDHandler, EntityResolver, and ErrorHandler. I created a Hashtable named handlers to hold their values. You must be careful to cast objects that you get from Hashtables to their appropriate type because the get method always returns an Object. The XMLReader interface also requires accessors and mutators for supported properties and features, which need to throw specific exceptions in case a requested property or feature is not supported.

   public void setFeature(String name, boolean value)
      throws SAXNotRecognizedException, SAXNotSupportedException {
      this.features.put(name, Boolean.valueOf(value));
   }
   public boolean getFeature(String name) 
      throws SAXNotRecognizedException, SAXNotSupportedException {
      Boolean value = (Boolean) this.features.get(name);
      // A feature that was never set is unknown to this reader
      if (value == null)
         throw new SAXNotRecognizedException(name);
      return value.booleanValue();
   }
   public Object getProperty(String name)
      throws SAXNotRecognizedException, SAXNotSupportedException {
      return this.properties.get(name);
   }
   public void setProperty(String name, Object value)
      throws SAXNotRecognizedException, SAXNotSupportedException {
      this.properties.put(name, value);
   }
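
The two exception types signal different problems: SAXNotRecognizedException means the reader does not know the name at all, while SAXNotSupportedException means it knows the name but cannot honor the requested value. The following standalone sketch shows one way to make that distinction. FeatureStore and its trySet helper are hypothetical, and the rule that namespace processing cannot be switched off is an assumption made only for this example; the two feature names are the standard SAX2 ones.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

class FeatureStore {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    private final Map<String, Boolean> features = new HashMap<>();

    FeatureStore() {
        // Only these two names are recognized; the values are the SAX2 defaults.
        features.put(NAMESPACES, Boolean.TRUE);
        features.put(NAMESPACE_PREFIXES, Boolean.FALSE);
    }

    boolean getFeature(String name) throws SAXNotRecognizedException {
        Boolean value = features.get(name);
        if (value == null) {
            throw new SAXNotRecognizedException("Unknown feature: " + name);
        }
        return value;
    }

    void setFeature(String name, boolean value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (!features.containsKey(name)) {
            throw new SAXNotRecognizedException("Unknown feature: " + name);
        }
        // Assumption for this sketch: namespace processing cannot be turned off.
        if (NAMESPACES.equals(name) && !value) {
            throw new SAXNotSupportedException("Cannot disable: " + name);
        }
        features.put(name, value);
    }

    // Hypothetical helper that reports the outcome of a set request as text.
    static String trySet(FeatureStore store, String name, boolean value) {
        try {
            store.setFeature(name, value);
            return "ok";
        } catch (SAXNotRecognizedException e) {
            return "not recognized";
        } catch (SAXNotSupportedException e) {
            return "not supported";
        }
    }

    public static void main(String[] args) {
        FeatureStore store = new FeatureStore();
        System.out.println(trySet(store, NAMESPACE_PREFIXES, true));     // prints ok
        System.out.println(trySet(store, "urn:example:unknown", true));  // prints not recognized
        System.out.println(trySet(store, NAMESPACES, false));            // prints not supported
    }
}
```
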
Next comes the all-important parse method. This is an abstract class, so normally you want the parse method to be abstract. That way, any class that extends the abstract class would need to override the parse method. However, there is a certain amount of common work the parse method needs to do, so I added an additional method, parseImplementation, and made that method abstract instead.

   public void parse(String systemId) 
      throws IOException, SAXException {
      parse(new InputSource(systemId)); 
   }
   public void parse(InputSource input) 
      throws IOException, SAXException {
      BufferedReader br = null;
      // Prefer a character stream, then a byte stream, then the system ID
      if(input.getCharacterStream() != null)
         br = new BufferedReader(input.getCharacterStream());
      else if(input.getByteStream() != null)
         br = new BufferedReader(new InputStreamReader
            (input.getByteStream()));
      else if(input.getSystemId() != null) {
         java.net.URL url = new java.net.URL(input.getSystemId());
         br = new BufferedReader(new InputStreamReader(url.openStream()));
      } else
         throw new SAXException("Invalid InputSource object");
      // Hand the reader to the subclass, which does the actual parsing
      this.parseImplementation(br);
   }
   public abstract void parseImplementation(BufferedReader br) 
      throws IOException, SAXException;

The XMLReader interface requires an overloaded parse method. My parse method doesn't actually parse anything; it offloads the actual parsing to the parseImplementation method. It sets up a BufferedReader for the passed-in InputSource and throws a SAXException if it can't create one.
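
The reader-selection step can be tried on its own. The sketch below pulls it into a standalone helper so that it runs without the abstract class; InputSourceReaders, open, and firstLine are hypothetical names. Unlike the listing above, it also honors the encoding declared on the InputSource, falling back to UTF-8, which is an assumption of this sketch.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class InputSourceReaders {
    // Picks the best available source: character stream, then byte stream, then system ID.
    static BufferedReader open(InputSource input) throws IOException, SAXException {
        if (input.getCharacterStream() != null) {
            return new BufferedReader(input.getCharacterStream());
        }
        if (input.getByteStream() != null) {
            // Assumption: fall back to UTF-8 when the source declares no encoding.
            String encoding = input.getEncoding() != null ? input.getEncoding() : "UTF-8";
            return new BufferedReader(new InputStreamReader(input.getByteStream(), encoding));
        }
        if (input.getSystemId() != null) {
            // The system ID must be an absolute URI for this conversion to succeed.
            return new BufferedReader(new InputStreamReader(
                URI.create(input.getSystemId()).toURL().openStream()));
        }
        throw new SAXException("Invalid InputSource object");
    }

    // Convenience for the demo: returns the first line of the source.
    static String firstLine(InputSource input) {
        try (BufferedReader reader = open(input)) {
            return reader.readLine();
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InputSource chars = new InputSource(new StringReader("Smith,John\nDoe,Jane"));
        System.out.println(firstLine(chars));  // prints Smith,John

        byte[] data = "Doe|Jane".getBytes(StandardCharsets.UTF_8);
        InputSource bytes = new InputSource(new ByteArrayInputStream(data));
        System.out.println(firstLine(bytes));  // prints Doe|Jane
    }
}
```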
