

Transform Legacy Data to XML Using JAXP  : Page 3

By writing to the JAXP specification, Java developers can create extensible code routines to parse any type of data into XML. This tutorial shows you how to write your own JAXP-compliant parser that reads legacy data in comma-separated value (CSV) format, outputs it to a DOM, and then transforms it into XML.





Abstraction and Reuse
Now that I have my AbstractXMLReader class, I can easily extend it to create any kind of parser I need. Because I want to create parsers for legacy data formats that represent each row as a line, I created another abstract class to parse the input line by line. Here is the complete class.

   import java.io.*;
   import org.xml.sax.*;
   import org.xml.sax.helpers.*;
   public abstract class AbstractLineReader extends AbstractXMLReader
   {
      public void parseInput(BufferedReader br) 
         throws IOException, SAXException
      {
         String line = null;
         while((line = br.readLine()) != null)
         {
            line = line.trim();
            // Skip blank lines; hand everything else to the subclass
            if(line.length() > 0)
               parseLine(line);
         }
      }
      public abstract void parseLine(String line) 
         throws IOException, SAXException;
   }
This class loops through the input from the BufferedReader and then calls the parseLine method on the resulting String. Whatever class extends the AbstractLineReader must override the parseLine method and ultimately fire a SAX event based on the line.
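To see the loop's behavior in isolation, here is a minimal, self-contained sketch. It does not extend the article's AbstractXMLReader; LineLoopSketch and CollectingReader are hypothetical names, and the subclass simply collects the lines it receives instead of firing SAX events.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AbstractLineReader: same loop, no SAX plumbing.
abstract class LineLoopSketch {
    // Reads every line, trims it, and hands non-empty lines to parseLine.
    public void parseInput(BufferedReader br) throws IOException {
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.length() > 0) {
                parseLine(line);
            }
        }
    }

    public abstract void parseLine(String line) throws IOException;
}

// Hypothetical subclass that records each line it is given.
class CollectingReader extends LineLoopSketch {
    final List<String> lines = new ArrayList<>();

    @Override
    public void parseLine(String line) {
        lines.add(line);
    }

    // Convenience helper: runs the loop over an in-memory string.
    static List<String> collect(String input) {
        CollectingReader reader = new CollectingReader();
        try {
            reader.parseInput(new BufferedReader(new StringReader(input)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return reader.lines;
    }
}
```

Calling CollectingReader.collect on input that contains blank or whitespace-only lines returns only the trimmed, non-empty lines, which is the contract a subclass of the line reader can rely on.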

Finally, I am ready to parse a CSV file. To do that I extend the AbstractLineReader class with a new class named CSVReader. Below is the shell class.

   import java.io.*;
   import java.util.*;
   import org.xml.sax.*;
   import org.xml.sax.helpers.*;
   public class CSVReader extends AbstractLineReader
   {
      private ContentHandler ch = null;
      // The methods shown in the rest of this section go here
   }
This code first imports all the classes I need and then declares a ContentHandler. The ContentHandler receives the SAX events from which the DOM tree I want will be built. Now that I have the shell class, I can override the parseImplementation method.

   public void parseImplementation(BufferedReader br) 
      throws IOException, SAXException
   {
      this.ch = getContentHandler();
      ch.startElement("", "", "csv", new AttributesImpl());
      parseInput(br);   // fires the events for every line in the file
      ch.endElement("", "", "csv");
   }
This code obtains a reference to the ContentHandler and then begins to create the tree. The output for a CSV file has an element named csv as the root of the tree. This element has no attributes, so I call the default AttributesImpl constructor. Everything in XML is a container, so I need to fire an event at the start and end of each container.

XML containers are, of course, represented with tags, so when the startElement method is called the resulting tag is <csv>. Later I will call the endElement method to create the closing tag, </csv>, but first I will parse each line of the file by calling the parseInput method of the AbstractLineReader. Remember that the parseInput method calls the parseLine method on each line it finds. However, parseLine is an abstract method, so I need to override it in CSVReader.
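The link between events and tags can be made visible with a small handler that writes down each event as the tag it stands for. This is only an illustrative sketch: TagRecorder is a hypothetical class, not part of the article's code, and it ignores attributes and character data.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each element event as a tag string.
class TagRecorder extends DefaultHandler {
    final List<String> tags = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tags.add("<" + qName + ">");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        tags.add("</" + qName + ">");
    }

    // Fires the same root-element events as parseImplementation, with no rows in between.
    static List<String> demo() {
        TagRecorder recorder = new TagRecorder();
        ContentHandler ch = recorder;
        try {
            ch.startElement("", "", "csv", new AttributesImpl());
            ch.endElement("", "", "csv");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return recorder.tags;
    }
}
```

TagRecorder.demo() returns the opening tag followed by the closing tag for the csv element; any events fired between the two calls would land between them in the list.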

   public void parseLine(String line) throws IOException, SAXException
   {
      StringTokenizer st = new StringTokenizer(line, ",");
      ch.startElement("", "", "line", new AttributesImpl());
      while(st.hasMoreTokens())
      {
         parseElement(st.nextToken());
      }
      ch.endElement("", "", "line");
   }
Because this is a CSV file, I know that each line is just a list of data separated by commas. To pull out each column's value, I create an instance of the StringTokenizer class. But first I create the tag <line>, a container for each row of data, using a new element named line. With that done, a loop through the tokens passes each token's value to the parseElement method. Once all of the tokens have been parsed, I close the line container by calling endElement, which results in the tag </line>.
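One StringTokenizer behavior is worth keeping in mind here: it treats consecutive delimiters as a single separator, so an empty column produces no token at all. The sketch below compares it with String.split using a negative limit, which keeps empty fields. TokenizeSketch is a hypothetical name, and split is shown only as an alternative; it is not what the article's code uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeSketch {
    // Splits a line with StringTokenizer, as parseLine does.
    static List<String> withTokenizer(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Alternative: a negative limit makes split keep empty and trailing fields.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        System.out.println(withTokenizer("1,,3")); // prints [1, 3]
        System.out.println(withSplit("1,,3"));     // prints [1, , 3]
    }
}
```

Neither approach understands quoting, so a quoted value that itself contains a comma is still split in two. If the legacy files can contain empty columns or embedded commas, the tokenizing step is the place to adjust.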

   private void parseElement(String element) throws IOException, SAXException
   {
      ch.startElement("", "", "value", new AttributesImpl());
      element = this.cleanQuotes(element);
      ch.characters(element.toCharArray(), 0, element.length());
      ch.endElement("", "", "value");
   }
The String element that is passed into the parseElement method is the actual value of the column I am looking for, and I want to wrap the column's value in a value container. To do this, the startElement method creates the tag <value>. CSV files sometimes wrap values in quotation marks, so I call the cleanQuotes method to strip any surrounding quotation marks and overwrite element with the result.

With the String element all cleaned up, I can finally put some data in the containers I have made. I do that by calling the characters method, which expects a character array as well as the index to start at and the number of characters to use. The String class's toCharArray method supplies the character array and the length method supplies its length. Finally, the endElement method creates the closing tag, </value>.
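One way to check what these events produce is to feed them to an identity TransformerHandler from JAXP and serialize the result to a string. This is a sketch under stated assumptions, not the article's code: ValueEventSketch is a hypothetical name, the XML declaration is suppressed to keep the output short, and the element name is passed as both the local name and the qualified name so the result does not depend on which of the two a given serializer reads.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class ValueEventSketch {
    // Fires start, characters, and end events for one value and returns the serialized XML.
    static String toXml(String text) {
        StringWriter out = new StringWriter();
        try {
            // Assumes the default factory supports SAX, as the JDK's built-in one does.
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            handler.setResult(new StreamResult(out));

            handler.startDocument();
            handler.startElement("", "value", "value", new AttributesImpl());
            handler.characters(text.toCharArray(), 0, text.length());
            handler.endElement("", "value", "value");
            handler.endDocument();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Because the text travels through characters rather than being concatenated into markup, the serializer takes care of escaping, so a value containing a less-than sign comes out as an entity reference.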

   private String cleanQuotes(String element)
   {
      // Length check: a lone quotation mark is returned unchanged
      if(element.length() > 1 && element.startsWith("\"") && element.endsWith("\""))
         return element.substring(1, element.length() - 1);
      else
         return element;
   }
The cleanQuotes method checks to see if the element has a quotation mark at the start and end of the String. If it finds a quotation mark at the start and end, it strips them off and returns the String. Otherwise it returns the String untouched.
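A few sample inputs make the rule concrete. The sketch below mirrors cleanQuotes in a standalone class (CleanQuotesSketch is a hypothetical name); its length check keeps a lone quotation mark from being treated as a quoted value.

```java
class CleanQuotesSketch {
    // Strips one pair of surrounding quotation marks, if both are present.
    static String cleanQuotes(String element) {
        if (element.length() > 1 && element.startsWith("\"") && element.endsWith("\"")) {
            return element.substring(1, element.length() - 1);
        }
        return element;
    }

    public static void main(String[] args) {
        System.out.println(cleanQuotes("\"Boston\"")); // prints Boston
        System.out.println(cleanQuotes("Boston"));     // prints Boston
        System.out.println(cleanQuotes("\"Boston"));   // prints "Boston
    }
}
```

Only a matching pair is removed: a value with a quotation mark on just one side is passed through untouched, and quotation marks in the middle of a value are never altered.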

Reusing the Parsing Code
Before I use my new CSVReader class to transform CSV files into XML documents, I thought I would implement a PipeReader class to parse pipe-delimited files instead of CSV files. This helps demonstrate other ways you might make use of the parsing code once you've created it. Assuming I abstracted things correctly, the PipeReader class should be easy to create. The PipeReader class is the same as the CSVReader class, semantically; I have included the diff of CSVReader and PipeReader below to show how similar they are.

   < public class CSVReader extends AbstractLineReader
   > public class PipeReader extends AbstractLineReader
   <               ch.startElement("", "", "csv", new AttributesImpl());
   >               ch.startElement("", "", "pipe", new AttributesImpl());
   <               ch.endElement("", "", "csv");
   >               ch.endElement("", "", "pipe");
   <               StringTokenizer st = new StringTokenizer(line, ",");
   >               StringTokenizer st = new StringTokenizer(line, "|");
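Since the two classes differ only in the root element name and the delimiter, an alternative design is to pass both in as constructor arguments rather than writing one class per format. The sketch below shows one way that might look. It is not the article's code: DelimitedEventSketch is a hypothetical name, and it drives a ContentHandler directly instead of extending AbstractLineReader so that it stays self-contained.

```java
import java.util.StringTokenizer;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class DelimitedEventSketch {
    private final String rootName;   // "csv" or "pipe" in the article's two readers
    private final String delimiter;  // "," or "|"

    DelimitedEventSketch(String rootName, String delimiter) {
        this.rootName = rootName;
        this.delimiter = delimiter;
    }

    // Fires root, line, and value events for the given lines.
    void parse(String[] lines, ContentHandler ch) throws SAXException {
        ch.startElement("", "", rootName, new AttributesImpl());
        for (String line : lines) {
            ch.startElement("", "", "line", new AttributesImpl());
            StringTokenizer st = new StringTokenizer(line, delimiter);
            while (st.hasMoreTokens()) {
                String value = st.nextToken();
                ch.startElement("", "", "value", new AttributesImpl());
                ch.characters(value.toCharArray(), 0, value.length());
                ch.endElement("", "", "value");
            }
            ch.endElement("", "", "line");
        }
        ch.endElement("", "", rootName);
    }

    // Renders the events as a flat string for inspection (no escaping is applied).
    static String render(String rootName, String delimiter, String... lines) {
        final StringBuilder sb = new StringBuilder();
        ContentHandler ch = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                sb.append('<').append(qName).append('>');
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                sb.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] chars, int start, int length) {
                sb.append(chars, start, length);
            }
        };
        try {
            new DelimitedEventSketch(rootName, delimiter).parse(lines, ch);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }
}
```

With this shape, render("pipe", "|", "a|b") and render("csv", ",", "a,b") go through exactly the same code path. The trade-off is that format-specific behavior, such as quote handling that applies only to CSV, would need its own hook.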
