Practical XML for Java Programs

The Dos and Don’ts of XML
XML is a universal format for transferring hierarchically organized data. When properly formatted, XML documents are readable, self-explanatory, and platform-independent.

In practice, certain general XML features can be discarded: entities, DTD (Document Type Definition, now mostly replaced by Schemas), and processing instructions (programmers generally keep their “processing instructions” in programs). The following are some tips for keeping XML simple and getting the most out of it.

DTD Doesn’t Know What the App Needs
XML parsers check whether an XML document is well-formed, while DTD defines the document structure and can be used to check the semantic integrity of a document (whether a node has certain attributes, whether it is present, etc.). However, an application normally has its own semantic integrity criteria and knows better which pieces to throw away and whether to discard a document that does not contain the necessary data. DTD is defined by the producing side, which has no clue what the requirements are on the receiving side. (I’m not even going to discuss why the producing side would produce data inconsistent with its own specifications).

Don’t Multiply Entities
XML entities are like C macros: they save typing time, but they are hell to maintain. Entities save some bytes, but that hardly matters these days, at least for text files. Furthermore, using entities makes life harder on both the producing and receiving ends. There are only six entities that you cannot avoid:

Entity      Java Value
&lt;        '<'
&gt;        '>'
&apos;      '\''
&quot;      '"'
&amp;       '&'
&#93;       ']'

With these six entities you will never need to use CDATA.
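As a rough illustration, a producer can escape the characters listed above itself before writing text into a document. The class and method names below (XmlEscaper, escape) are hypothetical and not part of the package described in this article, and the numeric reference used for ']' is an assumption.

```java
/** Hypothetical helper: escapes the characters that an XML writer cannot emit as-is. */
final class XmlEscaper {
    private XmlEscaper() {}

    /** Returns a copy of text with the six special characters replaced by references. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                case '&':  out.append("&amp;");  break;
                case ']':  out.append("&#93;");  break; // assumed numeric reference for ']'
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Text that would otherwise tempt you to use a CDATA section.
        System.out.println(escape("if (a[i] < b && c > \"d\")"));
    }
}
```

Because ']' is escaped as well, the sequence "]]>" can never appear in the output, which is one reason escaped text does not need CDATA.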

Order Does Not Matter
In a relational database, rows in a table have no specific order, and for good reason. Similarly, the subnodes of a node should not have any specific order. They do follow each other in the file, but semantically they are all created equal. Subnodes could and should be grouped into an unordered collection by the types specified in their start-tags and end-tags.
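One way to picture this grouping is a sketch built on the standard DOM API that ships with Java. The class name KidsByType and its methods are hypothetical; the point is only that the result is keyed by type, so the order of subnodes in the file stops mattering to the caller.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: groups the element children of a node by tag name. */
final class KidsByType {
    private KidsByType() {}

    static Map<String, List<Element>> group(Element parent) {
        Map<String, List<Element>> byType = new HashMap<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {          // skip text, comments, etc.
                Element kid = (Element) n;
                byType.computeIfAbsent(kid.getTagName(), t -> new ArrayList<>()).add(kid);
            }
        }
        return byType;
    }

    /** Parses a string and returns its root element; wraps checked exceptions. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element project = parse("<project><module id='a'/><note/><module id='b'/></project>");
        Map<String, List<Element>> kids = group(project);
        System.out.println(kids.get("module").size() + " module kids, "
                + kids.get("note").size() + " note kid");
    }
}
```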

No Uplinks
Imagine files in a JAR referencing the JAR, class members referencing the class, words referencing the sentence. Such weird structures shouldn’t exist, right? Neither should subnodes have any knowledge of the nodes in which they are contained. You can deduce this information by looking at an XML file. The node/subnode relationship describes the container, not the contained node.

Keep It Simple
Without DTD, a file can be self-contained with no references to third parties. Because subnodes have no order, you can easily move them around and regroup them without harming the contents. Similarly, if a node contains text, the text can be considered a single chunk, not a collection of fragments as in HTML. In XML, you can always wrap chunks of text in subnodes if you want to split it.

Parsing XML Files
Traditional DOM converts even a simple XML file into a complex, rigid structure in memory. Furthermore, “XML navigation” has its own special language, XPath. Using it, you can browse the data you have just read. You just have to write something that resembles an SQL statement to extract the data you need. Many SQL servers now can return data in XML format, but so what? Do you need one more SQL statement to parse the resultset? A much simpler view of XML does exist though (e.g., XmlTree class in Tomcat 3.3.1).

An XML document is a unidirectional, tree-like structure, with nodes having attributes and leaf nodes having text values. A node’s subnodes can be grouped according to their types. In this package, subnodes are called kids. Here is an example of how it works:

// Read the whole file into an XML data container.
XmlData project = XmlReader.read(new File("myProject.xml"));
String moduleName = "Nails";

// Iterate over the "file" kids of the "module" kid whose id is "Nails".
for (Iterator i = project.getKid("module", moduleName).getKids("file").iterator(); i.hasNext();) {
    XmlData fileData = (XmlData) i.next();
    System.out.println("Project " + project.getAttribute("id") +
                       ", module " + moduleName +
                       " file " + fileData.getAttribute("id") +
                       " is of type " + fileData.getAttribute("type"));
}

In this example, the code reads an XML file into an XML data container. It then finds the kid of type “module” whose id attribute is "Nails" and retrieves that kid’s own kids of type “file”. Finally, it scans through the collection of those kids and retrieves their attributes.

How the Package Works
The package for handling XML data in Java contains an interface (XmlData, see Listing 1), its basic implementation (BasicXmlData, see Listing 2), and XmlReader and XmlWriter classes. XmlReader (see Listing 3) uses SAX1 or SAX2, whichever is available, to parse input into BasicXmlData. It can read from InputStream, Reader, File, or from a String. XmlWriter (see Listing 4) neatly formats XmlData and sends it to a stream or a file.
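To give a feel for how a reader of this kind can turn SAX events into a tree, here is a minimal sketch that uses the SAX2 API bundled with Java. It is not the XmlReader from Listing 3: the names TreeReader and Node are hypothetical, and the sketch keeps kids in a plain list rather than grouping them by type.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical reader: builds a simple node tree from SAX callbacks. */
final class TreeReader {
    static final class Node {
        final String type;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> kids = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        Node(String type) { this.type = type; }
    }

    static Node read(String xml) {
        final Deque<Node> open = new ArrayDeque<>(); // elements started but not yet ended
        final Node[] root = new Node[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                Node node = new Node(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    node.attributes.put(atts.getQName(i), atts.getValue(i));
                }
                if (open.isEmpty()) {
                    root[0] = node;              // first element is the document root
                } else {
                    open.peek().kids.add(node);  // attach to the enclosing element
                }
                open.push(node);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (!open.isEmpty()) {
                    open.peek().text.append(ch, start, length);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        Node project = read("<project id='p1'><module id='Nails'>steel</module></project>");
        System.out.println(project.type + " has " + project.kids.size() + " kid(s)");
    }
}
```

The callbacks stay inside the reader, so the calling code only ever sees the finished tree.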

The Interface
XmlData is the interface that defines basic XML data management functionality. It has four properties:

  1. Type: an immutable String
  2. Value: a String
  3. Attributes: a Map that can be set from AttributeList, or a String array, or from a Map
  4. A collection of XmlData kids

getAttribute(String name) retrieves individual attributes and setAttribute(String name, String value) sets them. Two methods, getId() and getName(), return the values of attributes “id” and “name”.

getAllKids() and getKids(String type) return a Collection of kids. You can retrieve a kid with a specified attribute value: getKid(String type, String attributeName, String attributeValue); getKid(String type, String id) is shorthand for getKid(type, "id", id). You can also addKid(XmlData), removeKid(XmlData), or removeKids(String type).
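The shape of such an interface can be sketched in a few dozen lines. The class below is a hypothetical stand-in written for illustration only; it is neither Listing 1 nor Listing 2, and details such as setAttribute returning the node for chaining are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an XmlData-like node; not the article's BasicXmlData. */
class SimpleNode {
    private final String type;   // immutable node type (the tag name)
    private String value;        // text value, may be null
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final Map<String, List<SimpleNode>> kidsByType = new LinkedHashMap<>();

    SimpleNode(String type, String value) {
        this.type = type;
        this.value = value;
    }

    String getType() { return type; }
    String getValue() { return value; }
    void setValue(String value) { this.value = value; }
    String getAttribute(String name) { return attributes.get(name); }
    String getId() { return getAttribute("id"); }

    /** Returns this node so calls can be chained (a convenience of this sketch). */
    SimpleNode setAttribute(String name, String value) {
        attributes.put(name, value);
        return this;
    }

    SimpleNode addKid(SimpleNode kid) {
        kidsByType.computeIfAbsent(kid.getType(), t -> new ArrayList<>()).add(kid);
        return this;
    }

    /** All kids of the given type, or an empty list if there are none. */
    List<SimpleNode> getKids(String type) {
        return kidsByType.getOrDefault(type, Collections.emptyList());
    }

    /** First kid of the given type whose attribute has the given value, or null. */
    SimpleNode getKid(String type, String attributeName, String attributeValue) {
        for (SimpleNode kid : getKids(type)) {
            if (attributeValue.equals(kid.getAttribute(attributeName))) {
                return kid;
            }
        }
        return null;
    }

    SimpleNode getKid(String type, String id) {
        return getKid(type, "id", id);
    }

    public static void main(String[] args) {
        SimpleNode nails = new SimpleNode("module", null).setAttribute("id", "Nails")
                .addKid(new SimpleNode("file", null).setAttribute("id", "Hammer.java"));
        SimpleNode project = new SimpleNode("project", null).setAttribute("id", "demo").addKid(nails);
        for (SimpleNode file : project.getKid("module", "Nails").getKids("file")) {
            System.out.println(file.getId());
        }
    }
}
```

Storing kids in a map keyed by type makes getKids(type) a single lookup and keeps the "order does not matter" rule visible in the data structure itself.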

Implementation
BasicXmlData is the basic implementation of XmlData. The following constructors are available (see Listing 2 for the full list):

  public BasicXmlData (String type, String value, Collection kids);
  public BasicXmlData (String type, String value, String[] attrs, XmlData[] kids);
  public BasicXmlData (String type, String value, AttributeList attrs);
  public BasicXmlData (String type, String value, Map attrs, Map byType);
  public BasicXmlData (InputStream in);
  public BasicXmlData (URL sourceUrl);

The following are persistence methods:

  public void save(String filename);
  public void save(File file);
  public void save(OutputStream os);

Three casting methods help to instantiate user-defined classes that implement XmlData. Each user-defined class must also contain a constructor that takes XmlData as a single argument. The simplest casting method is castKids(String type, Class clazz). Look at the following example:

public class Project extends BasicXmlData {
  private final static String POSTFIX = ".project.xml";
  private final String filename;

  public Project(String name) throws IOException,
                                     InstantiationException,
                                     ClassNotFoundException {
    super(new File(name + POSTFIX));
    filename = name + POSTFIX;
    // Replace every kid of type "module" with an instance of Module.
    castKids("module", Class.forName("com.borland.catkit.Module"));
  }
}

As a result, all the kids of type “module” in this project are instances of class Module.

Another method, cast(Map typemap, XmlData.Policy policy), can recursively cast a node with all its subnodes. The classes to cast to are defined in the map typemap (type -> Class). Policy specifies behavior on error, for example:

  • XmlData.Policy.SKIP_ON_ERROR specifies that nodes that fail to cast for some reason (missing constructors, class not found) are skipped.
  • XmlData.Policy.KEEP_ON_ERROR specifies that nodes that fail to cast are kept as is.
  • XmlData.Policy.THROW_ON_ERROR specifies that no action is taken; exceptions are not intercepted.
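The three policies can be sketched with plain reflection. Everything below is hypothetical (CastSketch, Item, ModuleNode, BrokenNode and the demo helpers are invented for illustration) and is not the code from Listing 2. Two assumptions of the sketch: types missing from the map are kept unchanged, and under THROW_ON_ERROR the reflection failure is rethrown wrapped in an unchecked exception.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of policy-driven casting; not the article's implementation. */
final class CastSketch {
    enum Policy { SKIP_ON_ERROR, KEEP_ON_ERROR, THROW_ON_ERROR }

    /** Generic node: only the type matters here. */
    static class Item {
        final String type;
        public Item(String type) { this.type = type; }
    }

    /** User-defined class with the required single-argument constructor. */
    static class ModuleNode extends Item {
        public ModuleNode(Item source) { super(source.type); }
    }

    /** User-defined class without that constructor, so casting to it fails. */
    static class BrokenNode extends Item {
        public BrokenNode() { super("broken"); }
    }

    static List<Item> castAll(List<Item> kids,
                              Map<String, Class<? extends Item>> typemap,
                              Policy policy) {
        List<Item> result = new ArrayList<>();
        for (Item kid : kids) {
            Class<? extends Item> target = typemap.get(kid.type);
            if (target == null) {        // assumption: unmapped types stay as they are
                result.add(kid);
                continue;
            }
            try {
                result.add(target.getConstructor(Item.class).newInstance(kid));
            } catch (ReflectiveOperationException e) {
                switch (policy) {
                    case SKIP_ON_ERROR: break;                  // drop the node
                    case KEEP_ON_ERROR: result.add(kid); break; // keep it uncast
                    default:
                        throw new IllegalStateException("Cannot cast " + kid.type, e);
                }
            }
        }
        return result;
    }

    static List<Item> demoKids() {
        List<Item> kids = new ArrayList<>();
        kids.add(new Item("module"));
        kids.add(new Item("broken"));
        kids.add(new Item("note"));
        return kids;
    }

    static Map<String, Class<? extends Item>> demoTypemap() {
        Map<String, Class<? extends Item>> map = new HashMap<>();
        map.put("module", ModuleNode.class);
        map.put("broken", BrokenNode.class);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(castAll(demoKids(), demoTypemap(), Policy.SKIP_ON_ERROR).size());
        System.out.println(castAll(demoKids(), demoTypemap(), Policy.KEEP_ON_ERROR).size());
    }
}
```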

Yet another casting method, cast(String packageName, BasicXmlData.Policy policy), casts the whole tree into classes within a specified package. Class names are the same as types of nodes. Look at the following example:

Project project = (Project) new BasicXmlData(new File("myproject.project.xml"))
    .cast("untitled1", Policy.KEEP_ON_ERROR);

The package may be incomplete for a full-time XML professional, but Java itself has similar limitations (e.g., you can’t overload an operator or manipulate memory allocation). So the choice for a practical Java programmer is either to flood the code with complicated SAX callbacks and obscure DOM structures or to use this package and manipulate XML data in a few lines, for example, to get Nasdaq quotes.
