Simplify Java XML Parsing with Jakarta Digester

The Digester framework is a high-level interface that parses an XML stream and populates Java objects with its data, based on Rules that you provide to the Digester component. Among the XML parsing options available, the Digester package offers greater simplicity: with a few simple classes and a collection of predefined Rules, Digester simplifies the parsing of complex XML schemas.

Digester requires an XML parser that conforms to JAXP version 1.1 or later, and it uses a SAX parser to do the actual parsing. It is easier to use than SAX alone, however, because Digester hides the complex bookkeeping that SAX parsing involves. The other main API for XML parsing, DOM, loads the entire document tree into memory, which makes it impractical for large documents, and in the real world you deal with large documents most of the time. Because Digester is just a layer over SAX, the difference in memory usage between DOM and Digester is essentially the same as that between DOM and SAX.
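For comparison, here is roughly the bookkeeping that Digester hides: a JDK-only sketch (no Digester involved) that reads the data of the small response document used later in this article (Listing 1) with a plain SAX DefaultHandler. The class name RawSaxExample and the path-to-value map it returns are illustrative assumptions. Notice that the handler has to track the current element path and buffer character data itself, and that the paths it cares about are hard-coded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: raw SAX parsing of the Listing 1 data, no Digester.
class RawSaxExample {

    static Map<String, String> parse(String xml) {
        Map<String, String> values = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            // Open elements from root to current, e.g. response, request, name
            private final Deque<String> path = new ArrayDeque<>();
            // Character data may arrive in several chunks, so buffer it
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                path.addLast(qName);
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String current = String.join("/", path);
                // Hard-coded knowledge of which paths carry data
                if (current.equals("response/request/name")
                        || current.equals("response/request/value")
                        || current.equals("response/matches")) {
                    values.put(current, text.toString().trim());
                }
                path.removeLast();
                text.setLength(0);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<response><request><name>books</name>"
                + "<value>xml</value></request><matches>20</matches></response>";
        System.out.println(parse(xml));
    }
}
```

Every new element you care about means another hard-coded path in endElement, plus the code to turn the collected strings into objects; that is the work Digester's Rules take over.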

Although Digester does not perform data binding the way other options such as JAXB and XMLBeans do, it gives you the flexibility to create the Java classes that your architecture requires rather than the ones that the XML semantics demand. It also lets you attach your own actions (triggers) to the Rules you provide, so your code runs whenever a matching element is parsed.

Digester Under the Hood
Digester depends on the following Jakarta Commons components, which must be in the classpath when you use Digester:

  • BeanUtils component
  • Collections component
  • Logging component

Using and Customizing Digester
Digester is simplest to use when there is a direct mapping between the input XML stream and the Java objects.

To begin creating the Rules, you need to complete the following four steps:

  1. Identify the mapping from the source (i.e., input XML stream) to the output (i.e., the Java objects).
  2. Identify the pattern elements in the XML that contain data you need.
  3. Identify the data components that will hold the data.
  4. Create rules based on your findings and assign them to Digester.

A simple example (input.xml) should make this process clearer. Listing 1 shows the XML that you need to parse.

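Listing 1. input.xml

```xml
<response>
    <request>
        <name>books</name>
        <value>xml</value>
    </request>
    <matches>20</matches>
</response>
```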

Listings 2 and 3 show the Java classes that you need to populate.

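Listing 2. Response Class

```java
public class Response {
    private int _matches = 0;
    private Request _request;

    public Request getRequest() {
        return _request;
    }

    public void setRequest(Request request) {
        _request = request;
    }

    public int getMatches() {
        return _matches;
    }

    public void setMatches(int matches) {
        _matches = matches;
    }

    // Readable output for the System.out.println() call in Listing 4
    public String toString() {
        return "Response[matches=" + _matches + ", request=" + _request + "]";
    }
}
```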

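Listing 3. Request Class

```java
public class Request {
    private String _name = "";
    private String _value = "";

    public String getName() {
        return _name;
    }

    public void setName(String name) {
        _name = name;
    }

    public String getValue() {
        return _value;
    }

    public void setValue(String value) {
        _value = value;
    }

    // Readable output when a Response prints its Request
    public String toString() {
        return "Request[name=" + _name + ", value=" + _value + "]";
    }
}
```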

Listing 4 shows the class that parses the XML using Digester. This class also registers the Rules with Digester.

Listing 4. DigesterExample Class

```java
import org.apache.commons.digester.*;
import java.io.Reader;
import java.io.StringReader;

public class DigesterExample {
    public static void main(String ar[]) {
        try {
            Digester digester = new Digester();
            digester.setValidating( false );

            // <response> creates the root Response object
            digester.addObjectCreate( "response", Response.class );
            // <request> creates a Request object on top of the stack
            digester.addObjectCreate( "response/request", Request.class );
            // Element text is copied into the matching bean properties
            digester.addBeanPropertySetter( "response/request/name", "name" );
            digester.addBeanPropertySetter( "response/request/value", "value" );
            // At </request>, pass the Request to Response.setRequest()
            digester.addSetNext( "response/request", "setRequest" );
            digester.addBeanPropertySetter( "response/matches", "matches" );

            Reader reader = new StringReader(
                    "<response>" +
                        "<request>" +
                            "<name>books</name><value>xml</value>" +
                        "</request>" +
                        "<matches>20</matches>" +
                    "</response>");
            Response response = (Response)digester.parse( reader );
            System.out.println( response.toString() );
        } catch( Exception exc ) {
            exc.printStackTrace();
        }
    }
}
```

Listing 4 shows how easy and straightforward Digester is to use, especially when the XML maps directly onto the Java classes. (The standard Rules that come with the Digester package are sufficient for this mapping, and their documentation is a useful reference while you read this article.) The element matching patterns passed to each add method define when a particular Rule fires. Each Rule extends org.apache.commons.digester.Rule and defines the action that occurs when it fires.
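As an illustration of what a rule such as addBeanPropertySetter( "response/matches", "matches" ) has to do when it fires, the following JDK-only sketch finds the JavaBeans setter for a property on the current object and converts the element's body text to the setter's parameter type. It is an approximation, not Digester's code: Digester delegates property setting and type conversion to Commons BeanUtils, whereas this hypothetical PropertySetterSketch handles only String and int properties, and SampleBean is a made-up bean shaped like Request and Response.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical sketch of a bean-property-setter action; real Digester uses
// Commons BeanUtils for this. Only String and int setters are supported here.
class PropertySetterSketch {

    static void setProperty(Object bean, String property, String bodyText) {
        // JavaBeans naming: property "matches" -> setter "setMatches"
        String setterName = "set" + Character.toUpperCase(property.charAt(0))
                + property.substring(1);
        String text = bodyText.trim();
        for (Method method : bean.getClass().getMethods()) {
            if (!method.getName().equals(setterName)
                    || method.getParameterTypes().length != 1) {
                continue;
            }
            Class<?> type = method.getParameterTypes()[0];
            Object value;
            if (type == String.class) {
                value = text;
            } else if (type == int.class || type == Integer.class) {
                value = Integer.valueOf(text); // "20" -> 20
            } else {
                continue; // parameter type not supported by this sketch
            }
            try {
                method.invoke(bean, value);
                return;
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("Cannot set " + property, e);
            }
        }
        throw new IllegalArgumentException("No usable setter for " + property);
    }

    // Hypothetical bean with the same property shapes as Request and Response
    public static class SampleBean {
        private String name = "";
        private int matches;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getMatches() { return matches; }
        public void setMatches(int matches) { this.matches = matches; }
    }

    public static void main(String[] args) {
        SampleBean bean = new SampleBean();
        setProperty(bean, "name", "books");
        setProperty(bean, "matches", " 20 ");
        System.out.println(bean.getName() + " " + bean.getMatches());
    }
}
```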

Digester in the Real World
Although Digester is a straightforward solution, real-world XML is not always so ideal. Say, for example, that you have to parse XML whose elements change depending on the input. Search services typically produce such request/response streams: a search for a book returns content whose data belongs in a Book object, while a search for a magazine returns content whose data belongs in a Magazine object. In such a situation, you end up writing custom Rules.

Listing 5 shows the response from a book search. The content elements hold the dynamic section of the response.

Listing 5. BookResponse.xml			books		xml	      2                                                book1                  author1                                              book2                  author2                              

Listing 6 shows the response from a magazine search. Again, the content elements hold the dynamic section of the response.

Listing 6. MagazineResponse.xml			magazines		security	      3                                  securityMagazine 1                  securityMagazine2                  securityMagazine3                

With a few modifications, you can reuse the DigesterExample class to parse XML whose elements change in this way. First, add the two methods shown in Listing 7 to the Response class, along with the _content field they use.

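Listing 7. Content-Related Methods

```java
// Holds the content object (Book, Magazine, ...) built from <content>
private Object _content;

public void addContent(Object o) {
    _content = o;
}

public Object getContent() {
    return _content;
}
```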

Next, you need a custom Rule that is triggered whenever Digester encounters an element inside content. Listing 8 shows how to register it. The extra code that is required is the command-line handling, the setRules() call, and the three rules for the content element.

Listing 8. DigesterExample Class Using a Custom Rule

```java
import org.apache.commons.digester.*;
import java.io.File;

public class DigesterExample {
    public static void main(String ar[]) {
        try {
            // Both classes are named on the command line, e.g. Book BookBuilder
            Class<?> contentClass = Class.forName( ar[0] );
            ContentBuilder contentBuilder = (ContentBuilder)
                    Class.forName( ar[1] ).getDeclaredConstructor().newInstance();

            Digester digester = new Digester();
            digester.setValidating( false );
            // ExtendedBaseRules understands the "?" wildcard used below
            digester.setRules( new ExtendedBaseRules() );
            digester.addObjectCreate( "response", Response.class );
            digester.addObjectCreate( "response/request", Request.class );
            digester.addBeanPropertySetter( "response/request/name", "name" );
            digester.addBeanPropertySetter( "response/request/value", "value" );
            digester.addSetNext( "response/request", "setRequest" );
            digester.addBeanPropertySetter( "response/matches", "matches" );

            // Create the content object, fill it from each direct child of
            // <content>, then hand it to Response.addContent(Object)
            digester.addObjectCreate( "response/content", contentClass );
            digester.addRule( "response/content/?",
                    new DefaultRule( digester, contentBuilder ) );
            digester.addSetNext( "response/content", "addContent", "java.lang.Object" );

            File input = new File( "input.xml" );
            Response response = (Response)digester.parse( input );
            System.out.println( response.toString() );
        } catch( Exception exc ) {
            exc.printStackTrace();
        }
    }
}
```

The program takes two command-line arguments, both class names. The first, contentClass, is the container for the data in the content elements of Listings 5 and 6; for a book search, for example, you pass a Book class. The second is the ContentBuilder implementation that populates that object, such as BookBuilder for Book.
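Listing 8 turns those two command-line strings into a Class object and a ContentBuilder instance by reflection. The following JDK-only sketch isolates that step; the helper name instantiate is hypothetical, and java.util.ArrayList and java.util.List stand in for a content class and its expected type so that the example runs without the article's Book and BookBuilder classes.

```java
import java.lang.reflect.InvocationTargetException;
import java.util.List;

class ClassLoadingSketch {

    // Hypothetical helper: load a class by fully qualified name, create it with
    // its public no-argument constructor, and check it has the expected type.
    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            return expectedType.cast(instance); // ClassCastException if wrong type
        } catch (ClassNotFoundException | NoSuchMethodException
                | InstantiationException | IllegalAccessException
                | InvocationTargetException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // In Listing 8 the names would come from ar[0] and ar[1]
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        List<?> list = instantiate(name, List.class);
        System.out.println(list.getClass().getName());
    }
}
```

Wrapping the reflective exceptions in one IllegalArgumentException means a mistyped class name on the command line fails with a message that names the offending argument.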

Listing 9 shows the code for the custom Rule. Its call to getDigester().peek() returns a reference to the object on top of the Digester object stack without removing it. In this example, that is the content object (a Book or a Magazine, depending on the search request) created for the current content element.

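Listing 9. Custom Rule (DefaultRule Class)

```java
import org.apache.commons.digester.Rule;
import org.apache.commons.digester.Digester;

public class DefaultRule extends Rule {
    private ContentBuilder _builder;

    public DefaultRule(Digester digester, ContentBuilder builder) {
        super();
        setDigester(digester);
        _builder = builder;
    }

    // Called with the body text of each element matching the rule's pattern
    public void body(String namespace, String name, String text)
            throws Exception {
        // peek() returns the object on top of the stack: the content object
        _builder.addAttribute(name, text, getDigester().peek());
    }
}
```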
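peek() works because Digester keeps the objects it creates on a stack: an object-create rule pushes a new object when its element starts and pops it when the element ends, and a set-next rule passes the top object to a method of the object beneath it. The following JDK-only sketch replays those stack operations for a response with one content element. StackSketch and its nested Response and Book classes are hypothetical stand-ins, and the ArrayDeque only imitates Digester's stack; it is not Digester's internal implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical replay of the stack operations for
// <response><content><title>book1</title></content></response>
class StackSketch {

    // Simplified stand-ins, not the article's classes
    static class Response {
        private Object content;
        public void addContent(Object o) { content = o; }
        public Object getContent() { return content; }
    }

    static class Book {
        private String title = "";
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
    }

    // Same idea as Digester's peek(n): 0 is the top, 1 the object beneath it
    static Object peek(Deque<Object> stack, int n) {
        Iterator<Object> it = stack.iterator();
        for (int i = 0; i < n; i++) {
            it.next();
        }
        return it.next();
    }

    static Response simulate() {
        Deque<Object> stack = new ArrayDeque<>();

        // <response> starts: the object-create rule pushes the root object
        stack.push(new Response());
        // <content> starts: the object-create rule pushes the content object
        stack.push(new Book());

        // <title>book1</title>: the custom rule works on whatever peek() returns
        ((Book) peek(stack, 0)).setTitle("book1");

        // </content> ends: set-next hands the top object to the one beneath it,
        // then the object-create rule pops the content object
        ((Response) peek(stack, 1)).addContent(peek(stack, 0));
        stack.pop();

        // </response> ends: the root object is popped and returned to the caller
        return (Response) stack.pop();
    }

    public static void main(String[] args) {
        Book book = (Book) simulate().getContent();
        System.out.println(book.getTitle());
    }
}
```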

Listing 10 shows the BookBuilder class. Its object argument, cast to Book, is the reference to the Book object that DefaultRule obtained from the top of the stack.

Listing 10. BookBuilder Class

```java
public class BookBuilder implements ContentBuilder {
    // Called by DefaultRule for each direct child element of <content>
    public void addAttribute(String name, String text, Object object)
            throws Exception {
        Book book = (Book) object;
        if (name.equals("title")) {
            book.setTitle(text.trim());
        } else if (name.equals("author")) {
            book.setAuthor(text.trim());
        }
    }
}
```
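The article does not show the ContentBuilder interface or the Book class. A minimal sketch consistent with Listings 9 and 10 could look like the following; the addAttribute signature follows the call in Listing 9, while Book's fields and accessors and the BuilderDemo class are assumptions. Because a builder does not depend on Digester, you can exercise it directly, as BuilderDemo does by replaying the calls DefaultRule would make.

```java
// Hypothetical interface: referenced by the article but not shown there
interface ContentBuilder {
    // Copy one child element's text onto the content object from the stack
    void addAttribute(String name, String text, Object object) throws Exception;
}

// Hypothetical content class with the two properties BookBuilder sets
class Book {
    private String title = "";
    private String author = "";
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }
}

class BookBuilder implements ContentBuilder {
    public void addAttribute(String name, String text, Object object) {
        Book book = (Book) object;
        if (name.equals("title")) {
            book.setTitle(text.trim());
        } else if (name.equals("author")) {
            book.setAuthor(text.trim());
        }
        // Any other element name is ignored
    }
}

// Hypothetical driver standing in for DefaultRule
class BuilderDemo {
    public static void main(String[] args) throws Exception {
        ContentBuilder builder = new BookBuilder();
        Book book = new Book();
        builder.addAttribute("title", "book1", book);
        builder.addAttribute("author", "author1", book);
        System.out.println(book.getTitle() + " by " + book.getAuthor());
    }
}
```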

Because the custom Rule is registered with the wildcard pattern response/content/?, you have to tell Digester to use the ExtendedBaseRules class, which supports more kinds of matching patterns than the default rules implementation. When you extend the Rule class, the methods you can override are begin, body, end, and finish.
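To illustrate the two ExtendedBaseRules wildcards, the following hypothetical matcher handles only absolute patterns: a pattern ending in /? matches direct children of the parent path (as response/content/? does for the elements inside content), and a pattern ending in /* matches any descendant. ExtendedBaseRules itself supports more pattern forms and precedence rules, so treat WildcardSketch as a sketch of the matching semantics, not of its implementation.

```java
// Hypothetical, simplified matcher for absolute patterns only.
class WildcardSketch {

    //   "a/b/?"  matches a direct child of a/b, e.g. "a/b/c" but not "a/b/c/d"
    //   "a/b/*"  matches any descendant of a/b, e.g. "a/b/c" and "a/b/c/d"
    //   other patterns must equal the path exactly
    // In this sketch neither wildcard matches the parent element "a/b" itself.
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/?")) {
            String parent = pattern.substring(0, pattern.length() - 1); // "a/b/"
            if (!path.startsWith(parent)) {
                return false;
            }
            String rest = path.substring(parent.length());
            return !rest.isEmpty() && rest.indexOf('/') < 0;
        }
        if (pattern.endsWith("/*")) {
            String parent = pattern.substring(0, pattern.length() - 1); // "a/b/"
            return path.startsWith(parent) && path.length() > parent.length();
        }
        return pattern.equals(path);
    }

    public static void main(String[] args) {
        String[] paths = {
            "response/content",
            "response/content/title",
            "response/content/book/title"
        };
        for (String p : paths) {
            System.out.println(p + "  ?: " + matches("response/content/?", p)
                    + "  *: " + matches("response/content/*", p));
        }
    }
}
```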

Just like that, you’ve seen how the Digester package offers simplicity in parsing XML. You can use it with a straightforward mapping, as well as more complex XML schemas, with some simple variations.
