
Apache’s Xindice Organizes XML Data Without Schema

XML is well deserving of its popularity. Developers are finding myriad uses for it, including application configuration files and object persistence. While using XML in this capacity has many benefits, it can also become an organizational nightmare.

At first glance, a relational database management system (RDBMS) seems like a good way to organize all of your disparate XML data. However, mapping XML documents to relational models is not only difficult, but often results in ugly schemas. For many the answer lies in using a native XML database instead of a traditional RDBMS. This article will describe what a native XML database is, introduce Apache Xindice, and show how to make use of Xindice in a Java application.

As defined by the XML:DB initiative, a native XML database is simply a database for storing and accessing XML using XML. This differs from a relational database, where XML data must be stored as tabular data and accessed using SQL. While it is possible to store XML data in a relational database as a CLOB or to map the XML data to a schema, each method falls down in its own way. Storing the data as a CLOB eliminates the need to map the structure of the XML document to a schema, but it doesn’t allow the database to understand the structure of the document. This makes it impossible to query the data effectively or update specific sections of the document. Mapping the XML document’s structure to a relational schema overcomes these issues, but can heavily degrade performance.

A database that is contextually aware of the structure of XML documents, a “native” XML database, can resolve this tradeoff. Such a database is not required to store the data as XML internally. Instead, it understands the structure of the XML documents it contains, which is what matters: the documents can be queried and updated using appropriate XML technologies. The first of these is a W3C specification known as XPath. Using XPath, it is possible to obtain the set of XML elements in a given XML document that match the XPath query.

The second technology of importance is XUpdate, as defined by XML:DB. XUpdate makes it possible to update specific elements of an XML document without having to overwrite the entire document. It is extremely useful, particularly with very large XML documents.

Powerful Collection Management
Apache Xindice is, quite simply, a native XML database implemented in Java. Xindice uses XPath as its query language and XUpdate as its update language. Additionally, Xindice implements the XML:DB API. The XML:DB API is conceptually similar to ODBC and JDBC, but it defines different levels of interoperability. The Xindice implementation is Core Level 1 with some optional services such as XUpdate and CollectionManagement, and it can be accessed using CORBA, Java, or XML-RPC.

Further, Xindice provides some proprietary functionality as well as experimental features, which are beyond the scope of this article. Instead I will cover how to insert, select, update, and delete XML data using the Java API and how to manage collections using the proprietary CollectionManager service. Xindice’s proprietary CollectionManager service is much more powerful than the XML:DB CollectionManagement service. For example, Xindice’s proprietary CollectionManager service allows for collection configurations that are specific to Xindice such as GZip compression and indexes.

Xindice stores all XML documents inside of collections. Thus every XML document must be stored in at least one collection. Collections can be nested and are considered part of an XPath query string, so there is always a root collection. While collections can be used strictly for organizational purposes, Xindice also allows for indexes to be created on collections to increase XPath performance. I am not going to use indexes in this article as all my examples will work on generic XML data; creating indexes would require contextual understanding of the XML data.

I plan to create a single wrapper class to hold all of my examples for working with Xindice. To get started I need to do various imports, set up the shell class, and declare some class properties for later use.

     import org.xmldb.api.base.*;
     import org.xmldb.api.modules.*;
     import org.xmldb.api.*;

     import org.apache.xindice.client.xmldb.services.*;
     import org.apache.xindice.xml.dom.*;

     public class DBManager
     {
          // Collection currently in use, shared by the methods that follow
          Collection col = null;
          Class c = null;
          Database database = null;

          public DBManager() throws Exception
          {
               // Load the Xindice driver and register it with the XML:DB DatabaseManager
               c = Class.forName("org.apache.xindice.client.xmldb.DatabaseImpl");
               database = (Database) c.newInstance();
               DatabaseManager.registerDatabase(database);
          }
     }

The above code imports all of the needed classes for the XML:DB API as well as some specific to Xindice. Because org.xmldb.api.base.Database is a generic interface for working with XML databases, I need to create an instance of the Xindice driver that implements the Database interface. Instead of working with the Xindice implementation specifically, I cast it to the interface to keep my code interoperable. Finally, I register the database driver.
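The load-by-name-then-cast pattern described above can be tried without a Xindice installation. The following is only a sketch under stated assumptions: the nested `Database` interface and `FakeDriver` class are hypothetical stand-ins for `org.xmldb.api.base.Database` and the Xindice driver, not part of either API. It also uses `getDeclaredConstructor().newInstance()`, since `Class.newInstance()` has been deprecated since Java 9.

```java
final class DriverLoadingSketch {

    // Hypothetical stand-in for org.xmldb.api.base.Database.
    interface Database {
        String getName();
    }

    // Hypothetical stand-in for a vendor driver such as Xindice's DatabaseImpl.
    static final class FakeDriver implements Database {
        public String getName() {
            return "fake";
        }
    }

    // Loads a driver class by name and exposes it only through the interface,
    // so calling code never depends on the concrete driver type.
    static Database load(String className) {
        try {
            Class<?> driverClass = Class.forName(className);
            // Preferred over the deprecated Class.newInstance().
            Object instance = driverClass.getDeclaredConstructor().newInstance();
            return (Database) instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load driver " + className, e);
        }
    }

    // Loads the stand-in driver by its class name and returns its reported name.
    static String demo() {
        return load(FakeDriver.class.getName()).getName();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "fake"
    }
}
```

Because the caller only ever sees the interface, swapping drivers comes down to changing the class name string.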

With my shell class created I am ready to move on to creating my first collection. Much like documents, collections must be created inside of other collections. When Xindice starts, it uses its configured database instance as the root collection, which by default is named db. Thus I am going to create my new collection in the db collection. I will do this with a method named createCollection that accepts a single String parameter, collection, indicating the collection name. The method body follows:

     col = DatabaseManager.getCollection("xmldb:xindice:///db/");
     CollectionManager service =
          (CollectionManager) col.getService("CollectionManager", "1.0");
     String collectionConfig =
          "" +
          "" +
          "";
     service.createCollection(collection, DOMParser.toDocument(collectionConfig));

Again, the first thing I need in order to create a collection is another collection to create it in. I use the getCollection method to obtain a reference to the root collection named db. After that I need a collection management service, so I ask for the proprietary CollectionManager offered by Xindice. While it would be possible to use the generic CollectionManagement service to create a simple collection, Xindice offers more powerful features that are only available using its proprietary service. The createCollection method takes a collection name and an XML configuration document as parameters. My configuration string is pretty simple: the only unusual feature is that I ask for compression using GZip. Take a look at the Xindice documentation for all the possible collection configuration options.
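The listing above does not show the configuration markup itself. As a rough sketch of what such a string might look like, the helper below builds a configuration that requests a GZip-compressed filer. The element and attribute names (`collection`, `compressed`, `filer`, `gzip`) and the `BTreeFiler` class name are assumptions modeled on the collection-creation example in the Xindice developer guide, and `CollectionConfigSketch` is a hypothetical helper, so verify the markup against the documentation for your Xindice version.

```java
final class CollectionConfigSketch {

    // Builds a collection configuration string for the given collection name.
    // ASSUMPTION: element and attribute names follow the Xindice developer
    // guide example; they are not taken from this article.
    static String build(String collectionName) {
        return "<collection compressed=\"true\" name=\"" + escapeAttribute(collectionName) + "\">"
             + "<filer class=\"org.apache.xindice.core.filer.BTreeFiler\" gzip=\"true\"/>"
             + "</collection>";
    }

    // Escapes characters that would break a double-quoted XML attribute value.
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(build("products"));
    }
}
```

The resulting string could then be parsed with DOMParser.toDocument and handed to createCollection, as in the method body above.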

To insert data I am going to create a method that accepts two string parameters: ID and data. All documents in a collection must be uniquely identified, much like rows in a database. I am using the ID passed to the method as the unique identifier. However, if you pass a null or empty string as the value of ID to the createResource method Xindice will generate a unique identifier for you using the createId method. My preference is to call the createId method myself, so I can more easily identify the document. For simplicity, my method will just assume the ID passed is valid. Below is the body of the method.

     XMLResource document =
          (XMLResource) col.createResource(ID, "XMLResource");
     document.setContent(data);
     col.storeResource(document);

As you can see, inserting data into Xindice is quite easy. It should be noted that createResource can handle more than just XML data. With Xindice, createResource can also create a BinaryResource. Other XML databases will vary in their support for other types of resources, so if you aren’t using Xindice then it’s imperative to check the documentation for the database you are using.

Extracting the Data
Now that I have a method to insert data into the database, it is time to create one to get it back out. Unlike a relational database, there are two ways we can get data out of Xindice. The first and simplest way is to get an entire resource using its unique identifier. Because I know that I want to get an entire XML document, all I need is a method that accepts the unique identifier as a string and returns a string with the XML data. The method body follows:

     XMLResource document = (XMLResource) col.getResource(ID);
     if (document != null)
          return (String) document.getContent();
     else
          return "";

As you can see, the method is very simple. Because I know ahead of time I will be getting an XML document, I can cast the result of getResource to an XMLResource. From there I simply check to see whether or not the result is null then return the appropriate string.

Of course, the above method is only useful if I know the unique identifier and I want the whole document returned. If I want to query an entire collection for a specific subset of data I need to use XPath. Before jumping right into the method, let me show you an example of an XPath query. Assume I put the following document into the database:

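     <product product_id="1" type="widget">
          <description>foo</description>
     </product>
     <product product_id="2" type="widget">
          <description>bar</description>
     </product>
     <product product_id="3" type="widget">
          <description>foobar</description>
     </product>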

If I wanted to write an XPath query to select the second product, I could use the following query string:

     /product[@product_id="2"]

The query result would be an XPath node-set that contains one node for each result found. In this case, the result would be:

     <product product_id="2" type="widget">
          <description>bar</description>
     </product>

However, if I change the XPath query string to find all products of type widget, then my result would contain more than one node. Below is the modified XPath query string and its result.

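     /product[@type="widget"]

     <product product_id="1" type="widget">
          <description>foo</description>
     </product>
     <product product_id="2" type="widget">
          <description>bar</description>
     </product>
     <product product_id="3" type="widget">
          <description>foobar</description>
     </product>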

The above was just a basic example of XPath. For a more thorough look at XPath try this article from Top XML.
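To experiment with the two queries without a running database, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath engine rather than Xindice's XPathQueryService. It makes several assumptions: the sample documents use the product, product_id, type, and description names from the discussion above, a plain list of strings stands in for a collection with one small document per product, and `XPathSketch` is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSketch {

    // ASSUMPTION: a list of strings stands in for a collection,
    // holding one small document per product.
    static final List<String> DOCUMENTS = Arrays.asList(
        "<product product_id=\"1\" type=\"widget\"><description>foo</description></product>",
        "<product product_id=\"2\" type=\"widget\"><description>bar</description></product>",
        "<product product_id=\"3\" type=\"widget\"><description>foobar</description></product>");

    // Runs the expression against every document and collects the text
    // content of each node in the resulting node-sets.
    static List<String> query(String expression) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> matches = new ArrayList<>();
            for (String xml : DOCUMENTS) {
                Document doc = builder.parse(new InputSource(new StringReader(xml)));
                NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    matches.add(nodes.item(i).getTextContent());
                }
            }
            return matches;
        } catch (Exception e) {
            throw new IllegalStateException("Query failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(query("/product[@product_id=\"2\"]")); // [bar]
        System.out.println(query("/product[@type=\"widget\"]"));  // [foo, bar, foobar]
    }
}
```

The first query matches a single node, while the second matches one node per document, which mirrors the two results discussed above.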

With the basics of XPath covered let me jump right into writing my query method. For this method I am going to use a similar signature to the one I used to retrieve an entire document. It will accept an XPath query string as a parameter and return a result as an XML string.

     XPathQueryService service =
          (XPathQueryService) col.getService("XPathQueryService", "1.0");
     ResourceSet resultSet = service.query(xpath);
     ResourceIterator results = resultSet.getIterator();
     String allResults = "";
     while (results.hasMoreResources())
     {
          Resource res = results.nextResource();
          allResults += "[" + res.getContent() + "]";
     }
     return allResults;

Because any given XPath query can return an arbitrary number of results I use the ResourceIterator class to loop through the results. I concatenate each result in a single result string that I return after I am finished looping. The above example queried an entire collection. However, it is also possible to use XPath queries on individual documents. To do that, one would use the queryResource method instead of the query method.

As with selecting data there are two ways to update a document in Xindice. The easy way is to overwrite it. With Xindice, if you attempt to insert a document with an identifier of an existing document it will simply overwrite it with no questions asked. For small XML documents this generally makes the most sense. However, with very large documents it often makes more sense to update only what has changed in the document. In this case you would use the second method of updating data, XUpdate.

Before explaining XUpdate, let me first implement a method to call the XUpdate service. My method will take two parameters: a unique identifier and an XUpdate string. The unique identifier represents the document I want to update, while the XUpdate string contains the rules as well as the data to update it with. Here is the method body:

     XUpdateQueryService service =
          (XUpdateQueryService) col.getService("XUpdateQueryService", "1.0");
     service.updateResource(ID, xupdate);

The method is about as straightforward as it gets. Interestingly enough, you can also use XUpdate on an entire collection. As with XPath queries, simply use the update method instead of the resource-specific updateResource method. With this kind of power, clearly all the work is done in the XUpdate string.

Using XUpdate
Now for a basic example of XUpdate. For more in-depth information on XUpdate check out the specification at www.xmldb.org/xupdate/xupdate-wd.html. An XUpdate XML document is a container of commands to be performed in series. Again, assume I want to update the following XML document.

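     <product product_id="1" type="widget">
          <description>foo</description>
     </product>
     <product product_id="2" type="widget">
          <description>bar</description>
     </product>
     <product product_id="3" type="widget">
          <description>foobar</description>
     </product>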

If I want to remove product_id 2 and change the description of product_id 3 I would use the following XUpdate modification block.

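     <xupdate:modifications version="1.0" xmlns:xupdate="http://www.xmldb.org/xupdate">
          <xupdate:remove select="/product[@product_id='2']"/>
          <xupdate:update select="/product[@product_id='3']/description">bar</xupdate:update>
     </xupdate:modifications>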

As you can see, my XUpdate modification block is made up of two commands. The first removes product_id 2 and the second updates product_id 3. There are two main points to remember with XUpdate. First, the commands are performed serially (in the order they appear in the modification block). Second, the command is performed on the element represented by the select attribute, which is an XPath query. Thus it is possible to update multiple documents with specific changes in a single XUpdate modification block.
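To make the two points above concrete, the following sketch imitates a remove command followed by an update command using only the JDK's DOM and XPath classes. It is not an XUpdate processor and does not use the Xindice API. The sample markup, the hypothetical products wrapper element (added so the sample parses as one well-formed document), and the `SerialUpdateSketch` class are all assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SerialUpdateSketch {

    // ASSUMPTION: three products under a hypothetical <products> root.
    static final String SAMPLE =
        "<products>"
      + "<product product_id=\"1\" type=\"widget\"><description>foo</description></product>"
      + "<product product_id=\"2\" type=\"widget\"><description>bar</description></product>"
      + "<product product_id=\"3\" type=\"widget\"><description>foobar</description></product>"
      + "</products>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Evaluates an XPath expression, playing the role of a select attribute.
    static NodeList select(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Imitates a remove command: detaches every selected node from its parent.
    static void remove(Document doc, String expression) {
        NodeList nodes = select(doc, expression);
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getParentNode() != null) {
                node.getParentNode().removeChild(node);
            }
        }
    }

    // Imitates an update command: replaces the text of every selected node.
    static void update(Document doc, String expression, String text) {
        NodeList nodes = select(doc, expression);
        for (int i = 0; i < nodes.getLength(); i++) {
            nodes.item(i).setTextContent(text);
        }
    }

    // Applies the two commands in order and returns the remaining descriptions.
    static String demo() {
        Document doc = parse(SAMPLE);
        remove(doc, "/products/product[@product_id='2']");
        update(doc, "/products/product[@product_id='3']/description", "bar");
        NodeList left = select(doc, "/products/product/description");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < left.getLength(); i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(left.item(i).getTextContent());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "foo,bar"
    }
}
```

Because the commands run in order, any later command that selected product 2 would find nothing to act on once the remove has been applied.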

The final method deletes documents from the database.

     Resource document = col.getResource(ID);
     col.removeResource(document);

Having seen the most important features Xindice offers, you might agree that it is quite a complete native XML database implementation. Like many open source projects, Xindice continues to evolve. Two of the more interesting features that may appear in the next major release are versioning and auto-linking. Versioning would be quite a useful feature for developers who use Xindice for their XML configuration documents. Auto-linking, on the other hand, represents an interesting approach to data redundancy in XML documents. It has the potential to provide the power of relations without the rigid schema requirements found in traditional relational databases.
