An XQuery Servlet for RESTful Data Services

Many web applications exchange data as XML, but that data is usually stored in and queried from relational databases, CRM, ERP, proprietary repositories, and a hodgepodge of other systems. Unfortunately, the languages most commonly used for creating or processing data on the web were designed neither for processing XML nor for integrating data among multiple heterogeneous sources. These are precisely the tasks for which the XQuery language was designed.

This paper shows how to use XQuery for data integration, and how to expose an XQuery as a RESTful data service using a Java servlet. Listing 1 contains the source code for the servlet. This servlet uses the name and external variables of any XQuery to provide a REST interface to the query and deploys the query.

As an XML-oriented data integration language, XQuery can be used to access XML, relational, and flat file formats such as EDI to create complex XML and HTML results. To deploy a query, a developer saves the query into a designated deployment directory in a secure location accessible to the servlet. Subsequently, developers can invoke any query in this directory using its REST interface, which requires nothing more than an HTTP GET or POST operation using a URL that represents the query and its parameters.

Using XQuery for Data Integration
XML plays a central role in most data-intensive web applications, and XQuery was designed to make it easy to find data in XML and to process and transform XML to create any desired XML structure. XQuery simplifies programming with XML in the same way that SQL simplifies programming with relational data and Java simplifies programming with objects: each language was designed to work with data using a particular data model, and supports the operations that are commonly needed in the given paradigm.

Figure 1. Data Integration Without XQuery: The figure illustrates a typical servlet that gathers data from heterogeneous sources, and then processes the results into a usable form.

In addition, the XQuery language was also designed to simplify data integration. Many web applications need to combine data from various sources, including XML, relational databases, legacy formats, and Web services. Each of these data sources typically has its own API and data model, and sometimes also has its own query language. After writing the code to retrieve the data their applications need from each of these data sources, developers then typically write yet more code to combine the data.

Consider a servlet that combines data from two databases and a Web service. In the Java world, this typically involves coding to three different APIs, then writing Java code or JSP to combine the results, as shown in Figure 1.

The process illustrated in Figure 1 is much easier in XQuery, which queries data in relational databases and other sources as though that data were stored as XML. An XQuery implementation designed for data integration can represent almost any kind of data as XML, either by providing an XML view of the data via middleware, or by physically converting it to XML. Such implementations can be optimized for each data source, freeing the programmer from the idiosyncrasies of each one. Consider the following query, which joins an XML document to a table in a relational database to create an XML result:

   for $h in doc("holdings.xml")/holdings/entry
   for $c in collection("companies")/companies
   where $h/userid = "Minollo"
      and $c/ticker = $h/stockticker
   return
      { $c/companyname }
      { $c/annualrevenues }

The first line of this query accesses an XML document on the file system using the doc() function. The second line addresses a relational table using the collection() function.

Author’s Note: The examples in this article are based on DataDirect XQuery, which uses the collection() function to address relational tables. Unfortunately, at this time there is no standard way to address a relational table from XQuery.

A Java program equivalent to the above query would use JDBC and SQL to access the relational data and an XML API such as DOM, SAX, or StAX to process the XML source and create an XML result. The XQuery version is simpler because it treats both data sources the same way, provides direct support for querying and combining data as XML, and can directly create any desired XML structure. And the query is declarative: rather than specifying the steps needed to create the XML result, the query specifies the desired result and lets the implementation find the best way to implement the query.
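To make the contrast concrete, here is a minimal sketch of what the hand-coded Java side of that join might look like. It is not taken from the article's listings. To stay self-contained it replaces the JDBC call with an in-memory Map keyed by ticker, and the class name HandCodedJoin, the output element names result and holding, and the sample values are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class HandCodedJoin {

    /**
     * Joins holdings XML to company rows. The Map stands in for the result of a
     * JDBC query (an assumption); each String[] holds {companyname, annualrevenues}.
     */
    static String join(String holdingsXml, Map<String, String[]> companiesByTicker,
                       String userId) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(holdingsXml.getBytes(StandardCharsets.UTF_8)));
        NodeList entries = doc.getElementsByTagName("entry");
        StringBuilder out = new StringBuilder("<result>"); // hypothetical wrapper element
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            if (!userId.equals(childText(entry, "userid"))) {
                continue; // corresponds to: where $h/userid = ...
            }
            String[] company = companiesByTicker.get(childText(entry, "stockticker"));
            if (company == null) {
                continue; // corresponds to the join condition finding no row
            }
            out.append("<holding><companyname>").append(escape(company[0]))
               .append("</companyname><annualrevenues>").append(escape(company[1]))
               .append("</annualrevenues></holding>");
        }
        return out.append("</result>").toString();
    }

    /** Returns the trimmed text of the first descendant element with the given name. */
    static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    /** Escapes the characters that are unsafe in XML element content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        Map<String, String[]> companies = new HashMap<>();
        companies.put("EXMP", new String[] {"Example Corp", "100"}); // made-up sample row
        String holdings = "<holdings><entry><userid>Minollo</userid>"
                + "<stockticker>EXMP</stockticker></entry></holdings>";
        System.out.println(join(holdings, companies, "Minollo"));
    }
}
```

Even with the database access stubbed out, the filtering, matching, and result construction each need explicit code, whereas the query expresses the same three steps in its where and return clauses.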

The declarative nature of XQuery makes it easier to optimize for a variety of data sources; a good implementation can both transparently generate efficient SQL for relational databases, and also retrieve only the required data from the XML file by telling the parser to ignore other data.

It’s worth exploring how an XQuery is executed for several representative data sources: XML, SQL databases, non-XML file formats, and Web service calls. Note that the range of data sources supported by any given XQuery implementation and the strategies used to execute an XQuery against a given data source vary widely. The next section of this article briefly discusses the strategies used by DataDirect’s XQuery implementation.

Editor’s Note: The author, Jonathan Robie, is the XQuery Technology Lead for DataDirect, a vendor of XQuery products. We have selected this article for publication because we believe it to have objective technical merit.

Efficient XQuery for XML
One of the most important factors in processing XML is to have an efficient representation of the input document that permits incremental processing. Optimization through query rewrites is also important. The syntax of XQuery may look procedural, but a straightforward procedural implementation of XQuery does not perform well, especially for large XML files or complex queries, so good implementations may rewrite a query significantly. Many of these rewrites are also common in optimizers for other languages, including constant folding, elimination of common subexpressions, loop rewrites, and ordering rewrites.

In addition to these rewrites, two techniques known as document projection and streaming can dramatically improve speed and memory usage, especially for very large input documents (see Projecting XML Documents).

Document projection involves examining a query to determine what parts of a document the query needs, and using that information at parse time to ensure that only those parts of the document get constructed when the document is parsed. This saves both memory and time, because building the input document accounts for a significant share of processing time, and so does searching through parts of the document that are never needed. Document streaming involves using each portion of an input document to compute the output for which it is responsible, then discarding that portion of the input while processing the next portion. Document streaming does not generally improve speed, but it dramatically improves memory usage, to the extent that memory usage for many queries stays nearly constant regardless of the size of the document. Implementations that do not use document projection or streaming may have difficulty processing XML files much larger than about 30 MB. In contrast, implementations that use document projection and streaming may be able to handle more than 30 GB for typical queries, and even queries against small files will execute faster.
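The idea behind both techniques can be illustrated by hand with the JDK's StAX pull parser. This is only a sketch of the principle, not how any XQuery engine implements it. The class name StreamingProjection is hypothetical, and the code assumes the requested elements contain text only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class StreamingProjection {

    /**
     * Collects the text of every element with the given local name.
     * Projection: nothing is built for any other part of the document.
     * Streaming: the parser holds only the current event, never the whole tree.
     * Assumes the matching elements have text-only content.
     */
    static List<String> textOf(String xml, String elementName) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> values = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    // Reads the element's text and leaves the reader on its end tag.
                    values.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<holdings><entry><userid>Minollo</userid>"
                + "<stockticker>EXMP</stockticker><shares>10</shares></entry></holdings>";
        System.out.println(textOf(xml, "stockticker"));
    }
}
```

An XQuery implementation that supports projection derives the set of needed elements from the query itself, so the query author gets this behavior without writing parser code.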

Efficient XQuery for SQL Databases
Relational data can be queried efficiently with XQuery by converting the query to SQL, executing the SQL in the database, and returning the results as XML. The quality of the generated SQL can dramatically affect performance. Only the data actually required to compute query results should be returned from the database; rows and columns that are not needed should be discarded. To do this, the implementation must generate maximally selective SQL, taking into account all aspects of the original XQuery that might restrict the data that is actually required. Operations that have a straightforward SQL equivalent should almost always be performed in the database. This is particularly important for joins, sorting, and functionality available in the SQL library (which is particularly helpful when implementing the extensive XQuery library, but some ingenuity is required to account for the differences between SQL functions and XQuery functions).

When creating hierarchical XML structures, some algorithms in the generated SQL generally perform better than others; for example, the sort-merge algorithm has been shown to have very good overall performance. When supporting databases from multiple vendors, it’s tempting to rely on a SQL subset portable among most databases; however, translating XQuery for optimum performance often requires the richer functionality found in modern relational databases, which differs among vendors. Because of this, an implementation can perform much better if it tailors the generated SQL to a particular database vendor. Hints can be provided to give the programmer control over the generated SQL. For more information on SQL generation for XQuery, see the article DataDirect XQuery 2.0 Performance: Generating SQL.

Result retrieval is also an important factor in overall performance. Obviously, high-quality drivers can significantly improve performance. Because some XML APIs and processing patterns require streaming, an implementation should support incremental retrieval so that the query processor can use the first part of the result when appropriate while later parts are still being computed.

XML Converters for non-XML Formats
Many data integration environments have to cope with data in non-XML formats. Some of these formats, such as comma-delimited files, have a simple structure. Others, including EDI, have complex structures, and there are thousands of EDI formats. Predefined XML converters can convert such data to XML as it is queried, and tools exist for creating custom XML converters; for instance, Stylus Studio supports designing XML converters in a graphical environment. Once generated, such XML is queried in the same manner as any other XML, and the same optimization strategies apply. An XML converter that supports streaming can also support document projection and streaming.

XQuery for Web Service calls
Web services are a useful and common way to expose data from applications as XML. Because both SOAP requests and responses are expressed in XML, XQuery is very useful for generating or processing SOAP messages. If an XQuery implementation allows Web service calls within a query, then a single XQuery can formulate a request and process the result. For example, the following query shows how to create a Web service call to an Amazon Web service to obtain a book description identified by an ISBN:

   declare function local:amazon-listing($isbn)
   {
      All
      Ship
      ASIN
      { $isbn }
      Medium
   };

   let $loc :=
   let $payload := local:amazon-listing("0395518482")
   return ws:call($loc, $payload)

The last line of the preceding query issues the Web service request, specifying a location and a payload. The function at the beginning of the query creates the XML for the payload. Because Web service requests can have complex structures, and because data needed to formulate a Web service request may come from many sources, XQuery is very useful for creating payloads.

With the underlying basics in hand, it’s time to move on to the main topic: exposing an XQuery with a RESTful interface.

Exposing an XQuery as a RESTful Data Service
The main task of a servlet is to respond to an HTTP request by assembling a response to be returned to the client. You’ve seen how XQuery excels at data integration tasks, treating all data sources as XML and returning XML as the result of any query, making XQuery a natural choice for data integration in servlets that return their results as XML or HTML.

First, I’ll discuss how to expose an XQuery to the client using HTTP GET or HTTP POST operations, and then I’ll discuss the design of the sample Java servlet that deploys the queries.

Calling an XQuery with GET
First, consider how to expose an XQuery so it can be called using HTTP GET. The following code shows an XQuery containing an external variable, called user:

   declare variable $user as xs:string external;

   { $user }
   {
      for $h in collection('HOLDINGS')/holdings
      where $h/userid = $user
      return
         { xs:string($h/stockticker) }
         { xs:string($h/shares) }
   }

You’d access this query using a URL similar to the one shown below, which specifies the name of the query (in the URL parameter “q”) and a value for the external variable:

Using the servlet described later in this section, a developer can write an XQuery, test it locally, and then deploy it by placing it in a server-side deployment directory, where the query is protected from the outside world. After deployment, HTTP clients can invoke the query and obtain results using the simple URL shown above.
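On the client side, building such a URL is mostly a matter of encoding the query name and the external variable values. The following sketch shows one way to do that in Java. The class name QueryUrlBuilder, the servlet address used in main, and the convention of passing the query name without a file extension are assumptions made for illustration; only the parameter name q comes from the article.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryUrlBuilder {

    /**
     * Builds a GET URL that names a deployed query in the "q" parameter and
     * passes one URL parameter per external variable.
     */
    static String build(String servletUrl, String queryName, Map<String, String> variables)
            throws UnsupportedEncodingException {
        StringBuilder url = new StringBuilder(servletUrl)
                .append("?q=").append(URLEncoder.encode(queryName, "UTF-8"));
        for (Map.Entry<String, String> variable : variables.entrySet()) {
            url.append('&').append(URLEncoder.encode(variable.getKey(), "UTF-8"))
               .append('=').append(URLEncoder.encode(variable.getValue(), "UTF-8"));
        }
        return url.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("user", "Jonathan");
        // Hypothetical servlet address; substitute the address of your own deployment.
        System.out.println(build("http://localhost:8080/xquery", "portfolio", variables));
    }
}
```

Encoding every name and value keeps characters such as spaces and ampersands from being misread as URL syntax.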

Calling an XQuery with POST
GET requests are fine for simple queries, but when query parameters have complex structure or need to be given XML Schema datatypes, it is generally better to specify parameters in the content of an HTTP POST request, which is the approach generally used for SOAP messages. The following example shows an XQuery and an XML message that contains the query parameter. If an HTTP request has content, the servlet attempts to parse it as XML, binding the result to the variable $content:

   portfolio.xquery -- a query with an external variable

   declare variable $content as document-node() external;

   let $user := string($content/parameters/user)
   return
      { $user }
      {
         for $h in collection("HOLDINGS")/HOLDINGS
         where $h/USERID = $user
         return
            { xs:string($h/STOCKTICKER) }
            { xs:string($h/SHARES) }
      }

To run the XQuery, post a message to the URL that specifies the name of the query (in the URL parameter ‘q’), carrying the parameters as POST data, to obtain query results. Here’s an example of the URL and the HTTP POST content:

   HTTP CONTENT:
   <parameters><user>Jonathan</user></parameters>

To make such queries work, you need an intermediary to accept the web requests (GET or POST) and run the appropriate query. That’s what the XQuery RESTful servlet does.
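From the client's point of view, a POST call amounts to sending a small XML document as the request body. The sketch below builds a document that matches the path $content/parameters/user used by the query above and posts it with the JDK's HttpURLConnection. The class name PortfolioPostClient, the content type, and whatever URL the caller supplies are assumptions; the article does not say which content types the servlet expects.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class PortfolioPostClient {

    /** Builds the XML body that the query reads through $content/parameters/user. */
    static String parametersDocument(String user) {
        return "<parameters><user>" + escape(user) + "</user></parameters>";
    }

    /** Escapes the characters that are unsafe in XML element content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Posts the XML to the given URL and returns the HTTP status code.
     * The URL must already carry the query name, for example in the "q" parameter.
     */
    static int post(String url, String xml) throws Exception {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        // Assumed content type; adjust to whatever the deployed servlet accepts.
        connection.setRequestProperty("Content-Type", "application/xml; charset=UTF-8");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return connection.getResponseCode();
    }

    public static void main(String[] args) {
        // Prints the request body only; call post(...) against a running servlet to send it.
        System.out.println(parametersDocument("Jonathan"));
    }
}
```

Because the body is parsed as XML on the server, escaping user-supplied values matters here in the same way URL encoding matters for GET parameters.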

Implementing the XQuery RESTful Servlet
The examples shown in the previous two sections illustrate the requirements for a RESTful servlet. Queries must be protected from the outside world, but easily deployed by copying them into a deployment directory that is accessible to the server. Queries can be parameterized using URI parameters and/or the content of an HTTP request. The result of a query is returned to the client as the result of the HTTP request.

This servlet is written in Java and implements the Servlet API. It uses XQJ, a Java API that serves as “the JDBC for XQuery,” to invoke XQueries. To improve performance, the servlet prepares each query and places it in a HashMap the first time a client invokes that query. You can see the complete servlet code in Listing 1. Here’s an outline of the structure of that program.

When the servlet is initialized, the init() method shown below creates an empty HashMap to hold prepared queries, sets indentation properties, and connects to the data sources used on the server.

   public void init() {
      xqueryMap = new HashMap();
      indentationProperty = new Properties();
      indentationProperty.setProperty("indent", "yes");
      try {
         dataSource = new DDXQDataSource(
            new FileInputStream(XQueryServlet.CONFIG_FILE));
         connection = dataSource.getConnection();
      }
      catch (Exception exception) {
         System.out.println("Could not initialize DataDirect " +
            "XQuery due to an Exception:");
         exception.printStackTrace();
      }
   }

When the servlet terminates, the destroy() method closes all open connections.

   public void destroy() {
      try {
         if (connection != null) {
            connection.close();
         }
      }
      catch (XQException anException) {
         // just making sure that a close took place.
         // no real work to perform on this Exception
      }
   }

HTTP requests from the client result in calls to doPost(), doGet(), or doPut(), but all these methods delegate to a method called doXQuery(), which actually executes the requested query and creates the result.

The doXQuery method first obtains a prepared query by calling findXQuery(), then creates XQuery external variables with the same names and values as parameters found in the HTTP URI. These variables have type xs:string, but the query can cast them to any desired type. The servlet parses any HTTP content in the request as XML and binds it to the variable $content, which can be used in a query. Finally, doXQuery executes the query and writes the result to the return buffer.
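The first step of doXQuery(), separating the query name from the values that become external variables, can be sketched without the Servlet or XQJ APIs. The code below is an illustration, not the code from Listing 1. It takes a map shaped like the one returned by the servlet request's getParameterMap(), and the class name RequestBindings, the decision to skip q when binding, and the name check are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RequestBindings {

    static final String QUERY_PARAM = "q";

    /**
     * Returns the requested query name. The character check is an assumed
     * safeguard that keeps a name such as "../x" from reaching files outside
     * the deployment directory.
     */
    static String queryName(Map<String, String[]> parameters) {
        String[] values = parameters.get(QUERY_PARAM);
        if (values == null || values.length == 0
                || !values[0].matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("Missing or invalid 'q' parameter");
        }
        return values[0];
    }

    /**
     * Returns one name/value pair per external variable, using the first value
     * of each URL parameter other than "q". In the servlet, each pair would be
     * bound to the prepared query as an xs:string before execution.
     */
    static Map<String, String> externalVariables(Map<String, String[]> parameters) {
        Map<String, String> variables = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> parameter : parameters.entrySet()) {
            if (QUERY_PARAM.equals(parameter.getKey()) || parameter.getValue().length == 0) {
                continue;
            }
            variables.put(parameter.getKey(), parameter.getValue()[0]);
        }
        return variables;
    }

    public static void main(String[] args) {
        Map<String, String[]> parameters = new LinkedHashMap<>();
        parameters.put("q", new String[] {"portfolio"});
        parameters.put("user", new String[] {"Jonathan"});
        System.out.println(queryName(parameters) + " " + externalVariables(parameters));
    }
}
```

Keeping this step free of query-engine types makes it easy to test on its own, and it leaves the binding and execution calls in one place in doXQuery().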

The findXQuery() method takes the name of a requested query as a parameter and returns a prepared query. The findXQuery() method first checks the HashMap to see if this query has already been prepared and is up to date; if so, it simply returns the existing prepared query. Otherwise, it looks for the query in the XQuery deployment directory (which is specified in the web.xml file shown in Listing 3), prepares the query, adds it to the HashMap, and returns the prepared query. Here’s the findXQuery method code:

   private XQPreparedExpression findXQuery(String shortName)
      throws Exception {
      //TODO: is the date needed?
      //Date now = new Date();
      String xqueryFileName = shortName + XQueryServlet.XQUERY_FILE;
      XQueryMapEntry entry = null;

      entry = (XQueryMapEntry)xqueryMap.get(shortName);
      if (entry != null) {
         File xqueryFile = new File(xqueryFileName);
         if (entry.getDate().before(new Date(
            xqueryFile.lastModified()))) {
            // The prepared query is stale - prepare again
            entry.setQuery(connection.prepareExpression(
               new FileReader(xqueryFileName)));
            return entry.getQuery();
         }
         else {
            // The prepared query exists and is up to date
            return entry.getQuery();
         }
      }
      else {
         // This query has not yet been prepared and
         // added to the map
         entry = new XQueryMapEntry();
         entry.setQuery(connection.prepareExpression(
            new FileReader(xqueryFileName)));
         xqueryMap.put(shortName, entry);
         return entry.getQuery();
      }
   }

Figure 2. Data Integration with an XQuery Servlet: The servlet uses the REST interface to select and parameterize XQuery queries. XQuery can query each data source and integrate results, eliminating the need for many APIs.

Figure 2 shows the revised architecture after implementing the XQuery servlet.
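The prepare-once, prepare-again-when-stale pattern in findXQuery() can also be written as a small generic cache. The sketch below is a variation, not the article's code: it uses ConcurrentHashMap because a servlet container may call the servlet from several threads at once and a plain HashMap is not safe for that, and it compares modification times for equality rather than with before(). The class name PreparedCache and the Function-based preparer are assumptions; in the servlet, the preparer would wrap the XQJ prepareExpression call.

```java
import java.io.File;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class PreparedCache<V> {

    /** A prepared value together with the file timestamp it was prepared from. */
    private static final class Entry<V> {
        final V value;
        final long modified;

        Entry(V value, long modified) {
            this.value = value;
            this.modified = modified;
        }
    }

    private final ConcurrentHashMap<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<File, V> preparer;

    PreparedCache(Function<File, V> preparer) {
        this.preparer = preparer;
    }

    /**
     * Returns the cached value for the file, preparing it again if the file's
     * last-modified time differs from the one recorded at preparation time.
     * compute() runs atomically per key, so a query is not prepared twice
     * by two threads racing on the same file.
     */
    V get(File source) {
        long modified = source.lastModified();
        return entries.compute(source.getPath(), (path, cached) ->
                cached != null && cached.modified == modified
                        ? cached
                        : new Entry<V>(preparer.apply(source), modified)).value;
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("portfolio", ".xquery");
        file.deleteOnExit();
        PreparedCache<String> cache = new PreparedCache<>(f -> "prepared " + f.getName());
        System.out.println(cache.get(file));
    }
}
```

Recording the timestamp next to the prepared value means a redeployed query file is picked up on the next request without restarting the servlet.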

As you’ve seen, it’s not terribly difficult to create an XQuery servlet that implements the Java Servlet API, using XQJ to issue XQueries. But it’s a powerful idea, because developers can develop data services by writing queries in XQuery, testing them, and simply copying them to the deployment directory. The servlet makes deployed queries instantly available to users, providing an HTTP interface determined by the query name and its parameters. This development/deployment simplicity is an extremely productive way to create data services.
