An XQuery Servlet for RESTful Data Services : Page 2

Find out how to expose XQuery data integration services by invoking them through a Java servlet using a REST interface.

Efficient XQuery for XML
One of the most important factors in processing XML is an efficient representation of the input document that permits incremental processing. Optimization through query rewrites is also important. The syntax of XQuery may look procedural, but a straightforward procedural implementation does not perform well, especially for large XML files or complex queries, so good implementations may rewrite a query significantly. Many of these rewrites are standard optimizations used for many languages, including constant folding, elimination of common subexpressions, loop rewrites, and ordering rewrites.

In addition to these rewrites, two techniques known as document projection and streaming can dramatically improve speed and memory usage, especially for very large input documents (see Projecting XML Documents).

Document projection involves examining a query to determine which parts of a document the query needs, and using that information at parse time so that only those parts are constructed when the document is parsed. This saves both memory and time: building the input document accounts for a significant share of processing time, and so does searching through parts of the document that are never needed.

Document streaming involves using each portion of an input document to compute the output for which it is responsible, then discarding that portion while the next portion is processed. Streaming does not generally improve speed, but it dramatically improves memory usage, to the extent that memory usage for many queries stays nearly constant regardless of the size of the document.

Implementations that use neither document projection nor streaming may have difficulty processing XML files much larger than about 30 MB. In contrast, implementations that use both may be able to handle more than 30 GB for typical queries, and even queries against small files execute faster.
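The effect of projection and streaming can be imitated by hand with the StAX pull parser that ships with Java. The sketch below is only an analogy: in an XQuery processor the needed paths are derived from the query automatically, whereas here the "query" (sum every total element) is hard-coded, and the class name, element names, and sample data are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: sums the <total> elements of a document without building a tree.
class ProjectedSum {

    // Pulls parse events one at a time; nothing but the running sum is retained.
    static double sumTotals(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            double sum = 0.0;
            while (reader.hasNext()) {
                // Projection: every event other than a <total> start tag is skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "total".equals(reader.getLocalName())) {
                    // Streaming: the text is folded into the sum and then dropped.
                    sum += Double.parseDouble(reader.getElementText().trim());
                }
            }
            reader.close();
            return sum;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders>"
            + "<order><id>1</id><total>10.50</total><notes>gift wrap</notes></order>"
            + "<order><id>2</id><total>4.25</total><notes>none</notes></order>"
            + "</orders>";
        System.out.println(sumTotals(xml));
    }
}
```

Because no tree is ever built, the memory needed by this loop does not grow with the number of order elements, which is the property the streaming discussion above describes.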

Efficient XQuery for SQL Databases
Relational data can be queried efficiently with XQuery by converting the query to SQL, executing the SQL in the database, and returning the results as XML. The quality of the generated SQL can dramatically affect performance. Only the data actually required to compute query results should be returned from the database; rows and columns that are not needed should be filtered out before they leave it. To do this, the implementation must generate maximally selective SQL, taking into account all aspects of the original XQuery that might restrict the data that is actually required. Operations that have a straightforward SQL equivalent should almost always be performed in the database. This is particularly important for joins, sorting, and functionality available in the SQL library. The SQL library is especially helpful when implementing the extensive XQuery library, although some ingenuity is required to account for the differences between SQL functions and XQuery functions.
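To make "maximally selective SQL" concrete, here is a minimal sketch of the kind of statement a translator might emit for a query that filters on one column, sorts on another, and returns two columns. It is not the SQL generator of any real XQuery product; the class, the method, and the holdings table with its columns are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of selective SQL generation; not taken from any real product.
class SelectiveSql {

    // Names only the needed columns and pushes the filter and the sort into the database.
    static String build(String table, List<String> columns, String filterColumn, String orderBy) {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(String.join(", ", columns));   // never SELECT *
        sql.append(" FROM ").append(table);
        if (filterColumn != null) {
            // The comparison value is bound later as a parameter.
            sql.append(" WHERE ").append(filterColumn).append(" = ?");
        }
        if (orderBy != null) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Roughly what an XQuery with a where clause on userid, an order by on
        // stockticker, and a result that uses stockticker and shares would need.
        System.out.println(
            build("holdings", Arrays.asList("stockticker", "shares"), "userid", "stockticker"));
    }
}
```

The alternative, fetching every row and column and then filtering and sorting in the query processor, returns the same answer but moves far more data out of the database.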

When creating hierarchical XML structures, some algorithms in the generated SQL generally perform better than others; for example, the sort-merge algorithm has been shown to have very good overall performance. When supporting databases from multiple vendors, it's tempting to rely on a SQL subset portable among most databases; however, translating XQuery for optimum performance often requires the richer functionality found in modern relational databases, which differs among vendors. Because of this, an implementation can perform much better if it tailors the generated SQL to a particular database vendor. Hints can be provided to give the programmer control over the generated SQL. For more information on SQL generation for XQuery, see the article DataDirect XQuery™ 2.0 Performance: Generating SQL.
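One way to picture a sort-merge approach to building hierarchy is sketched below, under the assumption that the database returns parent rows and child rows already sorted on the join key, so that a single forward pass can nest them. The article does not spell out the algorithm, so this is an illustration rather than a description of any product; the row layout, names, and sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: nests child rows under parent rows in one merge pass.
class SortMergeNesting {

    // users:    rows of {id, name},       sorted by id
    // holdings: rows of {userId, ticker}, sorted by userId
    // Values are assumed to need no XML escaping.
    static String nest(List<String[]> users, List<String[]> holdings) {
        StringBuilder out = new StringBuilder("<users>");
        int h = 0; // cursor into holdings; it only ever moves forward
        for (String[] user : users) {
            int userId = Integer.parseInt(user[0]);
            out.append("<user name=\"").append(user[1]).append("\">");
            // Skip child rows whose key sorts before the current parent.
            while (h < holdings.size() && Integer.parseInt(holdings.get(h)[0]) < userId) {
                h++;
            }
            // Emit every child row that belongs to the current parent.
            while (h < holdings.size() && Integer.parseInt(holdings.get(h)[0]) == userId) {
                out.append("<holding>").append(holdings.get(h)[1]).append("</holding>");
                h++;
            }
            out.append("</user>");
        }
        return out.append("</users>").toString();
    }

    public static void main(String[] args) {
        List<String[]> users = Arrays.asList(
            new String[] {"1", "Ann"}, new String[] {"2", "Bob"});
        List<String[]> holdings = Arrays.asList(
            new String[] {"1", "IBM"}, new String[] {"1", "MSFT"});
        System.out.println(nest(users, holdings));
    }
}
```

Because both inputs are consumed in order, each row is visited once and neither result set has to be held in memory as a whole.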

Result retrieval is also an important factor in overall performance. High-quality database drivers can significantly improve it. Because some XML APIs and processing patterns require streaming, an implementation should support incremental retrieval so that the query processor can use the first part of the result when appropriate while later parts are still being computed.

XML Converters for non-XML Formats
Many data integration environments have to cope with data in non-XML formats. Some of these formats, such as comma-delimited files, have a simple structure. Others, including EDI, have complex structures, and there are thousands of EDI formats. Predefined XML converters can convert such data to XML as it is queried, and tools exist for creating custom XML converters; for instance, Stylus Studio supports designing XML converters in a graphical environment. Once generated, such XML is queried in the same manner as any other XML, and the same optimization strategies apply. In particular, if an XML converter emits its output as a stream, the query processor can apply document projection and streaming to the converted data.
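For the simple comma-delimited case, a streaming converter can be sketched with the StAX writer from the Java standard library. This is a toy under stated assumptions, not the converter of any product mentioned above: the first line is assumed to hold column names that are valid XML element names, fields are assumed to contain no quoted commas, and the rows and row element names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch of a comma-delimited-to-XML converter that works line by line.
class CsvToXml {

    // Reads one line at a time and writes its XML immediately; no line is kept.
    static void convert(Reader csv, Writer xml) throws IOException, XMLStreamException {
        BufferedReader in = new BufferedReader(csv);
        XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(xml);
        out.writeStartElement("rows");
        String header = in.readLine();
        if (header != null) {
            String[] names = header.split(",");
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                String[] cells = line.split(",", -1);
                out.writeStartElement("row");
                for (int i = 0; i < names.length && i < cells.length; i++) {
                    out.writeStartElement(names[i].trim());
                    out.writeCharacters(cells[i].trim()); // escapes <, > and &
                    out.writeEndElement();
                }
                out.writeEndElement();
            }
        }
        out.writeEndElement();
        out.flush();
        out.close();
    }

    // Convenience wrapper for small inputs held in memory.
    static String convert(String csv) {
        StringWriter xml = new StringWriter();
        try {
            convert(new StringReader(csv), xml);
        } catch (IOException | XMLStreamException e) {
            throw new IllegalStateException("conversion failed", e);
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("ticker,shares\nIBM,100\nMSFT,50\n"));
    }
}
```

Since each input line is turned into output before the next one is read, a consumer of this converter can begin processing early rows while later rows are still being converted.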

XQuery for Web Service calls
Web services are a useful and common way to expose data from applications as XML. Because both SOAP requests and responses are expressed in XML, XQuery is very useful for generating or processing SOAP messages. If an XQuery implementation allows Web service calls within a query, then a single XQuery can formulate a request and process the result. For example, the following query shows how to create a Web service call to an Amazon Web service to obtain the description of a book identified by its ISBN:

   (: The tns and ws prefixes are assumed to be bound elsewhere. The rest of the
      payload and the service address are missing from this listing. :)
   declare function local:amazon-listing($isbn)
   {
       <tns:ItemId>{ $isbn }</tns:ItemId>
   };
   let $loc :=
     <location address="..."
      soapaction="http://soap.amazon.com" />
   let $payload := local:amazon-listing("0395518482")
   return ws:call($loc, $payload)

The last line of the preceding query issues the Web service request, specifying a location and a payload. The function at the beginning of the query creates the XML for the payload. Because Web service requests can have complex structures, and because data needed to formulate a Web service request may come from many sources, XQuery is very useful for creating payloads.

With the underlying basics in hand, it's time to move on to the main topic—exposing an XQuery with a RESTful interface.
