XQuery will likely become the dominant language for querying data from most data sources. Although it was designed for querying XML data, XQuery can also tie together data from multiple data sources. In that respect it is much more powerful than SQL, which will slowly but surely be replaced as the main query language.
You may think I’m making a pretty bold statement here, considering the current dominance of SQL with its many dialects and the fact that XQuery has yet to become a World Wide Web Consortium (W3C) Recommendation. However, there are already several implementations based on working drafts, including one by Microsoft for the .NET Framework. These implementations show the enormous power of XQuery. But what is XQuery exactly?
XML Query, XQuery for short, is a new query language currently under development by the W3C. It is designed to query XML documents using a SQL-like syntax. XQuery’s capabilities go far beyond SQL, however, because XML (and thus XQuery) isn’t bound to the rigid structure of tables and relations. XML can represent a large number of data models. Furthermore, an XQuery query can return data from multiple documents in different locations. XSLT has similar capabilities, but many IT people will find XQuery much easier to understand, particularly database administrators familiar with SQL.
You can use XQuery to extract an XML document from a physical or virtual representation of XML data. An example of the latter is SQLXML (provided in Microsoft SQL Server 2000), which enables you to extract data from a SQL Server database, formatted as XML, over HTTP. Any system that exposes XML over HTTP is a potential source of data for XQuery. XQuery’s designers hope that XQuery can act as a unified query language for any data store, including XML files, XML databases, and non-XML data stores. With the proliferation of loosely coupled systems and data coming from halfway across the globe, performance of multi-document queries is going to be an issue, particularly if you only need a small amount of data from a large document. Future versions of XQuery may alleviate this problem by distributing a query over the queried systems.
Although XQuery is still a working draft, it already has broad support. Several applications already let you query data using XQuery. Microsoft has hinted that the next release of SQL Server (code-named Yukon) will provide support for XQuery as well, and both IBM and Oracle will likely offer some kind of XQuery support once XQuery attains W3C Recommendation status.
XQuery uses four main keywords to create query expressions: FOR, LET, WHERE, and RETURN. These keywords are commonly used in conjunction to query data and create a result. People familiar with XQuery who build an expression using these keywords refer to this as a FLWR-expression (or FLoWeR-expression). In technical terms, these expressions are element constructors: you use them to construct (sequences of) elements. Let’s start with a simple expression to show you how this works.
FOR $d IN document("menu.xml")//dish RETURN <dish>{$d/text()}</dish>
Note that the expression above is not itself well-formed XML. Most W3C standards that deal with XML are themselves specified in XML format (XSLT, for example). XQuery represents a departure from this practice: the W3C chose to abandon the need for XQuery to be XML in favor of a simplified model. When you apply the expression above to menu.xml shown in Listing 1, it yields the following result:
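<dish>Crab Cakes</dish>
<dish>Jumbo Prawns</dish>
<dish>Ceasar Salad</dish>
<dish>Grilled Salmon</dish>
<dish>Linguini al Pesto</dish>
<dish>Rack of Lamb</dish>
<dish>Dame Blanche</dish>
<dish>Sorbet</dish>
<dish>Banana Split</dish>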
The above result is a sequence of nodes, and as such is not well-formed XML (it has no root element). Hence neither the query nor the result needs to be XML. A sequence is ordered, so unlike SQL, which is set-based, all data returned has a specific order. Returned data can be in the order in which the nodes appear in the source document (document order), but you can also manipulate the order. So how did the data actually get retrieved?
First, the document() function opens the menu.xml file and the XPath expression //dish retrieves all dish nodes in the source document. Second, the FOR keyword iterates through the sequence and assigns a node to the $d variable with each iteration. The expression uses RETURN to build a result for each of the nodes based on the $d variable. In this case, all the RETURN statement does is create a dish element with the text from the original dish element. The curly braces make sure that the XPath expression is evaluated to give a result instead of just showing it as is in the result. The result is very similar to a SQL SELECT query on a simple table. Before we break out of that similarity, let’s look at how you can refine the query to select only certain elements by adding WHERE to the mix.
FOR $d IN document("menu.xml")//dish WHERE $d/@id > '6' RETURN <dish>{$d/text()}</dish>
When you use the expression above, you actually limit the result to only the desserts from Listing 1, as shown below:
<dish>Dame Blanche</dish>
<dish>Sorbet</dish>
<dish>Banana Split</dish>
The WHERE clause acts more or less the same as a SQL WHERE clause. You can add more criteria with AND and OR, limiting the selection even further. Note that the id attribute is actually compared to a string rather than a number. This is because XQuery uses XPath 2.0 (although it still looks like XPath 1.0 in this sample), support for which is still a little funky in most implementations. In the .NET implementation used here, the implicit conversion from string to number doesn’t work properly yet. Also, only a limited number of the XPath 2.0 functions work at this point. If you start experimenting with XQuery, you’re bound to run into situations where your query is correct according to the specification but still doesn’t work. Just keep in mind that most implementations are experimental and in the technology preview stage. Once the W3C recommends XQuery, I expect to see pretty stable beta implementations very quickly, because none of the major vendors wants to lag behind.
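If it helps to see the FOR, WHERE, and RETURN steps spelled out procedurally, the following is a minimal Python sketch of the same logic. It is not XQuery and has nothing to do with the .NET implementation. The inline MENU_XML is only an assumed stand-in for Listing 1 (the appetizers and entrees element names and the ids 1 through 9 are guesses that fit the queries in this article), and flwr_dishes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for menu.xml (Listing 1); element names and ids are guesses.
MENU_XML = """
<menu>
  <appetizers>
    <dish id="1">Crab Cakes</dish>
    <dish id="2">Jumbo Prawns</dish>
    <dish id="3">Ceasar Salad</dish>
  </appetizers>
  <entrees>
    <dish id="4">Grilled Salmon</dish>
    <dish id="5">Linguini al Pesto</dish>
    <dish id="6">Rack of Lamb</dish>
  </entrees>
  <desserts>
    <dish id="7">Dame Blanche</dish>
    <dish id="8">Sorbet</dish>
    <dish id="9">Banana Split</dish>
  </desserts>
</menu>
"""


def flwr_dishes(root, where=None):
    """Hypothetical helper that mirrors FOR ... [WHERE ...] RETURN over all dish nodes."""
    result = []
    for d in root.iter("dish"):          # FOR: bind each dish node to d, in document order
        if where is not None and not where(d):
            continue                     # WHERE: skip nodes that fail the test
        new = ET.Element("dish")         # RETURN: construct a new dish element
        new.text = d.text                # ... holding the text of the original
        result.append(new)
    return result                        # an ordered sequence, not a single document


root = ET.fromstring(MENU_XML.strip())
all_dishes = flwr_dishes(root)
# The id is compared as a string, as in the article's query.
desserts = flwr_dishes(root, where=lambda d: d.get("id") > "6")

for e in desserts:
    print(ET.tostring(e, encoding="unicode"))

# The pitfall of comparing as strings: "10" sorts before "6".
print("10" > "6", 10 > 6)
```

The last line is the reason the string comparison deserves attention: it happens to work while the ids are single digits, but a dish with id 10 would be filtered out.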
One problem with the WHERE expression I presented in my example is that it performs its filtering after the data has been selected. With large documents this is clearly inefficient, although in some cases unavoidable. Limiting the selection by changing the XPath expression in the FOR clause should theoretically result in better performance because it limits the number of iterations. The following query yields the same result, but I’ve removed the WHERE clause and I’ve added a filter to the XPath expression:
FOR $d IN document("menu.xml")//dish[@id > '6'] RETURN <dish>{$d/text()}</dish>
You’ll also use XPath filtering when you use the LET keyword. With LET you can assign data to a variable that you can later manipulate. The following (somewhat meaningless) example shows how this works:
LET $x := document("menu.xml")/menu/desserts/dish FOR $d IN $x RETURN <dish>{$d/text()}</dish>
The above query again yields all the desserts from Listing 1, but instead of the FOR expression iterating over data directly pulled from menu.xml, it acts on the $x variable that was filled with the LET expression. Obviously you can use LET in more useful ways than I’ve shown here. It enables you to select data that you can re-use inside an iteration and join/merge that data with another specified data set. The object of the above sample was just to show you how LET operates.
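To illustrate the idea of binding once and re-using the data inside an iteration, here is a small Python sketch under the same caveats as any translation: it is not XQuery, the inline MENU_XML is an assumed fragment of Listing 1, and the nr attribute is invented purely to show the bound sequence being re-used.

```python
import xml.etree.ElementTree as ET

# Assumed fragment of menu.xml (Listing 1); only the desserts are needed here.
MENU_XML = """
<menu>
  <desserts>
    <dish id="7">Dame Blanche</dish>
    <dish id="8">Sorbet</dish>
    <dish id="9">Banana Split</dish>
  </desserts>
</menu>
"""

root = ET.fromstring(MENU_XML.strip())

# LET: bind the whole sequence of dessert nodes to x exactly once.
# root is already the menu element, so the path is relative to it.
x = root.findall("./desserts/dish")

# FOR ... RETURN: iterate over the bound sequence and build one element per node.
result = []
for position, d in enumerate(x, start=1):
    # Re-use x inside the iteration: its length is known without querying again.
    item = ET.Element("dish", {"nr": f"{position} of {len(x)}"})
    item.text = d.text
    result.append(item)

for item in result:
    print(ET.tostring(item, encoding="unicode"))
```

The point of the design is that the selection happens once, outside the loop, and every iteration can refer back to the complete bound sequence.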
With the FLWR-expressions under your belt you can start to create documents with XQuery instead of just a sequence of nodes. You can do so by embedding one or more queries in XML using curly braces, as shown in Listing 2. Each expression within the curly braces is executed separately. As you can imagine, the result is just a simple HTML page that shows the appetizers, entrees, and desserts from menu.xml separated by a header. You can easily make it more complex by adding formatting outside the FLWR-expressions, and by adding formatting to the HTML built into the RETURN clause of each expression. As long as the RETURN clause is well-formed, you can pretty much add any element and attribute you want.
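As a rough picture of what such a document-building query does, the following Python sketch wraps three small queries in literal HTML markup. It is only an analogy to the approach described above, not a rendering of Listing 2: the page layout, the build_page helper, and the inline MENU_XML (an assumed stand-in for Listing 1) are all illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for menu.xml (Listing 1); element names and ids are guesses.
MENU_XML = """
<menu>
  <appetizers>
    <dish id="1">Crab Cakes</dish>
    <dish id="2">Jumbo Prawns</dish>
    <dish id="3">Ceasar Salad</dish>
  </appetizers>
  <entrees>
    <dish id="4">Grilled Salmon</dish>
    <dish id="5">Linguini al Pesto</dish>
    <dish id="6">Rack of Lamb</dish>
  </entrees>
  <desserts>
    <dish id="7">Dame Blanche</dish>
    <dish id="8">Sorbet</dish>
    <dish id="9">Banana Split</dish>
  </desserts>
</menu>
"""


def build_page(root):
    """Hypothetical helper: fixed HTML markup with one embedded query per section."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for section in ("appetizers", "entrees", "desserts"):
        # Literal markup that sits outside the embedded query.
        ET.SubElement(body, "h2").text = section.capitalize()
        ul = ET.SubElement(body, "ul")
        # The embedded query: each section is evaluated separately.
        for d in root.findall(f"./{section}/dish"):
            ET.SubElement(ul, "li").text = d.text
    return html


page = build_page(ET.fromstring(MENU_XML.strip()))
print(ET.tostring(page, encoding="unicode"))
```

Because the result has a single html root element, it is a well-formed document rather than a bare sequence of nodes.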
XQuery in .NET
Microsoft offers XQuery demo classes for the .NET Framework through http://xqueryservices.com. You can either use the online demo with several of the XQuery Use Cases (see sidebar), or download the classes and experiment with them yourself. Note that this is just a demo implementation. The namespace and the classes are likely to change for the actual implementation. The overall structure and approach, however, will probably stay the same.
Executing a query consists of two steps: loading the document(s), and executing the query over the loaded documents, as shown in Figure 1. The documents you load are stored in an XQueryNavigatorCollection object. You can either load a file directly into the collection, or load an XQueryDocument using an XmlReader object, and from there use the CreateNavigator() method to create an XQueryNavigator object that you store in the collection. The result is a collection of documents optimized for XQuery. When you execute a query, you feed the collection to the XQueryExpression object executing the query. An advantage of this approach is the ability to work with both physical and virtual documents, because the documents are loaded into the collection using an alias. To refer to a document in the query, you refer to the alias. If this weren’t possible, you could only work with physical documents because there would be no way, for instance, to address data in a database.
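The alias idea is easier to see in miniature. The Python sketch below is a conceptual analogy only: it does not use or imitate the .NET demo classes, and DocumentCollection, add, document, and run_query are hypothetical names made up for the illustration. It shows the two steps (load under an alias, then query by alias) and why the query does not care whether the XML came from a file or was generated elsewhere.

```python
import xml.etree.ElementTree as ET


class DocumentCollection:
    """Hypothetical collection of parsed documents, keyed by alias."""

    def __init__(self):
        self._docs = {}

    def add(self, alias, xml_text):
        # Step 1: load. xml_text could come from a file, an HTTP response,
        # or XML generated from a database; the collection does not care.
        self._docs[alias] = ET.fromstring(xml_text)

    def document(self, alias):
        # Queries address a document by alias, never by physical location.
        return self._docs[alias]


def run_query(collection):
    # Step 2: execute. Plays the role of iterating over document("menu")//dish.
    return [d.text for d in collection.document("menu").iter("dish")]


docs = DocumentCollection()
# A "virtual" document: XML held in a string rather than in a file on disk.
docs.add(
    "menu",
    "<menu><desserts>"
    "<dish id='7'>Dame Blanche</dish><dish id='8'>Sorbet</dish>"
    "</desserts></menu>",
)
print(run_query(docs))
```

Swapping the string for the contents of a file changes only the loading step; the query itself stays the same.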