Browse DevX


Making XQuery Control Structures Work for You : Page 2

The XQuery language is the XML analogue of SQL, designed to extend XPath 2.0 by working with sequences of values, not just with single scalar values.


The eXist XQuery engine, on the other hand, would return a sequence of element nodes with no enclosing container element. The primary reason is that while Saxon assumes the output will be a single XML document (and hence needs some kind of containment), the XQuery language itself makes no such assumption: a query result is simply a sequence. Note that you can bypass this wrapper problem by placing the whole query inside an XML container element:

<employees>{ (: wrapper element name assumed; any element name works :)
for $employee in doc("employees.xml")/employees/employee
return $employee
}</employees>
For Saxon, the output is similar, but not identical (see Listing 3).
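To see the difference between a naked result and a wrapped one, the following sketch counts the top-level items each form produces. It assumes an XQuery 1.0 (or later) processor such as Saxon or eXist, and uses a small inline document with assumed element names as a stand-in for employees.xml.

```xquery
(: Inline stand-in for employees.xml; element names are assumed for illustration :)
let $doc := document {
  <employees>
    <employee><lastname>Smith</lastname></employee>
    <employee><lastname>Jones</lastname></employee>
  </employees>
}
(: A bare FLWOR yields a sequence of sibling element nodes with no container :)
let $naked := (for $employee in $doc/employees/employee return $employee)
(: An enclosing element constructor turns that sequence into a single element :)
let $wrapped := <employees>{ $naked }</employees>
(: Evaluates to (2, 1, 2): two naked items, one wrapper holding two employees :)
return (count($naked), count($wrapped), count($wrapped/employee))
```

Keep in mind that the wrapper holds copies of the employee elements rather than the originals, which matters for the cost discussion later in this section.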

The expression for $item in $seq can be a little misleading. In essence, as the for clause iterates through the sequence, what is bound to the $item variable is in fact an internal pointer to each item in that sequence in turn, rather than a copy of that item. In other words, when the items are nodes, the $item variable is live: it refers to a structure within an underlying XML (or related) data model, and the result is in turn a sequence based upon this context.
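A minimal sketch can make this concrete. Assuming an inline document in place of employees.xml (the element names are illustrative), the bound variable can still navigate upward to its parent, and the `is` operator, which compares node identity rather than content, confirms that the parent is the original element in the source document.

```xquery
(: Inline stand-in for employees.xml; element names are assumed for illustration :)
let $doc := document {
  <employees>
    <employee><lastname>Smith</lastname></employee>
    <employee><lastname>Jones</lastname></employee>
  </employees>
}
for $employee in $doc/employees/employee
(: $employee still sits inside $doc, so its parent is the original employees element :)
return $employee/.. is $doc/employees
```

The query evaluates to true() once per employee. A detached copy would have no parent at all, so this check would fail for it.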

For instance, in the expression:

for $employee in doc("employees.xml")/employees/employee 
	order by $employee/lastname ascending
	return $employee
what gets returned is a sequence of employees ordered by last name. In essence, the order by clause creates a virtual sequence of the employees reordered by the given criterion:

for $employee in doc("employees.xml")/employees/employee order by $employee/lastname ascending return $employee
where the expression as a whole evaluates to this virtual sequence.
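One way to see that order by produces a new ordering without rearranging the source is to compare the two orders side by side. In this sketch (inline data, assumed element names), the sorted sequence lists Jones first while the document itself still lists Smith first. The sorted result is read with an explicit for loop on purpose: a path step such as $sorted/lastname returns its nodes in document order, which would silently undo the sort.

```xquery
(: Inline stand-in for employees.xml; element names are assumed for illustration :)
let $doc := document {
  <employees>
    <employee><lastname>Smith</lastname></employee>
    <employee><lastname>Jones</lastname></employee>
  </employees>
}
let $sorted := (
  for $employee in $doc/employees/employee
  order by $employee/lastname ascending
  return $employee
)
return (
  (: order of the virtual sequence: "Jones,Smith" :)
  string-join(for $e in $sorted return string($e/lastname), ","),
  (: document order is untouched: "Smith,Jones" :)
  string-join(for $e in $doc/employees/employee return string($e/lastname), ",")
)
```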

Obviously, typing a complex expression like this repeatedly could get tedious. Fortunately, you can use a let clause to bind a variable to this sequence:

let $sorted-seq := for $employee in doc("employees.xml")/employees/employee 
	order by $employee/lastname ascending 
	return $employee
return $sorted-seq (: a let clause needs a return to form a complete query :)
This particular statement may seem somewhat counterintuitive, especially if you assume that let can hold only a single scalar value. However, once you understand that a let clause is intended to hold sequences (and more sophisticated data structures) as well as scalars, the expression makes more sense. Moreover, because the sequence still points to specific elements within the XML structure, this reordered sequence is essentially just a sequence of pointers, not of whole (possibly huge) XML structures.
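The pointer claim can be checked with the `is` operator. In this sketch (inline data, assumed element names), each item of the sorted sequence turns out to be one of the original element nodes, just in a different position, rather than a copy with equal content.

```xquery
(: Inline stand-in for employees.xml; element names are assumed for illustration :)
let $doc := document {
  <employees>
    <employee><lastname>Smith</lastname></employee>
    <employee><lastname>Jones</lastname></employee>
  </employees>
}
let $employees := $doc/employees/employee
let $sorted-seq := (
  for $employee in $employees
  order by $employee/lastname ascending
  return $employee
)
(: Sorting reorders references only: Jones (originally 2nd) is now 1st, and vice versa :)
return ($sorted-seq[1] is $employees[2], $sorted-seq[2] is $employees[1])
```

Both comparisons evaluate to true().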

One upshot is that you can build a staged filtering pipeline at surprisingly low processing cost. For instance, suppose that you want an expression that sorts the employees by last name and then returns records 11 through 20 inclusive of this sorted list. The following code illustrates one approach:

let $employees := doc("employees.xml")/employees/employee
let $sorted-employees := for $employee in $employees 
	order by $employee/lastname ascending 
	return $employee
(: subsequence() is 1-based: start at item 11 and take 10 items, i.e. items 11 through 20 :)
let $paged-employees := subsequence($sorted-employees, 11, 10)
return $paged-employees
In this case, each let clause is in fact working with a sequence of nodes: the initial set of employee nodes from the employees.xml document, a sorted sequence of the same nodes, and a subsequence of this ordered list. Yet in all cases, what is being passed around is just a set of pointers to existing elements. This paradigm is very useful for writing efficient queries: rather than moving large blocks of XML data around (or rearranging an XML database), the engine manipulates only a list of such pointers, which is typically far cheaper than copying the data itself.
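Because subsequence() counts from 1, the start position for a given page is easy to get off by one. A small helper can keep that arithmetic in one place. The function below is hypothetical (local:page is not a built-in or library function) and assumes pages are numbered from 1; integers stand in for the sorted employee elements.

```xquery
(: Hypothetical helper, not a built-in: returns page $page (1-based) of $size items :)
declare function local:page($items as item()*, $page as xs:integer, $size as xs:integer) as item()*
{
  (: subsequence() positions are 1-based, so page 2 of size 10 starts at item 11 :)
  subsequence($items, ($page - 1) * $size + 1, $size)
};

(: Integers stand in for sorted employee elements; evaluates to 11 through 20 :)
local:page(1 to 45, 2, 10)
```

Since subsequence() simply stops at the end of its input, a short final page needs no special handling: local:page(1 to 45, 5, 10) yields 41 through 45. Applied to $sorted-employees, the helper returns references to the original nodes, just as the listing above does.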

One major caveat with this approach, however, is that this pointer manipulation holds true only so long as what is returned is a naked result (for example, $employee). Anything that changes the resulting content ends up creating new nodes, and the XQuery engine then has to dereference the original nodes and copy their content, even if the changes are fairly insignificant.

The code listed above returns exactly the same result (the curly braces indicate that the enclosed XQuery expressions are evaluated and their results inserted into the output stream), but because the first return statement creates new content, it is a considerably more expensive, and hence slower, operation.
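A minimal sketch, using inline data and assumed element names, shows what happens when a result is wrapped in a new element constructor. The copy compares equal in content but is a different node, detached from the source document, which is why the engine must do the extra copying work.

```xquery
(: Inline stand-in for employees.xml; element names are assumed for illustration :)
let $doc := document {
  <employees>
    <employee><lastname>Smith</lastname></employee>
  </employees>
}
let $original := $doc/employees/employee
(: The enclosed expression copies the children into a brand-new employee element :)
let $copy := <employee>{ $original/node() }</employee>
return (
  deep-equal($original, $copy), (: same name and content: true :)
  $copy is $original,           (: different node identity: false :)
  empty($copy/..)               (: the copy has no parent in $doc: true :)
)
```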
