The eXist XQuery engine, on the other hand, would return a sequence of element nodes with no containers or closures.
The primary reason is that while Saxon assumes that the output will be an XML object (and hence needs to have some
kind of containment), XQuery itself does not make that assumption. Note that you can bypass this wrapper problem by placing
the whole query inside an XML container:
<employee_set>{
for $employee in doc("employees.xml")/employees/employee
return $employee}
</employee_set>
For Saxon, the output is similar, but not identical (see
Listing 3).
The expression for $item in $seq can be a little misleading. As the
for statement iterates through the sequence, what is bound to the $item variable is
in effect an internal pointer to each item in that sequence in turn, rather than a copy of that item.
In other words, the $item variable is live: it refers to a structure within an underlying
XML (or related) data model, and the result is in turn a sequence based upon this context.
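One way to observe this behavior is with the node identity operator is, which is true only when both operands are the very same node. The sketch below is illustrative rather than definitive: it builds a small inline document in place of employees.xml (the sample last names are assumptions) and checks that each value bound to $employee is the original node, still attached to its original parent.

```xquery
(: Inline sample data standing in for employees.xml; the names are made up. :)
let $doc := document {
  <employees>
    <employee><lastname>Smith</lastname></employee>
    <employee><lastname>Adams</lastname></employee>
  </employees>
}
for $employee at $pos in $doc/employees/employee
return (
  (: the bound value is the node in the source tree, not a copy :)
  $employee is ($doc/employees/employee)[$pos],
  (: so it can still navigate to its original parent :)
  exists($employee/parent::employees)
)
```

Every test in this sketch should evaluate to true.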
For instance, in the expression:
for $employee in doc("employees.xml")/employees/employee
order by $employee/lastname ascending
return $employee
what gets returned is a list of employees ordered by the employee's last name—in essence, the
order by statement creates a virtual sequence of the list reordered by the given criterion:
for $employee in doc("employees.xml")/employees/employee order by $employee/lastname ascending return $employee
where the bold expression represents this virtual sequence.
Obviously, for a complex expression, typing this repeatedly could get tedious. Fortunately, you can use the
let command to establish a temporary variable holding this sequence:
let $sorted-seq := for $employee in doc("employees.xml")/employees/employee
                   order by $employee/lastname ascending
                   return $employee
return $sorted-seq
This particular statement may seem somewhat counterintuitive, especially if you assume that let can hold only a single scalar value (as simple assignment does in many programming languages). However, once you understand that the let statement is intended to hold sequences (and more sophisticated data structures) as well as scalars, the expression makes more sense. Moreover, because the sequence created still points to specific elements within the XML structure, this re-ordered sequence is essentially just a sequence of pointers, not of whole (possibly huge) XML structures.
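The following sketch illustrates both points under stated assumptions: the inline document and its three last names are invented sample data used in place of employees.xml. It binds a sorted sequence with let, then checks that the first item of that sequence is the same node as the second employee in the source document.

```xquery
(: Hypothetical sample data; only the element names come from the article. :)
let $doc := document {
  <employees>
    <employee><lastname>Smith</lastname></employee>
    <employee><lastname>Adams</lastname></employee>
    <employee><lastname>Jones</lastname></employee>
  </employees>
}
let $sorted-seq :=
  for $employee in $doc/employees/employee
  order by $employee/lastname ascending
  return $employee
return (
  count($sorted-seq),                             (: 3: let holds a whole sequence :)
  $sorted-seq[1] is $doc/employees/employee[2],   (: true: Adams is the same node :)
  (: iterate with for to keep the sorted order; a path step such as
     $sorted-seq/lastname would return nodes in document order instead :)
  for $e in $sorted-seq return string($e/lastname)
)
```

The final for clause should yield Adams, Jones, Smith, in that order.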
One upshot is that you can create a staggered filtering mechanism at surprisingly low cost, processing-wise. For instance, suppose that you wanted to create an expression that would sort the employees by last name and then provide you with records 11 through 20 inclusive of this sorted list. The following code illustrates one approach:
let $employees := doc("employees.xml")/employees/employee
let $sorted-employees := for $employee in $employees
order by $employee/lastname ascending
return $employee
(: subsequence is 1-based: start at item 11 and take 10 items, giving records 11 through 20 :)
let $paged-employees := subsequence($sorted-employees, 11, 10)
return $paged-employees
In this case, each let assignment is, in fact, working with a sequence of nodes: the initial set of nodes from the employees.xml document, a sorted sequence of the same content, and the result of retrieving a subsequence of this ordered list of employees. Yet in all cases, what is being extracted are just pointers to elements. This paradigm is very useful in creating efficient queries, because rather than moving around large blocks of XML data (or rearranging an XML database), the engine manipulates only a list of such pointers, which can make the operations orders of magnitude faster.
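The paging step generalizes naturally. As a minimal sketch, the helper below (local:page is a hypothetical name, not something from the article or a library) converts a 1-based page number and a page size into the 1-based starting position that subsequence expects. A range of integers stands in for the sorted employee sequence so that the result is easy to check.

```xquery
(: Hypothetical helper: returns page $page (1-based) of $seq, $size items per page. :)
declare function local:page(
  $seq  as item()*,
  $page as xs:integer,
  $size as xs:integer
) as item()* {
  (: page 1 starts at position 1, page 2 at position $size + 1, and so on :)
  subsequence($seq, ($page - 1) * $size + 1, $size)
};

(: Second page of ten from 25 items: the integers 11 through 20. :)
local:page(1 to 25, 2, 10)
```

Calling local:page($sorted-employees, 2, 10) would select the same records 11 through 20 discussed above, and a final partial page simply returns whatever items remain.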
One major caveat with this approach, however, is that this pointer manipulation holds true only so long as what is
returned is a naked result (for example, $employee). Anything that changes the resulting content ends up creating new nodes of information, and the XQuery engine then has to effectively de-reference the nodes, even if the changes are fairly insignificant.
The code listed above returns exactly the same result (with the bracketed expressions indicating that the
XQuery expressions are to be evaluated and the results substituted into the stream), but because the first return statement
creates new content, this is a considerably more expensive, and hence far slower, operation.
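The distinction between returning a node and constructing new content can be observed directly. The sketch below is an assumed example with invented inline data, separate from the listing discussed above: it wraps an existing employee in a new element and compares the wrapped node with the original.

```xquery
(: Hypothetical sample data standing in for employees.xml. :)
let $doc := document {
  <employees>
    <employee><lastname>Adams</lastname></employee>
  </employees>
}
let $original := $doc/employees/employee[1]
(: the element constructor copies $original into a brand-new tree :)
let $wrapped := <employee_set>{$original}</employee_set>
return (
  $original is $doc/employees/employee[1],       (: true: still the source node :)
  $wrapped/employee is $original,                (: false: a new node was created :)
  deep-equal($wrapped/employee, $original),      (: true: same content, different identity :)
  exists($wrapped/employee/parent::employees)    (: false: the copy lives under employee_set :)
)
```

The copy is equal in content but is a distinct node, which is why the engine has to do the extra work of building it. How much slower this is in practice depends on the engine and on the size of the copied subtrees.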