Data Selection with XPath
One of the first things a mashup must do is select the relevant data from incoming data pages, which can be done easily with XPath expressions (see Sidebar 2. XPath Built-in to XQuery). Using XPath expressions, it is easy to extract data from within an XHTML file even if that data is buried under ten levels of <div> blocks. For example, if you wanted to query all the list item elements (<li>) in a web page, you could express this in a single XPath expression:
//li
This notation says to start at the root of the file (the first forward slash) and find any list item anywhere, regardless of its depth in the file (the double slash). So any web page that puts relevant data in list items can be queried quickly.
The equivalent XQuery expression would look similar to this:
let $mydata := doc("mypage.html")//li
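As a minimal sketch of the same selection that runs without an external file, the query below binds a small inline fragment to a variable instead of calling doc(). The fragment, its contents, and the variable name $page are invented for illustration only.

```xquery
(: Hypothetical page content, inlined so the query is self-contained. :)
let $page :=
  <html>
    <body>
      <div class="nav">
        <ul><li>Home</li></ul>
      </div>
      <div class="content-main">
        <div>
          <ul><li>First item</li><li>Second item</li></ul>
        </div>
      </div>
    </body>
  </html>
(: Select every li element at any depth below the html element. :)
let $mydata := $page//li
return count($mydata)
```

With this sample fragment the query should evaluate to 3, because the double slash finds list items in both div blocks. A real XHTML page usually declares the XHTML namespace, in which case the query would also need a matching default element namespace declaration.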
You can also add qualifiers to the XPath expressions to find list items only within specific divs. For example, the following expression will select only the list items from within the main content area of an HTML file:
//div[@class='content-main']//li
The square bracket notation (technically called a predicate) is like a SQL WHERE clause. It will return only list items that are nested somewhere under a div that has a class attribute equal to content-main. Note that the double slash at the end of the XPath indicates that the actual list item elements may be nested many layers inside the main content. If you replace the second double slash with a single slash, you will get only list items that are direct children of the main content div.
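The difference between the double slash and the single slash after a predicate can be sketched with a small self-contained query. The inline fragment and the names $page, $main, and counts are assumptions made for this illustration, not part of any real page.

```xquery
(: Hypothetical page content with one nested list inside the main div. :)
let $page :=
  <html>
    <body>
      <div class="nav">
        <ul><li>Home</li></ul>
      </div>
      <div class="content-main">
        <ul>
          <li>Top-level item
            <ul><li>Nested item</li></ul>
          </li>
        </ul>
      </div>
    </body>
  </html>
(: The predicate keeps only the div whose class attribute is content-main. :)
let $main := $page//div[@class = 'content-main']
return
  <counts
    any-depth="{count($main//li)}"
    direct-children="{count($main/li)}"
    under-top-list="{count($main/ul/li)}"/>
```

For this fragment, any-depth should be 2, direct-children should be 0 (the list items sit inside a ul, not directly under the div), and under-top-list should be 1. The list item in the navigation div is excluded in every case because of the predicate.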
XQuery, SQL and XSLT: Birds of a Feather
Developers who are familiar with data selection languages might wonder if all the things they have learned about SQL transfer to XQuery. The answer is "yes, without a doubt." Everything you can do with SQL you can also do with XQuery, including:
- Adding WHERE clauses
- Creating indexes for fast search
- Selecting distinct values
- Restricting selections to the first N items
- Doing joins
- Changing sort order
- Changing grouping
In many cases the syntax is nearly identical; in others, only small changes are required. For example, simply adding an order by clause to an XQuery FLWOR expression will change the order of the result set, just as ORDER BY does in SQL.
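Several of the SQL-style operations listed above can be sketched in one small FLWOR expression. The order and customer data, the element and attribute names, and the variable names are all invented for this example; it uses only XQuery 1.0 constructs (where, order by, distinct-values, and subsequence).

```xquery
(: Hypothetical data standing in for two tables. :)
let $orders :=
  <orders>
    <order id="1" customer="c2" total="40"/>
    <order id="2" customer="c1" total="15"/>
    <order id="3" customer="c2" total="25"/>
    <order id="4" customer="c1" total="30"/>
  </orders>
let $customers :=
  <customers>
    <customer id="c1" name="Ann"/>
    <customer id="c2" name="Bob"/>
  </customers>
let $rows :=
  for $o in $orders/order,
      $c in $customers/customer
  where $c/@id = $o/@customer            (: join condition :)
    and xs:decimal($o/@total) > 20       (: filter, like a SQL WHERE :)
  order by xs:decimal($o/@total) descending
  return <row name="{$c/@name}" total="{$o/@total}"/>
return
  <result>
    <top-two>{ subsequence($rows, 1, 2) }</top-two>
    <customer-ids>{ distinct-values($orders/order/@customer) }</customer-ids>
  </result>
```

Here the where clause does the join and the filtering, order by sorts by total, subsequence restricts the result to the first two rows (the orders with totals 40 and 30), and distinct-values plays the role of SELECT DISTINCT. The order in which distinct-values returns its items is implementation-dependent.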
Those familiar with XSLT will already be familiar with many parts of XQuery. All XPath knowledge will also transfer with almost no changes. The biggest difference I noticed when I ported XSLT mashup code to XQuery is that the large queries ran much faster on large data sets stored in native XML databases. This surprised me at first, but I was reminded that many native XML databases use the same data structures (B+ trees) and indexing schemes that large RDBMSs use.
Limitations of Tools and Specifications
The tools being used to perform mashups in XQuery today have many limitations. Although XQuery does have many modern features of advanced functional languages (see Sidebar 3. Programming 100 CPU Cores: Procedural Languages Lacking), the implementations of individual vendors or libraries may be significantly different. For example, many options are available for HTTP GET, such as the ability to set timeouts or retry after a given interval, but they must be handled on an ad hoc basis and are not part of the current XQuery specification. Hopefully, these will be part of the XQuery specification or standard XQuery add-on libraries in the future.