Getting the Most Out of XML with XPath : Page 2

If you struggle sometimes with XML's complex tree structure and syntax, you'll want to get your hands on (and your head around) XPath. XPath makes it as easy to view and query XML data as it is to work with a basic file structure. Find out what this simple tool can do for you.


XPath Variations
The observant reader will have noticed that the first XPath in the original example would have returned more than one book name if the list had contained more than one book. For example:

<books> <book> <name>Code Generation in Action</name> </book> <book> <name>Generative Programming</name> </book> </books>

will return both the 'Code Generation in Action' and 'Generative Programming' name nodes from the XML when you use this XPath:

/books/book/name

The selected nodes are:


<name>Code Generation in Action</name> <name>Generative Programming</name>

To refine the search to the first or second book you can provide ordinal values.

/books/book[1]/name

The statement above selects the first node:

<name>Code Generation in Action</name>

And this will return the second:

/books/book[2]/name
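
The ordinal syntax can be tried out with a minimal sketch in Python, using the standard-library xml.etree.ElementTree module. That module implements only a subset of XPath and evaluates paths relative to a starting element, so the leading /books step is dropped here because the starting element is the <books> root itself. That adaptation is an assumption of this sketch, not part of the XPath shown above.

```python
import xml.etree.ElementTree as ET

XML = """
<books>
  <book><name>Code Generation in Action</name></book>
  <book><name>Generative Programming</name></book>
</books>
"""

# fromstring returns the root element, <books>
root = ET.fromstring(XML)

# Counterpart of /books/book/name, written relative to the root
all_names = [n.text for n in root.findall("book/name")]

# Counterparts of /books/book[1]/name and /books/book[2]/name
# (ordinals start at 1, not 0)
first_name = [n.text for n in root.findall("book[1]/name")]
second_name = [n.text for n in root.findall("book[2]/name")]

print(all_names)
print(first_name)
print(second_name)
```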

So what does an XPath statement return? In the examples here it returns either a list of nodes or a single node, where a node is either an element or an attribute. Most APIs provide one method that fetches all of the nodes matching an XPath query as a list and another that returns only the first match as a single node. Getting back to the previous example, every element and attribute has a unique XPath address. You can think of a fully qualified path to the first book name as:

/books[1]/book[1]/name[1]
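
As one illustration of the "list or first match" pair of methods, Python's xml.etree.ElementTree offers findall and find. The sketch below assumes that library and writes the paths relative to the <books> root, so the /books[1] step is implied by the starting element rather than spelled out.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<books>"
    "<book><name>Code Generation in Action</name></book>"
    "<book><name>Generative Programming</name></book>"
    "</books>"
)

# findall returns every matching node as a list
matches = root.findall("book/name")

# find returns only the first match, or None when nothing matches
first = root.find("book/name")
missing = root.find("book/title")

# Counterpart of /books[1]/book[1]/name[1], relative to the root
qualified = root.find("book[1]/name[1]")

print(len(matches), first.text, missing, qualified.text)
```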

Handling Attributes
Tags aren't the only things in XML trees though; you also need to be able to access attributes. Let's say we had this XML input file:

<books> <book name="Code Generation in Action" /> <book name="Generative Programming" /> </books>

To get the names of the books with this schema you need to add the '@' operator into your use of the XPath syntax:

/books/book/@name

This XPath statement will give us the 'name' attribute nodes for each of the books:

name="Code Generation in Action" name="Generative Programming"
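
Not every XPath implementation can return attribute nodes. Python's standard-library xml.etree.ElementTree, for instance, does not support a trailing /@name step, so a rough equivalent is to select the <book> elements and read the attribute from each one. The sketch below takes that route; it approximates the query rather than evaluating it literally.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<books>'
    '<book name="Code Generation in Action" />'
    '<book name="Generative Programming" />'
    '</books>'
)

# Stand-in for /books/book/@name: select the elements, then read
# the 'name' attribute from each with Element.get
names = [book.get("name") for book in root.findall("book")]

print(names)
```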

The easy way to remember to use the '@' is just to think of it as 'at,' which is the start of the word 'attribute.' You can get a little more complex by using attributes instead of ordinals to refine your queries. For example, consider the following XML tree:

<books> <publisher name="Manning"> <book name="Code Generation in Action" /> </publisher> <publisher name="Addison-Wesley"> <book name="Generative Programming" /> </publisher> </books>

To get all of the book tags from 'Manning' you can use this XPath:

/books/publisher[@name='Manning']/book

which returns:

<book name="Code Generation in Action" />

To get just the names of the Manning books, refine the XPath a bit more:

/books/publisher[@name='Manning']/book/@name

This returns a smaller node list:

name="Code Generation in Action"
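
Attribute predicates of this kind fall within the XPath subset that Python's xml.etree.ElementTree supports. A minimal sketch, assuming that library and paths written relative to the <books> root:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<books>'
    '<publisher name="Manning">'
    '<book name="Code Generation in Action" />'
    '</publisher>'
    '<publisher name="Addison-Wesley">'
    '<book name="Generative Programming" />'
    '</publisher>'
    '</books>'
)

# Counterpart of /books/publisher[@name='Manning']/book
manning_books = root.findall("publisher[@name='Manning']/book")

# ElementTree has no /@name step, so read the attribute from each match
manning_names = [book.get("name") for book in manning_books]

print(manning_names)
```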

XPath queries that use the bracket notation can be extremely complex, as they use Boolean logic and built-in functions to refine queries. A deep discussion of this syntax is beyond the scope of this article, but these complex queries can be very handy. To learn them, get a copy of Michael Kay's excellent "XSLT Programmer's Reference," which discusses XPath at this level of depth.

Because XML is a tree, you may find that the elements you are interested in are scattered about the tree at various levels. Take this XML:

<books> <publisher name="Manning"> <group name="Technical"> <book name="Code Generation in Action" /> </group> </publisher> <publisher name="Addison-Wesley"> <book name="Generative Programming" /> </publisher> </books>

In one case the books are grouped (in the <group> tag named "Technical") and in the other they are not. No matter: XPath's '//' syntax matches nodes at any depth, which makes finding them easy. This XPath query will find all of the book nodes regardless of their location:

//book

Which results in this selection:

<book name="Code Generation in Action" /> <book name="Generative Programming" />

The following query will find just the names of the books at any level:

//book/@name
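
The any-depth search can be sketched in Python with xml.etree.ElementTree. One assumption to note: that library expects the relative form .//book when searching from an element, and the attribute step is again replaced by Element.get.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<books>'
    '<publisher name="Manning">'
    '<group name="Technical">'
    '<book name="Code Generation in Action" />'
    '</group>'
    '</publisher>'
    '<publisher name="Addison-Wesley">'
    '<book name="Generative Programming" />'
    '</publisher>'
    '</books>'
)

# Counterpart of //book: every <book> below the root, at any depth,
# returned in document order
books = root.findall(".//book")

# Stand-in for //book/@name
book_names = [book.get("name") for book in books]

print(book_names)
```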

Sometimes you need to get all of the nodes within a particular structure. Take this XML tree:

<media> <author id="herrington" first="Jack" last="Herrington"> <book name="Code Generation in Action" /> <site name="Code Generation Network" /> <article name="PHP Scalability Myth" /> </author> </media>

What happens when you want to get anything written by the author with the id of 'herrington'? In a file system you would use the '*' wildcard, and the same works in XPath:

/media/author[@id='herrington']/*

This statement returns the three nodes within the author tag:

<book name="Code Generation in Action" /> <site name="Code Generation Network" /> <article name="PHP Scalability Myth" />

To get the name attributes, though, you need to make a slight modification:

/media/author[@id='herrington']/*/@name

And that creates a slightly tighter node set:

name="Code Generation in Action" name="Code Generation Network" name="PHP Scalability Myth"
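
The '*' wildcard is also part of the subset supported by Python's xml.etree.ElementTree. A minimal sketch, assuming that library, with the path written relative to the <media> root and the attribute values read with Element.get:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<media>'
    '<author id="herrington" first="Jack" last="Herrington">'
    '<book name="Code Generation in Action" />'
    '<site name="Code Generation Network" />'
    '<article name="PHP Scalability Myth" />'
    '</author>'
    '</media>'
)

# Counterpart of /media/author[@id='herrington']/*
works = root.findall("author[@id='herrington']/*")

# The wildcard matches child elements of any tag name
tags = [work.tag for work in works]

# Stand-in for /media/author[@id='herrington']/*/@name
work_names = [work.get("name") for work in works]

print(tags)
print(work_names)
```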

What I haven't yet covered is the context for the query. So far every XPath we have used starts at the root of the tree. But what happens when you are deep in the tree and want to go back up a step? In a file system you can use '..' to specify the parent directory. The same works in XPath, where '..' selects the parent node. When you run an XPath query you give it two things: the starting node (most often the document root) and the XPath query string.

<media> <author id="herrington" first="Jack" last="Herrington"> <book name="Code Generation in Action" /> <site name="Code Generation Network" /> <article name="PHP Scalability Myth" /> </author> </media>

If you start with the book node in the example above, you can use the '..' notation to go up one level to the author tag and get the first and last names of the author from the first and last attributes:

../@first

and:

../@last
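
How the starting node is supplied varies by library. In Python's xml.etree.ElementTree, elements hold no reference to their parent, so a '..' step cannot climb above the element the search started from. The sketch below therefore starts at the <media> root, steps down to the book node, and then back up with '..'. This is an adaptation for that library, not a literal evaluation of ../@first from the book node.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<media>'
    '<author id="herrington" first="Jack" last="Herrington">'
    '<book name="Code Generation in Action" />'
    '<site name="Code Generation Network" />'
    '<article name="PHP Scalability Myth" />'
    '</author>'
    '</media>'
)

# Find the book at any depth, then step up one level to its parent
author = root.find(".//book/..")

# Stand-ins for ../@first and ../@last
first = author.get("first")
last = author.get("last")

print(author.tag, first, last)
```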

Now that you know how handy XPath can be, the next question is: how do you use it? Thankfully, all of the major programming languages have XPath support built into their support for XML.


