Getting the Most Out of XML with XPath

XML has now taken over the representation and storage of structured data in a textual format. We rarely need new text formats anymore, because XML and its related standards do such a good job and are so ubiquitous. Understanding and using XML effectively is a critical skill for software engineers.

But even with books on XML and handy software, such as validators (e.g. for DTD or RELAX NG schemas), editors (e.g. XMLSpy, oXygen), and translators (e.g. XSLT), to help us use XML more efficiently, it can still be difficult. This comes down to two major factors. The first is that XML is verbose, which we can do little about. The second is that XML is often represented in memory as a tree structure, and tree structures are notoriously difficult to navigate.

Thankfully, because XML is so structured you can use an XML-related standard called XPath to quickly locate any information you need in an XML tree.

The Basics of XPath
The ‘path’ in ‘XPath’ gives us a clue as to how we use this mechanism to specify nodes in an XML tree. Let’s take a very simple XML tree:

<books>
  <book>
    <name>Code Generation in Action</name>
  </book>
</books>

If you think about the nodes in the tree as directories, the tree would look like this:

/
  books
    book
      name

The absolute path to the ‘name’ information in a directory structure would be:

/books/book/name

It’s exactly the same in XPath. In fact, the XPath statement above will work properly in XML to return the title of the book:

Author’s Note: Throughout this article, the listing that follows an XPath shows what information from the original XML would be returned by that XPath.

<name>Code Generation in Action</name>

Of course, there is a lot more to learn about XPath, but fundamentally you can think of an XPath in XML the same way you would think about a path in the operating system.
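
To make the path analogy concrete, here is a minimal sketch in Ruby using the REXML library (the same API this article uses later on). The XML string and the variable names are illustrative assumptions, and the snippet assumes the rexml library is available in your Ruby installation.

```ruby
require 'rexml/document'

# Illustrative document: a single book with a name element.
xml = <<~XML
  <books>
    <book>
      <name>Code Generation in Action</name>
    </book>
  </books>
XML

doc = REXML::Document.new(xml)

# Evaluate the absolute path, much like resolving /books/book/name on disk.
name_node = REXML::XPath.first(doc, '/books/book/name')
title = name_node.text

puts title
```

The path string is the only part that knows about the document's shape, so a different document layout would only require a different path.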

XPath Variations
One thing the observant reader will have noticed from the original example was that the first XPath would have returned more than one book name if there had been more than one book in the list. For example:

<books>
  <book>
    <name>Code Generation in Action</name>
  </book>
  <book>
    <name>Generative Programming</name>
  </book>
</books>

will return both the ‘Code Generation in Action’ and ‘Generative Programming’ name nodes from the XML when you use this XPath:

/books/book/name

The selected nodes are:

<name>Code Generation in Action</name>
<name>Generative Programming</name>

To refine the search to the first or second book you can provide ordinal values.

/books/book[1]/name

The statement above selects the first node:

<name>Code Generation in Action</name>

And this will return the second:

/books/book[2]/name

So what does an XPath statement return? In the examples here it returns either a list of nodes or a single node, where a node is usually an element or an attribute (XPath can also select text, comment, and processing-instruction nodes, and an expression can evaluate to a string, number, or Boolean). Most APIs have one method for fetching all of the nodes matching an XPath query as a list and another for returning the first match as a single node.

Getting back to the previous example, every element and attribute has a unique XPath address. You can think of a fully qualified path to the first book name as:

/books[1]/book[1]/name[1]
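
The difference between a list of matches and an ordinal selection can be sketched in Ruby with REXML. This is an illustration under assumptions: the two-book document is rebuilt inline, and the variable names are made up for the example.

```ruby
require 'rexml/document'

# Illustrative two-book document.
doc = REXML::Document.new(<<~XML)
  <books>
    <book><name>Code Generation in Action</name></book>
    <book><name>Generative Programming</name></book>
  </books>
XML

# Without ordinals, the path matches every name node.
all_names = REXML::XPath.match(doc, '/books/book/name').map(&:text)

# Ordinals are 1-based: book[1] is the first book, book[2] the second.
first_name  = REXML::XPath.first(doc, '/books/book[1]/name').text
second_name = REXML::XPath.first(doc, '/books/book[2]/name').text

# The fully qualified form addresses exactly one node.
qualified = REXML::XPath.first(doc, '/books[1]/book[1]/name[1]').text

puts all_names.inspect
puts first_name
puts second_name
puts qualified
```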

Handling Attributes
Tags aren’t the only things in XML trees though; you also need to be able to access attributes. Let’s say we had this XML input file:

<books>
  <book name="Code Generation in Action" />
  <book name="Generative Programming" />
</books>

To get the names of the books from this layout, you need to add the ‘@’ operator, which selects an attribute, to the XPath:

/books/book/@name

This XPath statement will give us the ‘name’ attribute nodes for each of the books:

name="Code Generation in Action"
name="Generative Programming"
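
As a rough sketch of how attribute nodes come back from an API, the following Ruby snippet runs the same query with REXML. The inline document and variable names are assumptions for illustration; in REXML the matches are attribute objects, and value reads the text of each one.

```ruby
require 'rexml/document'

# Illustrative document that stores each book name in an attribute.
doc = REXML::Document.new(<<~XML)
  <books>
    <book name="Code Generation in Action" />
    <book name="Generative Programming" />
  </books>
XML

# The '@' step selects attribute nodes rather than elements.
attr_nodes = REXML::XPath.match(doc, '/books/book/@name')

# Each match is an attribute node; value returns its string content.
names = attr_nodes.map(&:value)

puts names.inspect
```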

The easy way to remember to use the ‘@’ is just to think of it as ‘at,’ which is the start of the word ‘attribute.’ You can get a little more complex by using attributes instead of ordinals to refine your queries. For example, consider the following XML tree:

                              

To get all of the book tags from ‘Manning’ you can use this XPath:

/books/publisher[@name='Manning']/book

which returns:

<book name="Code Generation in Action" />

To get just the names of the Manning books, refine the XPath a bit more:

/books/publisher[@name='Manning']/book/@name

This returns a smaller node list:

name="Code Generation in Action"
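
Here is a hedged Ruby sketch of the attribute predicate using REXML. The document is an assumption: only the Manning publisher and its book come from the queries above, while the second publisher and its book are hypothetical, added so that the filter has something to exclude.

```ruby
require 'rexml/document'

# Illustrative document. The second publisher entry is hypothetical.
doc = REXML::Document.new(<<~XML)
  <books>
    <publisher name="Manning">
      <book name="Code Generation in Action" />
    </publisher>
    <publisher name="Addison-Wesley">
      <book name="Generative Programming" />
    </publisher>
  </books>
XML

# The bracketed predicate keeps only publishers whose name attribute matches.
manning_books = REXML::XPath.match(doc, "/books/publisher[@name='Manning']/book")

# Adding /@name narrows the result to the name attributes of those books.
manning_names = REXML::XPath.match(
  doc, "/books/publisher[@name='Manning']/book/@name"
).map(&:value)

puts manning_books.length
puts manning_names.inspect
```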

XPath queries that use the bracket notation can be extremely complex, as they use Boolean logic and built-in functions to refine queries. A deep discussion of this syntax is beyond the scope of this article, but these complex queries can be very handy. To learn them, get a copy of Michael Kay’s excellent “XSLT Programmer’s Reference,” which discusses XPath at this level of depth.
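
For a small taste of that bracket syntax, the sketch below combines the XPath 1.0 functions contains() and starts-with() with the Boolean operator ‘and’, evaluated in Ruby with REXML. The document layout and the variable names are assumptions made for this example.

```ruby
require 'rexml/document'

# Illustrative document; the layout and second publisher are hypothetical.
doc = REXML::Document.new(<<~XML)
  <books>
    <publisher name="Manning">
      <book name="Code Generation in Action" />
    </publisher>
    <publisher name="Addison-Wesley">
      <book name="Generative Programming" />
    </publisher>
  </books>
XML

# contains() tests for a substring inside the attribute value.
generat = REXML::XPath.match(
  doc, "//book[contains(@name, 'Generat')]/@name"
).map(&:value)

# 'and' requires both conditions to hold for a book to be selected.
code_action = REXML::XPath.match(
  doc, "//book[starts-with(@name, 'Code') and contains(@name, 'Action')]/@name"
).map(&:value)

puts generat.inspect
puts code_action.inspect
```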

Because XML is a tree, you may find that the elements you are interested in are scattered about the tree at various levels. Take this XML:

                                               

In one case the books are grouped (in the tag “technical”) and in the other the books are not. No matter: XPath’s ‘//’ syntax matches nodes at any depth, which makes finding them easy. This XPath query will find all of the book nodes regardless of their location:

//book

Which results in this selection:

                                               

The following query will find just the names of the books at any level:

//book/@name
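
The ‘//’ queries can be tried out with a short Ruby sketch using REXML. The document here is hypothetical: it places one book directly under the root and one inside a technical group, which is enough to show that depth does not matter.

```ruby
require 'rexml/document'

# Hypothetical document: one ungrouped book and one inside <technical>.
doc = REXML::Document.new(<<~XML)
  <books>
    <book name="Generative Programming" />
    <technical>
      <book name="Code Generation in Action" />
    </technical>
  </books>
XML

# '//book' finds book elements at any depth.
books = REXML::XPath.match(doc, '//book')

# '//book/@name' collects the name attribute of each of them.
book_names = REXML::XPath.match(doc, '//book/@name').map(&:value)

# By contrast, an absolute path only sees the ungrouped book.
top_level = REXML::XPath.match(doc, '/books/book/@name').map(&:value)

puts books.length
puts book_names.inspect
puts top_level.inspect
```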

Sometimes you need to get all of the nodes within a particular structure. Take this XML tree:

                         

What happens when you want to get anything written by the author with the id of ‘herrington’? In the operating system you would use the ‘*’ wildcard, and the same works in XPath:

/media/author[@id='herrington']/*

This statement returns the three nodes within the author tag:

                         

To get the name attributes, though, you need to make a slight modification:

/media/author[@id='herrington']/*/@name

And that creates a slightly tighter node set:

name="Code Generation in Action"
name="Code Generation Network"
name="PHP Scalability Myth"
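
A minimal Ruby sketch of the ‘*’ wildcard with REXML follows. The three names come from the results above, but the child tag names (book, website, article) and the second author are assumptions invented for the illustration.

```ruby
require 'rexml/document'

# Hypothetical document: the child tag names and second author are assumed.
doc = REXML::Document.new(<<~XML)
  <media>
    <author id="herrington">
      <book name="Code Generation in Action" />
      <website name="Code Generation Network" />
      <article name="PHP Scalability Myth" />
    </author>
    <author id="someone-else">
      <book name="Another Title" />
    </author>
  </media>
XML

# '*' matches every child element of the selected author, whatever its tag.
items = REXML::XPath.match(doc, "/media/author[@id='herrington']/*")
item_tags = items.map(&:name)

# Appending /@name pulls the name attribute from each of those elements.
item_names = REXML::XPath.match(
  doc, "/media/author[@id='herrington']/*/@name"
).map(&:value)

puts item_tags.inspect
puts item_names.inspect
```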

What I haven’t yet covered is the context for the query. So far all of the XPath we have used starts at the root of the tree. But what happens when you are deep in the file system and you want to just go back a step? In the operating system you can use ‘..’ to specify the parent directory. The same works in XPath.

When you run an XPath query you give it two things: the starting node (most often the document root) and the XPath query string.

                     

If you start with the book node in the example above, you can use the ‘..’ notation to go up one level to the author tag and get the first and last names of the author from its first and last attributes:

../@first 

and:

../@last
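
A relative query needs a context node, which the following Ruby sketch with REXML makes explicit. The document and its attribute values are assumptions for illustration; the point is that the first argument to the XPath call is the node the path starts from.

```ruby
require 'rexml/document'

# Hypothetical document; the attribute values are assumed for the example.
doc = REXML::Document.new(<<~XML)
  <media>
    <author id="herrington" first="Jack" last="Herrington">
      <book name="Code Generation in Action" />
    </author>
  </media>
XML

# Locate a book node to use as the starting point (the context node).
book = REXML::XPath.first(doc, '//book')

# '..' steps up to the parent author element; '@first' reads its attribute.
first_name = REXML::XPath.first(book, '../@first').value
last_name  = REXML::XPath.first(book, '../@last').value

puts "#{first_name} #{last_name}"
```

Passing book instead of doc as the first argument is what makes the relative path meaningful; the same string evaluated from the document root would match nothing useful.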

Now that you know how handy XPath can be, the next question is: how do you use it? Thankfully, all of the major programming languages have XPath support built into their support for XML.

XPath Usage Example
My first example of XPath in use is implemented in VB.NET.

<books>
  <book>
    <name>Code Generation in Action</name>
  </book>
  <book>
    <name>Generative Programming</name>
  </book>
</books>
Figure 1. Spies Like Us: You can use XMLSpy by Altova Software to select XML nodes using XPath.

Given the XML above, the following VB.NET code would find all of the name nodes and put up a message box with the text of each node.

' doc is an XmlDocument that has already loaded the XML above
Dim nodeList As XmlNodeList
nodeList = doc.DocumentElement.SelectNodes("/books/book/name")
Dim name As XmlNode
For Each name In nodeList
    MsgBox(name.InnerText)
Next

You can use this code to find just the name of the first book:

Dim node As XmlNode
node = doc.DocumentElement.SelectSingleNode("/books/book/name")
MsgBox(node.InnerText)

Similar recipes work for C#:

// doc is an XmlDocument that has already loaded the XML above
XmlNodeList nodes = doc.DocumentElement.SelectNodes( "/books/book/name" );
foreach( XmlNode node in nodes )
{
    Console.WriteLine( node.InnerText );
}

This code finds all of the name nodes and prints them to the console. Getting a single node looks like this:

XmlNode singleNode = doc.SelectSingleNode( "/books/book/name" );
Console.WriteLine( singleNode.InnerText );

In XSLT you use XPath constantly to specify nodes or template-matching criteria.

        

This style sheet prints out each book name as text with line breaks between them. The ‘match’ attribute on the template is XPath and so are the ‘select’ attributes on the for-each and value-of tags.

Scripting languages, such as Ruby, make it very simple to use XPath. Here is how Ruby, through the REXML API, gets all of the name nodes:

doc.each_element( '/books/book/name' ) { |name| print "#{name.text}\n" }
Figure 2. Book by Its Cover: Visual XPath makes it easy to experiment with XPath.

In this case you simply send your XPath string to the each_element method on the root node. Getting a single node is just as easy:

elem = REXML::XPath.first( doc, '/books/book/name' )
print "#{elem.text}\n"

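One practical detail when fetching a single node is what happens when nothing matches. The Ruby sketch below, using REXML, assumes the two-book document from this section and uses made-up variable names; it shows that a query with no match yields nil from first and an empty list from match, so the result should be checked before calling text on it.

```ruby
require 'rexml/document'

# Illustrative two-book document.
doc = REXML::Document.new(<<~XML)
  <books>
    <book><name>Code Generation in Action</name></book>
    <book><name>Generative Programming</name></book>
  </books>
XML

# There is no third book, so the single-node lookup finds nothing.
missing = REXML::XPath.first(doc, '/books/book[3]/name')

# Guard against nil before reading the text of the node.
label = missing ? missing.text : '(no such book)'

# The list form returns an empty array for the same query.
no_matches = REXML::XPath.match(doc, '/books/book[3]/name')

puts label
puts no_matches.length
```
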
Java supports XPath as part of its standard library (the javax.xml.xpath package, added in J2SE 5.0). C++ can support XPath through Xalan, which sits on top of the Xerces XML parser. Whatever language you choose, if it supports XML it will probably support XPath.

XPath in Applications

Figure 3. XML in the Shell: XMLStarlet lets you do a simple search on your XML code using a command line interface.

Not only is there support for XPath in programming languages, it’s also in canned applications. Altova’s XMLSpy is an editor for XML with support for searching on XPath built into the interface (see Figure 1).

If you aren’t in the mood to get an XMLSpy license you can still play with XPath interactively using VisualXPath, an open source XPath query analyzer for Windows (see Figure 2).

For Unix environments you can use XPath in a shell with XMLStarlet or with xmllint, the command-line tool that ships with libxml2 (see Figure 3).

In its interactive shell mode (xmllint --shell), xmllint lets you move through your XML as if it were a file system and search it with XPath. It’s an easy way to take a tour around XML you don’t know.

XPath is a great language for two reasons. First, it’s easy to learn. And second, it’s incredibly useful. Learning and using XPath will make it so much easier to process XML code in your applications. It will also open up a gateway to technologies such as XSLT that are fundamental to XML and which make extensive use of XPath.

Author’s Note: In this article I talked about XPath 1.0. There are some tools that support the draft 2.0 spec, but overall it’s not yet well adopted.