XML is a flexible language for storing hierarchical data along with its associated metadata. Storing data as XML enables you to manipulate and analyze it later. When writing applications, you probably use some kind of XSLT engine (Cocoon, for example) to pick out pieces of data from the structured XML file and transform them into an output format or process them in some other way. Usually this means you have to write, compile, and then invoke code in order to test what it does on an XML file. Writing the code also usually requires one or two dozen lines of boilerplate for proper compilation. This code could be Java (lots of extraneous boilerplate) or XSLT (a little bit of extraneous boilerplate). Wouldn’t it be nice to just invoke the query and manipulation commands directly on the XML data without all the overhead?
Enter XSH, an open source command-line XML shell that lets you interactively query and manipulate XML data, simplifying development and testing of XML query and manipulation code. Since XSH is written in Perl, its syntax resembles Perl syntax, a boon if you are already familiar with Perl. XSH even lets you write Perl code and access XML data structures as if they were ordinary Perl variables. In addition to Perl, XSH gives you XSLT capabilities such as XPath querying, enabling you to quickly and succinctly express complex queries with the XPath language.
Install XSH
Perform the following steps to continue:
- Download XSH. You need both XML-XSH2-2.0.2.tar.gz (or later) and xmltools-bundle.tar.gz.
- Unpack both files (tar xzf $file.tar.gz).
- Type “su”
- Type “cd xmltools-bundle”
- Type “./install –s”
- Type “cd ../XML-XSH2-2.0.2”
- Type “perl Makefile.PL”
- Type “make”
- Type “make install”
- Hit Ctrl-D to exit the su shell and return to your original username.
Test that you have XSH installed by typing xsh --version. You should see the following response:
xsh 0.12 (Revision: 2.2)
XML::XSH2::Functions 2.0.2 (Revision: 2.7)
Quick Test Drive
Briefly explore some XSH features by trying out a few commands on an example file. First download the sample code for this article. Unpack it by typing tar xzf xsh_article_sample_code.tgz. Proceed to type the following:
- cd xsh_article_sample_code
- xsh
- $a := open "example1.xml"
- ls $a
You should see the XML code of example1.xml.
Now, type the following:
ls //book[author/last-name="Wilde"]
XSH should respond with the following:
The Picture of Dorian Gray Oscar Wilde Bantam Classics 1983-01-01 Chapter I [3] The studio was filled with the rich odor of roses, and when the light summer wind stirred amidst the trees of the garden there came through the open door the heavy scent of the lilac, or the more delicate perfume of the pink-flowering thorn. ...
The ellipsis (…) is my abbreviation for the content I omitted from the actual XSH listing. The command above is an XPath query for all books whose authors’ last names are Wilde. I created the example1.xml file to contain exactly five books, one of which is The Picture of Dorian Gray, written by Oscar Wilde. The following XSH command shows the titles of all of the books in the example XML file:
ls //book/title
That should yield the following:
All Quiet on the Western Front A Separate Peace Oliver Twist The Call of the Wild The Picture of Dorian Gray
To see all authors, issue the following command:
ls //book/author
You can issue any XPath query, meaning you can search by tag name, tag content, attribute values, and anything else specified in the XPath standard. The rich searching capability of XPath is one of the key benefits of storing data in XML format. Try a few (or all) of the following commands to get a flavor for the types of queries you can issue:
- ls //book[substring-before(publication-date, "-") > 1996]
- ls //book[contains(chapters/chapter/title, "Primitive")]
- ls //book[contains(title, "Dorian")]
- ls //title[starts-with(., "Oliver")]
- count(//publisher[contains(., "Books")])
- ls //book[contains(publisher, "Books")]/chapters/chapter
For more information on XPath and how to use it, consult the official specification and an excellent XPath Tutorial. You can follow along with the examples in the left frame of the tutorial since all of them should work in XSH.
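If you want to check your reading of these queries outside XSH, the following is a minimal Python sketch of what four of them select. It is not how XSH evaluates XPath: the standard-library ElementTree module supports only a small XPath subset, so the predicates are written as plain Python filters. The inline SAMPLE document is an assumed, cut-down shape for example1.xml (the first-name element in particular is a guess), and the helper names titles and year_of are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in data: an assumed, cut-down shape for example1.xml (not the real file).
SAMPLE = """\
<books>
  <book>
    <title>Oliver Twist</title>
    <author><first-name>Charles</first-name><last-name>Dickens</last-name></author>
    <publisher>Tor Books</publisher>
    <publication-date>1998-08-01</publication-date>
  </book>
  <book>
    <title>The Picture of Dorian Gray</title>
    <author><first-name>Oscar</first-name><last-name>Wilde</last-name></author>
    <publisher>Bantam Classics</publisher>
    <publication-date>1983-01-01</publication-date>
  </book>
</books>
"""

root = ET.fromstring(SAMPLE)
books = root.findall("book")


def titles(matches):
    """Return the title text of each matching book element."""
    return [book.findtext("title") for book in matches]


def year_of(book):
    """Mimic substring-before(publication-date, "-") as an integer year."""
    year = book.findtext("publication-date", default="").partition("-")[0]
    return int(year) if year.isdigit() else None


# ls //book[author/last-name="Wilde"]
by_wilde = [b for b in books if b.findtext("author/last-name") == "Wilde"]

# ls //book[contains(title, "Dorian")]
dorian = [b for b in books if "Dorian" in b.findtext("title", default="")]

# ls //book[substring-before(publication-date, "-") > 1996]
after_1996 = [b for b in books if (year_of(b) or 0) > 1996]

# count(//publisher[contains(., "Books")])
books_publishers = sum("Books" in (p.text or "") for p in root.iter("publisher"))

print(titles(by_wilde), titles(dorian), titles(after_1996), books_publishers)
```

Each filter sits under a comment naming the XSH command it approximates, so you can compare the two forms side by side.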
Basic Query Commands
As you may have surmised, the ls command in XSH displays a section of the XML tree. The ls command may remind you of the Unix ls command, which also displays a section of a tree: the directory tree of the filesystem. The similarity is not coincidental. The XSH navigation commands are modeled on Unix filesystem navigation commands (see Table 1).
Table 1. Unix Filesystem Navigation Commands
Command | Description
cd | Change the current context node* (current position in the XML tree).
ls | List the XML of the current position (or a specified node).
pwd | Show the location of the current context node.
* A node is a tag, attribute, or textual content inside a tag.
Most of the time you will use the ls command in conjunction with an XPath specifier for querying. The following is an example terminal session using the commands above (the commands you must type follow each prompt):
[xsh_article_sample_code]$ xsh
------------------------------------------------------------
 xsh - XML Editing Shell version 2.0.2/0.12 (Revision: 2.2)
------------------------------------------------------------
Copyright (c) 2002 Petr Pajas.
This is free software, you may use it and distribute it under
either the GNU GPL Version 2, or under the Perl Artistic License.
Using terminal type: Term::ReadLine::Perl
Hint: Type `help' or `help | less' to get more help.
$scratch/> open "example1.xml"
parsing example1.xml
done.
/> cd /books/book[contains(title, "Wild")]
/books/book[4]> ls
... ... ... ... ...
Found 1 node(s).
/books/book[4]> pwd
/books/book[4]
/books/book[4]> ls chapters/chapter[title="Into the Primitive"]
Into the Primitive "Old longings nomadic leap, Chafing at custom's chain; Again from its brumal sleep Wakens the ferine strain." Buck did not read the newspapers, or he would have known that trouble was brewing, not alone for himself, but for every tide- water dog, strong of muscle and with warm, long hair, from Puget Sound to San Diego. Because men, groping in the Arctic darkness, had found a yellow metal, and because steamship and transportation companies were booming the find, thousands of men were rushing into the Northland. These men wanted dogs, and the dogs they wanted were heavy dogs, with strong muscles by which to toil, and furry coats to protect them from the frost.
Found 1 node(s).
/books/book[4]>
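To see where a location such as /books/book[4] comes from, here is a small Python sketch that computes a pwd-style path for a node. It is only an illustration of the idea of a context node, not XSH's own code. The stand-in document holds just the five titles from the sample file, the function name node_path is hypothetical, and the rule of adding an index only when a node has same-named siblings is an assumption based on the session above.

```python
import xml.etree.ElementTree as ET

# Stand-in document: only the five titles from the sample file, no other fields.
SAMPLE = (
    "<books>"
    "<book><title>All Quiet on the Western Front</title></book>"
    "<book><title>A Separate Peace</title></book>"
    "<book><title>Oliver Twist</title></book>"
    "<book><title>The Call of the Wild</title></book>"
    "<book><title>The Picture of Dorian Gray</title></book>"
    "</books>"
)
root = ET.fromstring(SAMPLE)


def node_path(root, target):
    """Return a pwd-style location such as /books/book[4] for target."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node in parents:
        parent = parents[node]
        same_name = [c for c in parent if c.tag == node.tag]
        step = node.tag
        if len(same_name) > 1:
            # XPath positions are 1-based.
            step += f"[{same_name.index(node) + 1}]"
        steps.append(step)
        node = parent
    steps.append(node.tag)  # the root element
    return "/" + "/".join(reversed(steps))


# Rough equivalent of: cd /books/book[contains(title, "Wild")]
context = next(
    b for b in root.findall("book") if "Wild" in b.findtext("title", default="")
)
print(node_path(root, context))
```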
Manipulating Information
One of the highly useful features of XSH is the ability to change the XML. When XSH loads an XML file, it constructs an in-memory DOM tree that you can modify. Table 2 lists the commonly used manipulation commands.
Table 2. Commonly Used Manipulation Commands
Command | Description
copy | Copy one or more nodes from a source to a destination (both XPath). This copies each source node to the corresponding destination node, where “corresponding” means the destination node in the same position in the parameter list as the source node. This means if you copy nodes A and B before nodes C and D, A will go before C and B will go before D.
xcopy | Cross-copy nodes from a source to a destination. This differs from regular copy because it copies every source node to every destination node, resulting in x * y nodes if there are x source nodes and y destination nodes.
insert | Insert a new node of a given type. You must specify the type, which can be: element, attribute, text, cdata, comment, chunk, or entity_reference.
move | Move nodes from one place to another. This is the same as a copy followed by a remove.
rename | Rename a node.
map | Map an expression or short operation onto a list of nodes.
remove | Remove one or more nodes.
xinsert | Cross-insert nodes to one or more destination nodes. This is the “x” version of insert, analogous in operation to how xcopy differs from copy.
xmove | xcopy followed by remove.
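The difference between copy and xcopy comes down to how sources are paired with destinations. The Python sketch below models only that pairing, using plain strings in place of nodes. The function names copy_pairs and xcopy_pairs are made up for this illustration, and what XSH does when the two lists have different lengths is not covered here (zip simply stops at the shorter list).

```python
def copy_pairs(sources, destinations):
    """copy: the i-th source goes to the i-th destination."""
    return list(zip(sources, destinations))


def xcopy_pairs(sources, destinations):
    """xcopy: every source goes to every destination."""
    return [(src, dst) for dst in destinations for src in sources]


sources = ["A", "B"]
destinations = ["C", "D"]

# copy A and B before C and D: A lands before C, B lands before D.
print(copy_pairs(sources, destinations))

# xcopy: x * y placements for x sources and y destinations.
print(xcopy_pairs(sources, destinations))
```

With two sources and two destinations, copy produces two placements and xcopy produces four, matching the x * y count in Table 2.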
The copy, xcopy, move, xmove, insert, and xinsert commands have a location parameter that specifies where the source nodes go in relation to the destination nodes. Table 3 lists the possible choices for location.
Table 3. Possible Choices for Location Parameters
Location | Description
after | Place source nodes after the destination nodes. Most of the use cases are obvious. If both source and destination nodes are attributes, XSH attaches the source node to the parent element of the destination attribute. If the source node is not an attribute, but the destination node is an attribute, then the text of the source node is simply appended to the value of the destination attribute.
before | Place source nodes before the destination nodes. The behavior is analogous to the after location, except in the preceding position rather than the following position.
into | Place source nodes into the destination nodes. If the destination nodes are of type element, the source nodes become children of the element (unless the source node is of type attribute, in which case the source node becomes an attribute of the destination node). Otherwise, the value of the destination node gets set to the source node.
append | Append a source node to a destination node. If the destination node is of type element or document, then the source node is added as a child of the destination node. Otherwise, XSH appends the textual content of the source node to the content of the destination node.
prepend | Prepend a source node to a destination node. Same as append, except in the preceding position rather than the following position. For children, the source node becomes the first child and all the other children shift one position later.
replace | Replace the entire destination node with the source node, except when the destination node is an attribute, in which case only the value of the destination node (the textual content) is replaced with the textual content of the source node.
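For the common case where both source and destination are elements, the locations in Table 3 can be pictured with the following Python sketch. It is an approximation under stated assumptions, not XSH's implementation: it ignores attributes, text nodes, and documents entirely, it treats into the same as append, and the function name place is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

CHILD_LOCATIONS = {"into", "append", "prepend"}
SIBLING_LOCATIONS = {"before", "after", "replace"}


def place(root, source, dest, location):
    """Put a copy of element source at location relative to element dest."""
    new = copy.deepcopy(source)
    if location in CHILD_LOCATIONS:
        if location == "prepend":
            dest.insert(0, new)  # becomes the first child
        else:
            dest.append(new)  # becomes the last child (into is assumed to match)
        return new
    if location not in SIBLING_LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    # ElementTree keeps no parent pointers, so search for the parent of dest.
    parent = next((p for p in root.iter() if any(c is dest for c in p)), None)
    if parent is None:
        raise ValueError("destination has no parent inside root")
    index = list(parent).index(dest)
    if location == "before":
        parent.insert(index, new)
    elif location == "after":
        parent.insert(index + 1, new)
    else:  # replace
        parent[index] = new
    return new


doc = ET.fromstring("<chapters><chapter>two</chapter></chapters>")
two = doc[0]
place(doc, ET.fromstring("<chapter>one</chapter>"), two, "before")
place(doc, ET.fromstring("<chapter>three</chapter>"), two, "after")
place(doc, ET.fromstring("<note>first</note>"), doc, "prepend")
place(doc, ET.fromstring("<note>last</note>"), doc, "append")
order = [child.text for child in doc]
print(order)
```

The before, after, and replace cases work on the destination's parent, while into, append, and prepend work on the destination itself, which is the main distinction to keep in mind when choosing a location.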
The insert command lets you insert a new node. Table 4 lists the node types and a description of each.
Table 4. Node Types for Insert Command
Node Type | Description
element | An element tag
attribute | An attribute hanging off an element (e.g. columns="2")
text | Textual content
cdata | A CDATA section, whose content is treated as literal text rather than parsed as markup
comment | An XML comment
chunk | A chunk of well-formed and valid XML in textual format
entity_reference | An entity reference (e.g. &gt;)
Generally speaking, you are going to use and encounter nodes mostly of type element, attribute, and text. Listing 1 shows examples of all of the commands in Table 2 (except move and xmove) in a single terminal session using XSH. The move and xmove commands are built on copy and remove, so they are self-explanatory. The commands you must type follow the /> prompt.
Listing 1. Commonly Used Manipulation Commands (except move and xmove) in a Single Terminal Session Using XSH
[xsh_article_sample_code]$ xsh
…
$scratch/> open "example1.xml"
…
/> remove /books/book[4]/chapters/chapter[title="Into the Primitive"]/content/text()
removed 1 node(s)
/> ls /books/book[4]/chapters/chapter[title="Into the Primitive"]
Into the Primitive
Found 1 node(s).
/> insert text "Hello there" into /books/book[4]/chapters/chapter[title="Into the Primitive"]/content
/> ls /books/book[4]/chapters/chapter[title="Into the Primitive"]
Into the Primitive Hello there
Found 1 node(s).
/> copy /books/book[title="All Quiet on the Western Front"]/chapters/chapter[1] after /books/book[title="The Picture of Dorian Gray"]/chapters/chapter[last()]
/> ls /books/book[title="The Picture of Dorian Gray"]
The Picture of Dorian Gray Oscar Wilde Bantam Classics 1983-01-01 Chapter I … Chapter II … Chapter III … blah blah 1 blah blah
Found 1 node(s).
/> xcopy /books/book[title="A Separate Peace"]/chapters/chapter append /books/book[starts-with(title, "The")]/chapters
/> ls /books/book[starts-with(title, "The")]
The Call of the Wild Jack London Aladdin 2003-02-01 Into the Primitive Hello there The Law of Club and Fang … The Dominant Primordial Beast … blah blah 1 blah blah blah blah 2 blah blah blah blah 3 blah blah
The Picture of Dorian Gray Oscar Wilde Bantam Classics 1983-01-01 Chapter I … Chapter II … Chapter III … blah blah 1 blah blah blah blah 1 blah blah blah blah 2 blah blah blah blah 3 blah blah
Found 2 node(s).
/> rename { $_ = "publishing-firm" } //publisher
/> ls /
… Ballatine Books … … Scribner … … Tor Books … … Aladdin … … Bantam Classics …
Found 1 node(s).
/> map {$_ = lc($_)} /books/book/title/text()
/> ls /books/book/title
all quiet on the western front a separate peace oliver twist the call of the wild the picture of dorian gray
/> xinsert chunk "123 " before //book/chapters
/> ls /
… 123 … 123 … 123 … 123 … 123
Found 1 node(s).
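The rename and map steps in Listing 1 can be pictured with a short Python sketch. It is a rough analogue using the standard-library ElementTree module, not a description of how XSH performs these commands. The two-book document is stand-in data, and the function names rename_elements and map_text are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in data: two books with only the fields this sketch touches.
SAMPLE = (
    "<books>"
    "<book><title>Oliver Twist</title><publisher>Tor Books</publisher></book>"
    "<book><title>The Call of the Wild</title><publisher>Aladdin</publisher></book>"
    "</books>"
)
root = ET.fromstring(SAMPLE)


def rename_elements(root, old_name, new_name):
    """Rough analogue of: rename { $_ = "new_name" } //old_name"""
    # Materialize the matches first so the tree is not edited mid-iteration.
    for element in list(root.iter(old_name)):
        element.tag = new_name


def map_text(elements, func):
    """Rough analogue of: map { $_ = func($_) } on each element's text()"""
    for element in elements:
        if element.text is not None:
            element.text = func(element.text)


rename_elements(root, "publisher", "publishing-firm")
map_text(root.findall("book/title"), str.lower)  # like /books/book/title/text()

print(ET.tostring(root, encoding="unicode"))
```

Note that the map step is limited to book/title, mirroring the /books/book/title path in the listing, so chapter titles in a fuller document would be left alone.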
Once you are done manipulating information, you may want to save your new XML tree. Use the save command. If you do not specify any parameters to the save command, XSH will overwrite your old XML file. If you want to save the XML tree to a new XML file, specify the --file parameter, like so:
save --file new_filename.xml
(Save the file now so that you can open it from a known state for the following section on Perl.)
Try different variations on the manipulation commands. The beauty of an interactive tool is that you can make changes and try different operations. The feedback is immediate, so you can quickly figure out how things work and equally quickly achieve the results you want on your data.
Using Perl in XSH
Perl is integrated much better with the recently released XSH2 than with the old XSH1. To invoke Perl code, simply surround it with braces (the “{” and “}” characters). Open the saved file from the previous section by typing:
open "new_filename.xml"
Now try the following command:
foreach my $pubdate in //publication-date { my $year; my $month; my $day; perl { literal($pubdate) =~ /(\d{4})-(\d\d)-(\d\d)/; $year = $1; $month = $2; $day = $3; }; remove $pubdate/text(); insert chunk "${year} ${month} ${day} " into $pubdate; }
You unfortunately have to type it all on one line. XSH lets you break things over multiple lines, but only in batch scripts. The above command breaks apart the publication date into three subfields, as follows:
/> ls //books/book/publication-date
1987 03 12
2003 09 30
1998 08 01
2003 02 01
1983 01 01
Found 5 node(s).
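The same idea, sketched in Python for comparison: match the date with a regular expression, clear the old text, and add one child element per subfield. This is an illustration rather than a translation of the XSH command. The subfield element names year, month, and day are assumptions, the one-book document is stand-in data, and the function name split_dates is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as the Perl regex above: four digits, two digits, two digits.
DATE = re.compile(r"(\d{4})-(\d\d)-(\d\d)")

# Assumed subfield element names; choose whatever your data calls for.
FIELDS = ("year", "month", "day")


def split_dates(root):
    """Replace each publication-date's text with year/month/day children."""
    # Materialize the matches first so the tree is not edited mid-iteration.
    for pubdate in list(root.iter("publication-date")):
        match = DATE.search(pubdate.text or "")
        if match is None:
            continue  # leave dates that do not fit the pattern untouched
        pubdate.text = None
        for name, value in zip(FIELDS, match.groups()):
            ET.SubElement(pubdate, name).text = value


root = ET.fromstring(
    "<books><book><publication-date>1987-03-12</publication-date></book></books>"
)
split_dates(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```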
Other Uses of XSH
XSH accepts input from any stream, meaning you can use it as a batch-processing tool as well as an interactive shell. Bash and other Unix shells operate both interactively and in batch mode, so the dual mode operation is natural for anyone who uses Unix tools. Although beyond the scope of this article, batch operation of XSH can help you solve problems that would be difficult or convoluted with just XSLT, and since it simply involves feeding commands to XSH from a file instead of standard input, the transition is simple. Just type your commands into a text editor and then feed that text file to XSH.
If you work with a substantial amount of XML, tools for simple and quick query and manipulation of XML trees are essential. Now that you know what XSH does and how it can help you, it should be a valuable tool in your toolbox.