
XSH: Interactively Manipulate and Analyze XML Data

XML is a flexible language for storing hierarchical data along with its associated metadata. Storing data as XML lets you manipulate and analyze it later. When writing applications, you probably use some kind of XSLT engine (Cocoon, for example) to pick out pieces of data from the structured XML file and transform them into an output format or process them in some other way. Usually this means you have to write, compile, and then invoke code just to test what it does on an XML file. Writing that code also usually requires one or two dozen lines of boilerplate before it will compile or run. The code could be Java (lots of extraneous boilerplate) or XSLT (a little extraneous boilerplate). Wouldn’t it be nice to just invoke the query and manipulation commands directly on the XML data without all the overhead?

Enter XSH, an open source command-line XML shell that lets you interactively query and manipulate XML data, simplifying development and testing of XML query and manipulation code. Since XSH is written in Perl, its syntax resembles Perl syntax, a boon if you are already familiar with Perl. XSH even lets you write Perl code and access XML data structures as if they were ordinary Perl variables. In addition to Perl, XSH gives you XSLT capabilities such as XPath querying, enabling you to quickly and succinctly express complex queries with the XPath language.

Install XSH
Perform the following steps to install XSH:

  1. Download XSH. You need both XML-XSH2-2.0.2.tar.gz (or later) and xmltools-bundle.tar.gz.
  2. Unpack both files (tar xzf $file.tar.gz).
  3. Type "su"
  4. Type "cd xmltools-bundle"
  5. Type "./install –s"
  6. Type "cd ../XML-XSH2-2.0.2"
  7. Type "perl Makefile.PL"
  8. Type "make"
  9. Type "make install"
  10. Hit Ctrl-D to exit the su shell and return to your original username.

Test that you have XSH installed by typing xsh --version. You should see the following response:

xsh 0.12 (Revision: 2.2)
XML::XSH2::Functions 2.0.2 (Revision: 2.7)

Quick Test Drive
Briefly explore some XSH features by trying out a few commands on an example file. First download the sample code for this article and unpack it by typing tar xzf xsh_article_sample_code.tgz. Then type the following:

  1. cd xsh_article_sample_code
  2. xsh
  3. $a := open "example1.xml"
  4. ls $a

You should see the XML code of example1.xml.

Now, type the following:

ls //book[author/last-name="Wilde"]

XSH should respond with the following:

    The Picture of Dorian Gray
      Oscar
      Wilde
    Bantam Classics
    1983-01-01
      Chapter I
      [3] The studio was filled with the rich odor of roses, and when the
      light summer wind stirred amidst the trees of the garden there came
      through the open door the heavy scent of the lilac, or the more
      delicate perfume of the pink-flowering thorn.
      ...

The ellipsis (…) is my abbreviation for the content I omitted from the actual XSH listing. The command above is an XPath query for all books whose author's last name is Wilde. I created the example1.xml file to contain exactly five books, one of which is The Picture of Dorian Gray, written by Oscar Wilde. The following XSH command shows the titles of all the books in the example XML file:

ls //book/title

That should yield the following:

All Quiet on the Western Front
A Separate Peace
Oliver Twist
The Call of the Wild
The Picture of Dorian Gray

To see all authors, issue the following command:

ls //book/author

You can issue any XPath query, meaning you can search by tag name, tag content, attribute values, and anything else specified in the XPath standard. The rich searching capability of XPath is one of the key benefits of storing data in XML format. Try a few (or all) of the following commands to get a flavor for the types of queries you can issue:

  • ls //book[substring-before(publication-date, "-") > 1996]
  • ls //book[contains(chapters/chapter/title, "Primitive")]
  • ls //book[contains(title, "Dorian")]
  • ls //title[starts-with(., "Oliver")]
  • count(//publisher[contains(., "Books")])
  • ls //book[contains(publisher, "Books")]/chapters/chapter

For more information on XPath and how to use it, consult the official specification and an excellent XPath Tutorial. You can follow along with the examples in the left frame of the tutorial since all of them should work in XSH.
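For readers who want to compare the queries above with ordinary scripting, here is a minimal sketch in Python using only the standard library. It is not part of XSH. The inline XML is a hypothetical stand-in for example1.xml (the real file may be laid out differently), and because ElementTree supports only a small subset of XPath, functions such as contains() are approximated with plain Python filters.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for example1.xml; the real sample file may differ.
SAMPLE = """
<books>
  <book>
    <title>Oliver Twist</title>
    <author><last-name>Dickens</last-name></author>
    <publisher>Tor Books</publisher>
    <publication-date>1998-08-01</publication-date>
  </book>
  <book>
    <title>The Picture of Dorian Gray</title>
    <author><last-name>Wilde</last-name></author>
    <publisher>Bantam Classics</publisher>
    <publication-date>1983-01-01</publication-date>
  </book>
</books>
"""

root = ET.fromstring(SAMPLE)
books = list(root.iter("book"))

# Roughly: ls //book/title
titles = [t.text for t in root.findall(".//book/title")]

# Roughly: ls //book[author/last-name="Wilde"]
by_wilde = [b.findtext("title") for b in books
            if b.findtext("author/last-name") == "Wilde"]

# Roughly: ls //book[substring-before(publication-date, "-") > 1996]
after_1996 = [b.findtext("title") for b in books
              if int(b.findtext("publication-date").split("-")[0]) > 1996]

# Roughly: count(//publisher[contains(., "Books")])
books_publishers = sum(1 for p in root.iter("publisher") if "Books" in (p.text or ""))

print(titles, by_wilde, after_1996, books_publishers)
```

Each query takes several lines here, whereas XSH expresses it as a single XPath expression typed at the prompt.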

Basic Query Commands
As you may have surmised, the ls command in XSH displays a section of the XML tree. The ls command may remind you of the Unix ls command, which also displays a section of a tree: the directory tree of the filesystem. The similarity is not coincidental. The XSH navigation commands are modeled on Unix filesystem navigation commands (see Table 1).

Table 1. XSH Navigation Commands Modeled on Unix Filesystem Commands

Command | Description
cd | Change the current context node* (current position in the XML tree).
ls | List the XML of the current position (or a specified node).
pwd | Show the location of the current context node.

* A node is a tag, attribute, or textual content inside a tag.

Most of the time you will use the ls command in conjunction with an XPath specifier for querying. The following is an example terminal session using the commands above (the commands you must type follow the > prompt):

[wchao@excalibur xsh_article_sample_code]$ xsh
------------------------------------------------------------
 xsh - XML Editing Shell version 2.0.2/0.12 (Revision: 2.2)
------------------------------------------------------------
Copyright (c) 2002 Petr Pajas.
This is free software, you may use it and distribute it under
either the GNU GPL Version 2, or under the Perl Artistic License.
Using terminal type: Term::ReadLine::Perl
Hint: Type `help' or `help | less' to get more help.
$scratch/> open "example1.xml"
parsing example1.xml
done.
/> cd /books/book[contains(title, "Wild")]
/books/book[4]> ls
    ...
    ...
    ...
    ...
    ...
Found 1 node(s).
/books/book[4]> pwd
/books/book[4]
/books/book[4]> ls chapters/chapter[title="Into the Primitive"]
        Into the Primitive
          "Old longings nomadic leap,
          Chafing at custom's chain;
          Again from its brumal sleep
          Wakens the ferine strain."
          Buck did not read the newspapers, or he would have known that
          trouble was brewing, not alone for himself, but for every tide-
          water dog, strong of muscle and with warm, long hair, from Puget
          Sound to San Diego.  Because men, groping in the Arctic darkness,
          had found a yellow metal, and because steamship and transportation
          companies were booming the find, thousands of men were rushing
          into the Northland.  These men wanted dogs, and the dogs they
          wanted were heavy dogs, with strong muscles by which to toil, and
          furry coats to protect them from the frost.
Found 1 node(s).
/books/book[4]>

Manipulating Information
One of the most useful features of XSH is the ability to change the XML. When XSH loads an XML file, it constructs an in-memory DOM tree that you can modify. Table 2 lists the commonly used manipulation commands.

Table 2. Commonly Used Manipulation Commands

Command | Description
copy | Copy one or more nodes from a source to a destination (both XPath). This copies each source node to the corresponding destination node, where "corresponding" means the destination node in the same position in the parameter list as the source node. This means if you copy nodes A and B before nodes C and D, A will go before C and B will go before D.
xcopy | Cross-copy nodes from a source to a destination. This differs from regular copy because it copies every source node to every destination node, resulting in x * y nodes if there are x source nodes and y destination nodes.
insert | Insert a new node of a given type. You must specify the type, which can be: element, attribute, text, cdata, comment, chunk, or entity_reference.
move | Move nodes from one place to another. This is the same as a copy followed by a remove.
rename | Rename a node.
map | Map an expression or short operation onto a list of nodes.
remove | Remove one or more nodes.
xinsert | Cross-insert nodes to one or more destination nodes. This is the "x" version of insert, analogous in operation to how xcopy differs from copy.
xmove | xcopy followed by remove.
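The difference between copy and xcopy can be easier to see in a small model. The following is a minimal Python sketch of the two pairing rules described in Table 2, not XSH's actual implementation. The function names, the tiny document, and the choice to append each copy as a child are all assumptions made for illustration. Table 2 does not say what copy does when the source and destination lists differ in length; the sketch simply stops at the shorter list.

```python
import copy
import xml.etree.ElementTree as ET


def pairwise_copy(sources, destinations):
    """Like copy: the i-th source goes to the i-th destination."""
    for src, dst in zip(sources, destinations):
        dst.append(copy.deepcopy(src))


def cross_copy(sources, destinations):
    """Like xcopy: every source goes to every destination."""
    for dst in destinations:
        for src in sources:
            dst.append(copy.deepcopy(src))


def build():
    # Hypothetical document: two source nodes (a, b) and two destinations (c, d).
    doc = ET.fromstring("<doc><src><a/><b/></src><c/><d/></doc>")
    return doc, list(doc.find("src")), [doc.find("c"), doc.find("d")]


doc1, sources1, dests1 = build()
pairwise_copy(sources1, dests1)
pairwise = {d.tag: [child.tag for child in d] for d in dests1}

doc2, sources2, dests2 = build()
cross_copy(sources2, dests2)
crosswise = {d.tag: [child.tag for child in d] for d in dests2}

print(pairwise)   # {'c': ['a'], 'd': ['b']}
print(crosswise)  # {'c': ['a', 'b'], 'd': ['a', 'b']}
```

With two sources and two destinations, the pairwise rule creates two new nodes and the cross rule creates four, which matches the x * y count given for xcopy.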

The copy, xcopy, move, xmove, insert, and xinsert commands have a location parameter that specifies where the source nodes go in relation to the destination nodes. Table 3 lists the possible choices for location.

Table 3. Possible Choices for Location Parameters

Location | Description
after | Place source nodes after the destination nodes. Most of the use cases are obvious. If both source and destination nodes are attributes, XSH attaches the source node to the parent element of the destination attribute. If the source node is not an attribute, but the destination node is an attribute, then the text of the source node is simply appended to the value of the destination attribute.
before | Place source nodes before the destination nodes. The behavior is analogous to the after location, except in the preceding position rather than the following position.
into | Place source nodes into the destination nodes. If the destination nodes are of type element, the source nodes become children of the element (unless the source node is of type attribute, in which case the source node becomes an attribute of the destination node). Otherwise, the value of the destination node gets set to the source node.
append | Append a source node to a destination node. If the destination node is of type element or document, then the source node is added as a child of the destination node. Otherwise, XSH appends the textual content of the source node to the content of the destination node.
prepend | Same as append, except that the source node goes at the beginning rather than the end. For an element destination, the source node becomes the first child and bumps all the other children forward.
replace | Replace the entire destination node with the source node, except when the destination node is an attribute, in which case only the value of the destination node (the textual content) is replaced with the textual content of the source node.

The insert command lets you insert a new node. Table 4 lists the node types and a description of each.

Table 4. Node Types for Insert Command

Node Type | Description
element | An element tag (e.g. <publication-date>2005-01-01</publication-date> or an empty element)
attribute | An attribute hanging off an element (e.g. columns="2")
text | Textual content
cdata | A CDATA section, whose content is treated as literal text rather than parsed as markup
comment | An XML comment
chunk | A chunk of well-formed XML in textual format
entity_reference | An entity reference (e.g. &gt;)

Generally speaking, you will mostly use and encounter nodes of type element, attribute, and text. Listing 1 shows examples of all of the commands in Table 2 (except move and xmove) in a single terminal session using XSH. The move and xmove commands are built on copy and remove, so they are self-explanatory. In the listing, the commands you must type follow the /> prompt.

Listing 1. Commonly Used Manipulation Commands (except move and xmove) in a Single Terminal Session Using XSH

[wchao@excalibur xsh_article_sample_code]$ xsh
…
$scratch/> open "example1.xml"
…
/> remove /books/book[4]/chapters/chapter[title="Into the Primitive"]/content/text()
removed 1 node(s)
/> ls /books/book[4]/chapters/chapter[title="Into the Primitive"]
        Into the Primitive
Found 1 node(s).
/> insert text "Hello there" into /books/book[4]/chapters/chapter[title="Into the Primitive"]/content
/> ls /books/book[4]/chapters/chapter[title="Into the Primitive"]
        Into the Primitive
        Hello there
Found 1 node(s).
/> copy /books/book[title="All Quiet on the Western Front"]/chapters/chapter[1] after /books/book[title="The Picture of Dorian Gray"]/chapters/chapter[last()]
/> ls /books/book[title="The Picture of Dorian Gray"]
    The Picture of Dorian Gray
      Oscar
      Wilde
    Bantam Classics
    1983-01-01
        Chapter I
        …
        Chapter II
        …
        Chapter III
        …
        blah blah 1
        blah blah
Found 1 node(s).
/> xcopy /books/book[title="A Separate Peace"]/chapters/chapter append /books/book[starts-with(title, "The")]/chapters
/> ls /books/book[starts-with(title, "The")]
    The Call of the Wild
      Jack
      London
    Aladdin
    2003-02-01
        Into the Primitive
        Hello there
        The Law of Club and Fang
        …
        The Dominant Primordial Beast
        …
        blah blah 1
        blah blah
        blah blah 2
        blah blah
        blah blah 3
        blah blah
    The Picture of Dorian Gray
      Oscar
      Wilde
    Bantam Classics
    1983-01-01
        Chapter I
        …
        Chapter II
        …
        Chapter III
        …
        blah blah 1
        blah blah
        blah blah 1
        blah blah
        blah blah 2
        blah blah
        blah blah 3
        blah blah
Found 2 node(s).
/> rename { $_ = "publishing-firm" } //publisher
/> ls /
    …
    Ballatine Books
    …
    …
    Scribner
    …
    …
    Tor Books
    …
    …
    Aladdin
    …
    …
    Bantam Classics
    …
Found 1 node(s).
/> map {$_ = lc($_)} /books/book/title/text()
/> ls /books/book/title
all quiet on the western front
a separate peace
oliver twist
the call of the wild
the picture of dorian gray
/> xinsert chunk "123" before //book/chapters
/> ls /
    …
    123
    …
    123
    …
    123
    …
    123
    …
    123
Found 1 node(s).
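To make the effect of the rename and map commands in Listing 1 concrete, here is a rough Python equivalent using the standard library. It is a sketch for comparison only, not how XSH works internally, and the one-book document is a hypothetical stand-in for example1.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-book stand-in for example1.xml.
root = ET.fromstring(
    "<books><book>"
    "<title>Oliver Twist</title>"
    "<publisher>Tor Books</publisher>"
    "</book></books>"
)

# Roughly: rename { $_ = "publishing-firm" } //publisher
# Collect the matches first so the tree is not renamed while being walked.
for node in list(root.iter("publisher")):
    node.tag = "publishing-firm"

# Roughly: map {$_ = lc($_)} /books/book/title/text()
for title in root.findall("book/title"):
    title.text = (title.text or "").lower()

print(ET.tostring(root, encoding="unicode"))
```

In both cases XSH passes each matched node's name or text to the Perl block as $_, and the block's assignment becomes the new value; the Python loops spell out the same idea explicitly.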

Once you are done manipulating information, you may want to save your new XML tree. Use the save command. If you do not specify any parameters to the save command, XSH will overwrite your old XML file. If you want to save the XML tree to a new XML file, specify the --file parameter, like so:

save --file new_filename.xml

(Save the file now so that you can open it from a known state for the following section on Perl.)

Try different variations on the manipulation commands. The beauty of an interactive tool is that you can make changes and try different operations. The feedback is immediate, so you can quickly figure out how things work and equally quickly achieve the results you want on your data.

Using Perl in XSH
Perl is integrated much better with the recently released XSH2 than with the old XSH1. To invoke Perl code, simply surround it with braces (the “{” and “}” characters). Open the saved file from the previous section by typing:

open "new_filename.xml"

Now try the following command:

foreach my $pubdate in //publication-date { my $year; my $month; my $day; perl { literal($pubdate) =~ /(\d{4})-(\d\d)-(\d\d)/; $year = $1; $month = $2; $day = $3; }; remove $pubdate/text(); insert chunk "${year}${month}${day}" into $pubdate; }

You unfortunately have to type it all on one line. XSH lets you break things over multiple lines, but only in batch scripts. The above command breaks apart the publication date into three subfields, as follows:

/> ls //books/book/publication-date
  1987  03  12
  2003  09  30
  1998  08  01
  2003  02  01
  1983  01  01
Found 5 node(s).
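The same date-splitting idea can be sketched outside XSH. The Python version below uses only the standard library. The child element names year, month, and day are assumptions chosen for illustration, since the element names produced by the XSH command are not visible in the output above. The one-book document is likewise hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as the Perl regular expression: four digits, two digits, two digits.
DATE_RE = re.compile(r"(\d{4})-(\d\d)-(\d\d)")


def split_publication_dates(root):
    """Replace each publication-date's text with year/month/day child elements."""
    # Materialize the matches first because the loop adds children to them.
    for pubdate in list(root.iter("publication-date")):
        match = DATE_RE.fullmatch((pubdate.text or "").strip())
        if match is None:
            continue  # leave dates in an unexpected format untouched
        pubdate.text = None
        # "year", "month", and "day" are hypothetical element names.
        for name, value in zip(("year", "month", "day"), match.groups()):
            ET.SubElement(pubdate, name).text = value
    return root


root = ET.fromstring(
    "<books><book><publication-date>1983-01-01</publication-date></book></books>"
)
result = ET.tostring(split_publication_dates(root), encoding="unicode")
print(result)
```

Unlike the XSH one-liner, this version skips any date that does not match the pattern instead of inserting empty values.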

Other Uses of XSH
XSH accepts input from any stream, meaning you can use it as a batch-processing tool as well as an interactive shell. Bash and other Unix shells operate both interactively and in batch mode, so the dual mode operation is natural for anyone who uses Unix tools. Although beyond the scope of this article, batch operation of XSH can help you solve problems that would be difficult or convoluted with just XSLT, and since it simply involves feeding commands to XSH from a file instead of standard input, the transition is simple. Just type your commands into a text editor and then feed that text file to XSH.

If you work with a substantial amount of XML, tools for simple and quick query and manipulation of XML trees are essential. Now that you know what XSH does and how it can help you, it should be a valuable tool in your toolbox.
