

XSH: Interactively Manipulate and Analyze XML Data : Page 2

Most developers use some kind of XSLT engine to pick out and process data from structured XML files. Learn how XSH, an open source command-line XML shell, lets you interactively query and manipulate this data without the coding overhead.





Basic Query Commands
As you may have surmised, the ls command in XSH displays a section of the XML tree. The ls command may remind you of the Unix ls command, which also displays a section of a tree—the directory tree of the filesystem. The similarity is not coincidental. The XSH navigation commands are modeled on Unix filesystem navigation commands (see Table 1).

Table 1. Unix Filesystem Navigation Commands




Command   Description
cd        Change the current context node* (current position in the XML tree).
ls        List the XML of the current position (or a specified node).
pwd       Show the location of the current context node.

* A node is a tag, attribute, or textual content inside a tag.

Most of the time you will use the ls command together with an XPath expression to query the document. The following example terminal session uses the commands above (the commands you type follow the > prompt):

[wchao@excalibur xsh_article_sample_code]$ xsh
------------------------------------------------------------
 xsh - XML Editing Shell version 2.0.2/0.12 (Revision: 2.2)
------------------------------------------------------------
Copyright (c) 2002 Petr Pajas.
This is free software, you may use it and distribute it under
either the GNU GPL Version 2, or under the Perl Artistic License.
Using terminal type: Term::ReadLine::Perl
Hint: Type `help' or `help | less' to get more help.
$scratch/> open "example1.xml"
parsing example1.xml
done.
/> cd /books/book[contains(title, "Wild")]
/books/book[4]> ls
<book>
  <title>...</title>
  <author>...</author>
  <publisher>...</publisher>
  <publication-date>...</publication-date>
  <chapters>...</chapters>
</book>

Found 1 node(s).
/books/book[4]> pwd
/books/book[4]
/books/book[4]> ls chapters/chapter[title="Into the Primitive"]
<chapter>
  <title>Into the Primitive</title>
  <content>
    "Old longings nomadic leap,
    Chafing at custom's chain;
    Again from its brumal sleep
    Wakens the ferine strain."

    Buck did not read the newspapers, or he would have known that
    trouble was brewing, not alone for himself, but for every tide-water
    dog, strong of muscle and with warm, long hair, from Puget Sound to
    San Diego. Because men, groping in the Arctic darkness, had found a
    yellow metal, and because steamship and transportation companies were
    booming the find, thousands of men were rushing into the Northland.
    These men wanted dogs, and the dogs they wanted were heavy dogs, with
    strong muscles by which to toil, and furry coats to protect them from
    the frost.
  </content>
</chapter>

Found 1 node(s).
/books/book[4]>
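For readers who think in code, the idea of a context node can be modeled outside XSH. The following is a minimal Python sketch, not part of XSH, using the standard xml.etree.ElementTree module. The SAMPLE document is a made-up stand-in for example1.xml (whose full contents are not shown here), and because ElementTree does not support the XPath contains() function, that part of the cd query is written as an ordinary Python filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for example1.xml; only the structure matters here.
SAMPLE = """<books>
  <book><title>Oliver Twist</title><chapters/></book>
  <book><title>The Call of the Wild</title>
    <chapters>
      <chapter><title>Into the Primitive</title><content>...</content></chapter>
      <chapter><title>The Law of Club and Fang</title><content>...</content></chapter>
    </chapters>
  </book>
</books>"""

root = ET.fromstring(SAMPLE)
books = root.findall("book")

# "cd": pick the context node. ElementTree lacks contains(), so filter in Python.
context = next(b for b in books if "Wild" in b.findtext("title", ""))

# "pwd": report where the context node sits (XPath positions are 1-based).
location = f"/books/book[{books.index(context) + 1}]"

# "ls" with a relative path: the path is resolved against the context node.
matches = context.findall("chapters/chapter[title='Into the Primitive']")

print(location)
print(len(matches), "node(s) found")
```

The point of the sketch is the last query: like the relative path given to ls in the session above, it is evaluated from the context node rather than from the document root.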

Manipulating Information
One of the most useful features of XSH is the ability to change the XML. When XSH loads an XML file, it constructs an in-memory DOM tree that you can modify. Table 2 lists the commonly used manipulation commands.

Table 2. Commonly Used Manipulation Commands




Command   Description
copy      Copy one or more nodes from a source to a destination (both XPath). This copies each source node to the corresponding destination node, where "corresponding" means the destination node in the same position in the parameter list as the source node. This means if you copy nodes A and B before nodes C and D, A will go before C and B will go before D.
xcopy     Cross-copy nodes from a source to a destination. This differs from regular copy because it copies every source node to every destination node, resulting in x * y nodes if there are x source nodes and y destination nodes.
insert    Insert a new node of a given type. You must specify the type, which can be: element, attribute, text, cdata, comment, chunk, or entity_reference.
move      Move nodes from one place to another. This is the same as a copy followed by a remove.
rename    Rename a node.
map       Map an expression or short operation onto a list of nodes.
remove    Remove one or more nodes.
xinsert   Cross-insert nodes to one or more destination nodes. This is the "x" version of insert, analogous in operation to how xcopy differs from copy.
xmove     xcopy followed by remove.
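The difference between copy and xcopy comes down to how source nodes are paired with destination nodes. The following Python sketch models that pairing with the standard library only; it is an illustration of the semantics described above, not XSH code. The function names and the tiny XML document are invented for the example, both functions always append (XSH lets you choose the location), and stopping at the shorter list when the counts differ is an assumption of this sketch rather than documented XSH behavior.

```python
import copy
import xml.etree.ElementTree as ET

def pairwise_copy(sources, dests):
    """Model of copy: the i-th source goes to the i-th destination."""
    for src, dst in zip(sources, dests):  # assumption: stop at the shorter list
        dst.append(copy.deepcopy(src))

def cross_copy(sources, dests):
    """Model of xcopy: every source goes to every destination."""
    for dst in dests:
        for src in sources:
            dst.append(copy.deepcopy(src))

def child_tags(dests):
    return [[child.tag for child in dst] for dst in dests]

XML = "<root><src><a/><b/></src><d1/><d2/></root>"  # made-up document

doc = ET.fromstring(XML)
dests = [doc.find("d1"), doc.find("d2")]
pairwise_copy(doc.findall("src/*"), dests)
pairwise_result = child_tags(dests)

doc = ET.fromstring(XML)
dests = [doc.find("d1"), doc.find("d2")]
cross_copy(doc.findall("src/*"), dests)
cross_result = child_tags(dests)

print(pairwise_result)  # a went to d1, b went to d2
print(cross_result)     # both a and b went to d1 and to d2
```

With two sources and two destinations, the pairwise version creates two new nodes and the cross version creates 2 * 2 = 4, matching the x * y count given for xcopy.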

The copy, xcopy, move, xmove, insert, and xinsert commands have a location parameter that specifies where the source nodes go in relation to the destination nodes. Table 3 lists the possible choices for location.

Table 3. Possible Choices for Location Parameters




Location  Description
after     Place source nodes after the destination nodes. Most of the use cases are obvious. If both source and destination nodes are attributes, XSH attaches the source node to the parent element of the destination attribute. If the source node is not an attribute, but the destination node is an attribute, then the text of the source node is simply appended to the value of the destination attribute.
before    Place source nodes before the destination nodes. The behavior is analogous to the after location, except in the preceding position rather than the following position.
into      Place source nodes into the destination nodes. If the destination nodes are of type element, the source nodes become children of the element (unless the source node is of type attribute, in which case the source node becomes an attribute of the destination node). Otherwise, the value of the destination node gets set to the source node.
append    Append a source node to a destination node. If the destination node is of type element or document, then the source node is added as a child of the destination node. Otherwise, XSH appends the textual content of the source node to the content of the destination node.
prepend   Place a source node before a destination node. Same as append, except in the preceding position rather than the following position. For children, prepend starts from the first child and bumps all the other children forward.
replace   Replace the entire destination node with the source node, except when the destination node is an attribute, in which case only the value of the destination node (the textual content) is replaced with the textual content of the source node.
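For the common case where both source and destination are elements, the location choices can be pictured with a small model. The Python sketch below uses only the standard library and is not XSH code; the place, outline, and demo helpers and the sample document are invented for illustration. It covers after, before, append, prepend, and replace for element nodes only. The attribute and text cases are left out, and so is into, because the description does not say where among the children the new node lands.

```python
import copy
import xml.etree.ElementTree as ET

def place(root, source, dest, location):
    """Put a copy of element `source` relative to element `dest` (elements only)."""
    new = copy.deepcopy(source)
    if location == "append":       # becomes the last child of dest
        dest.append(new)
        return
    if location == "prepend":      # becomes the first child of dest
        dest.insert(0, new)
        return
    # The remaining locations work on dest's parent, so find it first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of[dest]
    index = list(parent).index(dest)
    if location == "before":       # sibling just ahead of dest
        parent.insert(index, new)
    elif location == "after":      # sibling just behind dest
        parent.insert(index + 1, new)
    elif location == "replace":    # takes dest's place entirely
        parent[index] = new
    else:
        raise ValueError(f"unsupported location: {location}")

def outline(element):
    """Compact view of the tree, e.g. chapters(c1(x),c2)."""
    kids = ",".join(outline(child) for child in element)
    return f"{element.tag}({kids})" if kids else element.tag

def demo(location):
    root = ET.fromstring("<chapters><c1><x/></c1><c2/></chapters>")
    place(root, ET.Element("n"), root.find("c1"), location)
    return outline(root)

results = {loc: demo(loc)
           for loc in ("after", "before", "append", "prepend", "replace")}
for loc, tree in results.items():
    print(f"{loc:8} {tree}")
```

In each run the destination is c1 and the new node is n, so the printed outlines show at a glance whether n ended up as a sibling of c1, as a child of c1, or in place of it.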

The insert command lets you insert a new node. Table 4 lists the node types and a description of each.

Table 4. Node Types for Insert Command




Node type         Description
element           An element tag (e.g. <publication-date>2005-01-01</publication-date> or <paid/>)
attribute         An attribute hanging off an element (e.g. columns="2")
text              Textual content
cdata             A CDATA section: character data that the parser does not interpret as markup
comment           An XML comment
chunk             A chunk of well-formed and valid XML in textual format
entity_reference  An entity reference (e.g. &gt;)

In practice, you will mostly work with nodes of type element, attribute, and text. Listing 1 shows examples of all of the commands in Table 2 (except move and xmove) in a single terminal session using XSH. The move and xmove commands are built on copy (or xcopy) and remove, so they are self-explanatory. In the listing, the commands you type follow the /> prompt, and the output of each ls command shows the resulting changes.

Listing 1. Commonly Used Manipulation Commands (except move and xmove) in a Single Terminal Session Using XSH

[wchao@excalibur xsh_article_sample_code]$ xsh
…
$scratch/> open "example1.xml"
…
/> remove /books/book[4]/chapters/chapter[title="Into the Primitive"]/content/text()
removed 1 node(s)
/> ls /books/book[4]/chapters/chapter[title="Into the Primitive"]
<chapter>
  <title>Into the Primitive</title>
  <content/>
</chapter>

Found 1 node(s).
/> insert text "Hello there" into /books/book[4]/chapters/chapter[title="Into the Primitive"]/content
/> ls /books/book[4]/chapters/chapter[title="Into the Primitive"]
<chapter>
  <title>Into the Primitive</title>
  <content>Hello there</content>
</chapter>

Found 1 node(s).
/> copy /books/book[title="All Quiet on the Western Front"]/chapters/chapter[1] after /books/book[title="The Picture of Dorian Gray"]/chapters/chapter[last()]
/> ls /books/book[title="The Picture of Dorian Gray"]
<book>
  <title>The Picture of Dorian Gray</title>
  <author>
    <first-name>Oscar</first-name>
    <middle-name/>
    <last-name>Wilde</last-name>
  </author>
  <publisher>Bantam Classics</publisher>
  <publication-date>1983-01-01</publication-date>
  <chapters>
    <chapter>
      <title>Chapter I</title>
      <content> … </content>
    </chapter>
    <chapter>
      <title>Chapter II</title>
      <content> … </content>
    </chapter>
    <chapter>
      <title>Chapter III</title>
      <content> … </content>
    </chapter>
    <chapter>
      <title>blah blah 1</title>
      <content> blah blah </content>
    </chapter>
  </chapters>
</book>

Found 1 node(s).
/> xcopy /books/book[title="A Separate Peace"]/chapters/chapter append /books/book[starts-with(title, "The")]/chapters
/> ls /books/book[starts-with(title, "The")]
<book>
  <title>The Call of the Wild</title>
  <author>
    <first-name>Jack</first-name>
    <middle-name/>
    <last-name>London</last-name>
  </author>
  <publisher>Aladdin</publisher>
  <publication-date>2003-02-01</publication-date>
  <chapters>
    <chapter>
      <title>Into the Primitive</title>
      <content>Hello there</content>
    </chapter>
    <chapter>
      <title>The Law of Club and Fang</title>
      <content> … </content>
    </chapter>
    <chapter>
      <title>The Dominant Primordial Beast</title>
      <content> … </content>
    </chapter>
    <chapter>
      <title>blah blah 1</title>
      <content> blah blah </content>
    </chapter><chapter>
      <title>blah blah 2</title>
      <content> blah blah </content>
    </chapter><chapter>
      <title>blah blah 3</title>
      <content> blah blah </content>
    </chapter></chapters>
</book>
<book>
  <title>The Picture of Dorian Gray</title>
  <author>
    <first-name>Oscar</first-name>
    <middle-name/>
    <last-name>Wilde</last-name>
  </author>
  <publisher>Bantam Classics</publisher>
  <publication-date>1983-01-01</publication-date>
  <chapters>
    <chapter>
      <title>Chapter I</title>
      <content> … </content>
    </chapter>
    <chapter>
      <title>Chapter II</title>
      <content> … </content>
    </chapter>
    <chapter>
      <title>Chapter III</title>
      <content> … </content>
    </chapter><chapter>
      <title>blah blah 1</title>
      <content> blah blah </content>
    </chapter>
    <chapter>
      <title>blah blah 1</title>
      <content> blah blah </content>
    </chapter><chapter>
      <title>blah blah 2</title>
      <content> blah blah </content>
    </chapter><chapter>
      <title>blah blah 3</title>
      <content> blah blah </content>
    </chapter></chapters>
</book>

Found 2 node(s).
/> rename { $_ = "publishing-firm" } //publisher
/> ls /
<?xml version="1.0"?>
<books>
  <book>
    …
    <publishing-firm>Ballatine Books</publishing-firm>
    …
  </book>
  <book>
    …
    <publishing-firm>Scribner</publishing-firm>
    …
  </book>
  <book>
    …
    <publishing-firm>Tor Books</publishing-firm>
    …
  </book>
  <book>
    …
    <publishing-firm>Aladdin</publishing-firm>
    …
  </book>
  <book>
    …
    <publishing-firm>Bantam Classics</publishing-firm>
    …
  </book>
</books>

Found 1 node(s).
/> map {$_ = lc($_)} /books/book/title/text()
/> ls /books/book/title
<title>all quiet on the western front</title>
<title>a separate peace</title>
<title>oliver twist</title>
<title>the call of the wild</title>
<title>the picture of dorian gray</title>
/> xinsert chunk "<pages>123</pages>" before //book/chapters
/> ls /
<?xml version="1.0"?>
<books>
  <book>
    …
    <pages>123</pages><chapters>
  </book>
  <book>
    …
    <pages>123</pages><chapters>
  </book>
  <book>
    …
    <pages>123</pages><chapters>
  </book>
  <book>
    …
    <pages>123</pages><chapters>
  </book>
  <book>
    …
    <pages>123</pages><chapters>
  </book>
</books>

Found 1 node(s).
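The rename and map commands in Listing 1 both run a short expression once per matched node, with $_ standing for the current name (rename) or the current value (map). As a rough model of that per-node behavior, here is a Python sketch using only the standard library. It is not XSH code, and the two-book document is a cut-down stand-in for the sample file, reduced to the elements these two commands touch.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the sample file: just titles and publishers.
root = ET.fromstring(
    "<books>"
    "<book><title>A Separate Peace</title><publisher>Scribner</publisher></book>"
    "<book><title>Oliver Twist</title><publisher>Tor Books</publisher></book>"
    "</books>"
)

# Model of: rename { $_ = "publishing-firm" } //publisher
# Each matched element gets the new name; its content is untouched.
for node in list(root.iter("publisher")):
    node.tag = "publishing-firm"

# Model of: map {$_ = lc($_)} /books/book/title/text()
# Each matched text value is replaced by the result of the expression.
for title in root.findall("book/title"):
    title.text = title.text.lower()

titles = [t.text for t in root.findall("book/title")]
firms = [f.text for f in root.iter("publishing-firm")]
print(titles)
print(firms)
```

The loop bodies play the role of the braces in the XSH commands: whatever the expression assigns is written back to the node that was matched.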

Once you are done manipulating information, you may want to save your new XML tree. Use the save command. If you do not specify any parameters to the save command, XSH will overwrite your old XML file. If you want to save the XML tree to a new XML file, specify the --file parameter, like so:

save --file new_filename.xml

(Save the file now so that you can open it from a known state for the following section on Perl.)

Try different variations on the manipulation commands. The beauty of an interactive tool is that you can make changes and try different operations. The feedback is immediate, so you can quickly figure out how things work and equally quickly achieve the results you want on your data.
