Build a Lightweight XML DOM Parser with C#

By A. Russell Jones, Executive Editor, DevX.com


When you don't need the full capabilities of an XmlDocument object, you can combine an XmlTextReader, the SimpleElement class explained by Guang Yang, and a Stack to build a lightweight DOM. This version holds each SimpleElement's children in a strongly typed SimpleElements collection wrapper rather than in the LinkedList class used by the Java version.

To parse a document, create a new SimpleDOMParser instance and call its parse method, passing an XmlTextReader for the file you want to read. For example:

    static void Main()
    {
        XmlTextReader rdr = null;
        try
        {
            rdr = new XmlTextReader(@"some xml file path here");
            SimpleDOMParser sdp = new SimpleDOMParser();
            SimpleElement se = sdp.parse(rdr);
        }
        catch (Exception ex)
        {
            System.Diagnostics.Debug.WriteLine(ex.Message);
        }
        finally
        {
            // close the reader even if parsing throws
            if (rdr != null) rdr.Close();
        }
    }

The basic parser logic follows the same pattern as the Java or VB versions, but because the XmlTextReader already implements all the code needed to parse the XML text, the code is much simpler. The XmlTextReader class functions like a SAX parser in that it reads the document from start to finish, but rather than raise events for each node type as it encounters them, it sets properties that you can query to handle the various types.
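To make the pull model concrete, here is a minimal sketch of querying the reader's properties after each Read() call. The PullReaderDemo class and its DescribeNodes method are hypothetical names used only for illustration; they are not part of the article's download.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PullReaderDemo
{
    // Returns one "NodeType:name-or-value" entry for each node of interest.
    public static List<string> DescribeNodes(string xml)
    {
        List<string> nodes = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // Read() advances to the next node and returns false at the end of input.
            while (reader.Read())
            {
                // No events are raised; the caller inspects NodeType, LocalName, and Value.
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        nodes.Add("Element:" + reader.LocalName);
                        break;
                    case XmlNodeType.EndElement:
                        nodes.Add("EndElement:" + reader.LocalName);
                        break;
                    case XmlNodeType.Text:
                        nodes.Add("Text:" + reader.Value);
                        break;
                    default:
                        // ignore every other node type
                        break;
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return nodes;
    }

    public static void Main()
    {
        foreach (string node in DescribeNodes("<a><b>hi</b><c/></a>"))
        {
            Console.WriteLine(node);
        }
    }
}
```

For the sample input in Main, the reader stops on six nodes. The empty element c produces an Element node but no EndElement node, which is why the parser has to check IsEmptyElement before pushing an element onto its stack.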


    using System.Collections;
    using System.Xml;

    public class SimpleDOMParser
    {
        private XmlTextReader Reader;
        private Stack elements;
        private SimpleElement rootElement;

        public SimpleDOMParser()
        {
            elements = new Stack();
        }

        public SimpleElement parse(XmlTextReader reader)
        {
            this.Reader = reader;
            elements.Clear();
            rootElement = null;

            // Read() returns false at the end of the document
            while (Reader.Read())
            {
                switch (Reader.NodeType)
                {
                    case XmlNodeType.Element:
                    {
                        // create a new SimpleElement and attach it to its parent
                        SimpleElement se = new SimpleElement(Reader.LocalName);
                        if (elements.Count == 0)
                        {
                            rootElement = se;
                        }
                        else
                        {
                            SimpleElement parent = (SimpleElement) elements.Peek();
                            parent.ChildElements.Add(se);
                        }

                        // read IsEmptyElement before moving to the attributes,
                        // because it describes the node the reader is on
                        bool isEmpty = Reader.IsEmptyElement; // ends with "/>"

                        // Read() never stops on attribute nodes, so collect
                        // them here rather than in a separate case
                        while (Reader.MoveToNextAttribute())
                        {
                            se.setAttribute(Reader.Name, Reader.Value);
                        }

                        // don't push empty elements onto the stack;
                        // no EndElement node follows to pop them
                        if (!isEmpty)
                        {
                            elements.Push(se);
                        }
                        break;
                    }
                    case XmlNodeType.EndElement:
                        // pop the top element
                        elements.Pop();
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        // text belongs to the innermost open element
                        ((SimpleElement) elements.Peek()).Text = Reader.Value;
                        break;
                    default:
                        // ignore
                        break;
                }
            }
            return rootElement;
        }
    }

You use the returned root SimpleElement to iterate through the tree. For example, the following method appends the indented XML tree to a StringBuilder; call ToString on the StringBuilder afterward to get the result as a string.

    private static void printTree(SimpleElement se, StringBuilder sb, int depth)
    {
        sb.Append(new string('\t', depth) + "<" + se.TagName);
        foreach (string attName in se.Attributes.Keys)
        {
            sb.Append(" " + attName + "=" + "\"" + se.Attribute(attName) + "\"");
        }
        sb.Append(">" + se.Text.Trim());
        if (se.ChildElements.Count > 0)
        {
            sb.Append(System.Environment.NewLine);
            depth += 1;
            foreach (SimpleElement ch in se.ChildElements)
            {
                printTree(ch, sb, depth);
            }
            depth -= 1;
            sb.Append(new string('\t', depth) + "</" + se.TagName + ">"
                + System.Environment.NewLine);
        }
        else
        {
            sb.Append("</" + se.TagName + ">" + System.Environment.NewLine);
        }
    }

The SimpleElement class is straightforward and mimics the functionality of the SimpleElement class explained in this article for Java and VB. The SimpleElements collection class is a strongly typed collection wrapper that inherits from CollectionBase. Each SimpleElement exposes its tag name and text through properties (TagName and Text in the code above).
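The two classes are not listed in this article, so the following is only a sketch of what they might look like. The member names (TagName, Text, Attributes, Attribute, setAttribute, ChildElements) are inferred from how the parser and printTree use them; the Hashtable attribute store, the default empty Text, and the SimpleElementDemo class are assumptions, and the downloadable code may differ.

```csharp
using System;
using System.Collections;

// Assumed shape: a typed wrapper over CollectionBase's inner list.
public class SimpleElements : CollectionBase
{
    public int Add(SimpleElement element)
    {
        return List.Add(element);
    }

    public SimpleElement this[int index]
    {
        get { return (SimpleElement) List[index]; }
    }
}

// Assumed shape: a tag name, text, attributes, and child elements.
public class SimpleElement
{
    private string tagName;
    private string text = ""; // non-null so callers can safely call Text.Trim()
    private Hashtable attributes = new Hashtable();
    private SimpleElements childElements = new SimpleElements();

    public SimpleElement(string tagName)
    {
        this.tagName = tagName;
    }

    public string TagName { get { return tagName; } }

    public string Text
    {
        get { return text; }
        set { text = value; }
    }

    public Hashtable Attributes { get { return attributes; } }

    public SimpleElements ChildElements { get { return childElements; } }

    // Returns null when the attribute is not present.
    public string Attribute(string name)
    {
        return (string) attributes[name];
    }

    public void setAttribute(string name, string value)
    {
        attributes[name] = value;
    }
}

public static class SimpleElementDemo
{
    public static void Main()
    {
        SimpleElement root = new SimpleElement("book");
        root.setAttribute("id", "42");

        SimpleElement title = new SimpleElement("title");
        title.Text = "Example";
        root.ChildElements.Add(title);

        Console.WriteLine(root.TagName + " has "
            + root.ChildElements.Count + " child element(s)");
    }
}
```

Inheriting from CollectionBase supplies Count and enumeration, so a foreach over ChildElements works without further code; the wrapper only adds a typed Add method and indexer.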

As you can see, the XmlTextReader considerably simplifies the code required to build the parser. Like the Java and VB versions, this implementation stores only elements, attributes, text content, and CDATA blocks, but you can easily modify it to handle any XML content you wish.
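As one possible extension, the sketch below shows the extra switch cases you might add to capture comments and processing instructions. It collects them into a plain list so that it stays self-contained; ExtraNodeTypesDemo and CollectExtras are hypothetical names, and in the parser itself you would more likely attach the values to the element on top of the stack.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ExtraNodeTypesDemo
{
    // Collects comments and processing instructions in document order.
    public static List<string> CollectExtras(string xml)
    {
        List<string> extras = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Comment:
                        // Value is the comment text without the delimiters
                        extras.Add("Comment:" + reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        // Name is the target; Value is the remaining content
                        extras.Add("PI:" + reader.Name + "=" + reader.Value);
                        break;
                    default:
                        // elements, text, and CDATA are handled by the parser
                        break;
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return extras;
    }

    public static void Main()
    {
        string xml = "<?pi-target some data?><root><!--first--><child/><!--second--></root>";
        foreach (string entry in CollectExtras(xml))
        {
            Console.WriteLine(entry);
        }
    }
}
```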

You can download the C# code here.


