Build a Lightweight XML DOM Parser with C#


When you don’t need the full capabilities of an XmlDocument object, you can combine an XmlTextReader, the SimpleElement class explained by Guang Yang, and a Stack to build a lightweight DOM. This version holds each SimpleElement’s children in a strongly typed SimpleElements collection wrapper rather than in the LinkedList class used by the Java version.

To parse a document, create a new SimpleDOMParser instance, and call its parse method, passing an XmlTextReader for the file you want to read, for example:

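// requires: using System; using System.Xml;
static void Main() {
   XmlTextReader rdr = null;
   try {
      rdr = new XmlTextReader(@"some xml file path here");
      SimpleDOMParser sdp = new SimpleDOMParser();
      // se is the root element of the parsed tree
      SimpleElement se = sdp.parse(rdr);
   }
   catch (Exception ex) {
      System.Diagnostics.Debug.WriteLine(ex.Message);
   }
   finally {
      // close the reader even if parsing fails
      if (rdr != null) rdr.Close();
   }
}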

The basic parser logic follows the same pattern as the Java or VB versions, but because the XmlTextReader already implements all the code needed to parse the XML text, the code is much simpler. The XmlTextReader class functions like a SAX parser in that it reads the document from start to finish, but rather than raise events for each node type as it encounters them, it sets properties that you can query to handle the various types.
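To make the pull model concrete, here is a minimal sketch that only records what the reader stops on. The class and method names (PullDemo, Describe) are made up for illustration and are not part of the downloadable code; the XML comes from a string so the example is self-contained.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PullDemo
{
    // Returns one entry per node the reader stops on, e.g. "Element:a".
    // Hypothetical helper for illustration only.
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Skip whitespace-only nodes to keep the output short.
            reader.WhitespaceHandling = WhitespaceHandling.None;

            // Read() advances to the next node and returns false at the end.
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        lines.Add("Element:" + reader.LocalName);
                        break;
                    case XmlNodeType.EndElement:
                        lines.Add("EndElement:" + reader.LocalName);
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        lines.Add("Text:" + reader.Value);
                        break;
                }
            }
        }
        return lines;
    }
}
```

For the input `<a><b>hi</b><c/></a>` the reader reports Element:a, Element:b, Text:hi, EndElement:b, Element:c, EndElement:a. The empty element `<c/>` produces no EndElement node, which is why the parser has to treat empty elements specially.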

// requires: using System.Collections; using System.Xml;
public class SimpleDOMParser {
   private XmlTextReader Reader;
   private Stack elements;
   private SimpleElement currentElement;
   private SimpleElement rootElement;

   public SimpleDOMParser() {
      elements = new Stack();
      currentElement = null;
   }

   public SimpleElement parse(XmlTextReader reader) {
      this.Reader = reader;
      // Read() returns false at the end of the document
      while (Reader.Read()) {
         switch (Reader.NodeType) {
         case XmlNodeType.Element :
            // create a new SimpleElement
            SimpleElement se = new SimpleElement(Reader.LocalName);
            if (elements.Count == 0) {
               rootElement = se;
            }
            else {
               SimpleElement parent = (SimpleElement) elements.Peek();
               parent.ChildElements.Add(se);
            }
            // check for "/>" before visiting the attributes, because
            // IsEmptyElement is false while positioned on an attribute
            bool isEmpty = Reader.IsEmptyElement;
            // Read() never stops on attribute nodes, so collect them here
            while (Reader.MoveToNextAttribute()) {
               se.setAttribute(Reader.Name, Reader.Value);
            }
            // don't push empty elements onto the stack;
            // they have no matching EndElement node
            if (!isEmpty) {
               elements.Push(se);
               currentElement = se;
            }
            break;
         case XmlNodeType.EndElement :
            // pop the top element and make its parent current again
            elements.Pop();
            currentElement = (elements.Count > 0)
               ? (SimpleElement) elements.Peek() : null;
            break;
         case XmlNodeType.Text :
         case XmlNodeType.CDATA :
            currentElement.Text = Reader.Value;
            break;
         default :
            // ignore
            break;
         }
      }
      return rootElement;
   }
}

You use the returned root SimpleElement to walk the tree. For example, the following method appends an indented rendering of the XML tree to a StringBuilder.

// requires: using System.Text;
private static void printTree(SimpleElement se, StringBuilder sb, int depth) {
   // opening tag, followed by the element's text
   sb.Append(new string('\t', depth) + "<" + se.TagName + ">" + se.Text.Trim());
   if (se.ChildElements.Count > 0) {
      sb.Append(System.Environment.NewLine);
      foreach (SimpleElement ch in se.ChildElements) {
         printTree(ch, sb, depth + 1);
      }
      // closing tag on its own line, aligned with the opening tag
      sb.Append(new string('\t', depth) + "</" + se.TagName + ">" + System.Environment.NewLine);
   }
   else {
      sb.Append("</" + se.TagName + ">" + System.Environment.NewLine);
   }
}

The SimpleElement class is straightforward and mimics the functionality of the SimpleElement class explained in this article for Java and VB. The SimpleElements collection class is a simple, strongly typed collection wrapper that inherits from CollectionBase. Each SimpleElement exposes its tag name (TagName in the code above) and its Text as properties.
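The two classes are not listed here, so the following is only a sketch of what they might look like, reconstructed from how the parser and printTree use them (TagName, Text, ChildElements, setAttribute). The Hashtable attribute store, the getAttribute method, and the empty-string default for Text are assumptions; the downloadable code may differ.

```csharp
using System.Collections;

// Typed wrapper so callers get SimpleElement instances back without casting.
public class SimpleElements : CollectionBase
{
    public int Add(SimpleElement element)
    {
        return List.Add(element);
    }

    public SimpleElement this[int index]
    {
        get { return (SimpleElement)List[index]; }
    }
}

public class SimpleElement
{
    private readonly string tagName;
    private string text = "";  // assumed default, so Text.Trim() is always safe
    private readonly Hashtable attributes = new Hashtable();
    private readonly SimpleElements childElements = new SimpleElements();

    public SimpleElement(string tagName)
    {
        this.tagName = tagName;
    }

    public string TagName { get { return tagName; } }

    public string Text
    {
        get { return text; }
        set { text = value; }
    }

    public SimpleElements ChildElements { get { return childElements; } }

    // Adds the attribute, or overwrites it if the name already exists.
    public void setAttribute(string name, string value)
    {
        attributes[name] = value;
    }

    // Returns null when the attribute is not present (assumed helper).
    public string getAttribute(string name)
    {
        return (string)attributes[name];
    }
}
```

Because CollectionBase implements IEnumerable and exposes Count, a `foreach (SimpleElement ch in se.ChildElements)` loop and a `ChildElements.Count` check both work against this wrapper without extra code.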

As you can see, the XmlTextReader considerably simplifies the code required to build the parser. Like the Java and VB versions, this implementation stores only elements, attributes, text content, and CDATA blocks, but you can easily modify it to handle any XML content you wish.
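As one possible direction for such a modification, the sketch below picks up two node types that the parser currently ignores: comments and processing instructions. The ExtraNodes class and its Collect method are hypothetical names used only for this example; in the parser itself you would add the same two cases to the switch statement and store the values wherever suits your model.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ExtraNodes
{
    // Collects comments and processing instructions from an XML string.
    // Hypothetical helper for illustration only.
    public static List<string> Collect(string xml)
    {
        var found = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Comment:
                        // Value holds the text between <!-- and -->
                        found.Add("comment:" + reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        // Name is the target, Value is the rest of the instruction
                        found.Add("pi:" + reader.Name + "=" + reader.Value);
                        break;
                }
            }
        }
        return found;
    }
}
```

Calling `ExtraNodes.Collect("<a><!--note--><?app run?></a>")` yields two entries, `comment:note` and `pi:app=run`.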

You can download the C# code here.

