

Build Your Own Lightweight XML DOM Parser : Page 3

Microsoft's MSXML parser is rich in functionality, but in some cases a full-featured parser is too large for resource-limited environments. Don't count XML out yet though; you can write your own lightweight VB XML parser in fewer than 400 lines of code!





Defining XML Parsing Primitives (cont'd)
Each time the parser reads a tag, it calls the parseTag routine, which creates a new SimpleElement object, obtains the tag name and attributes, and then decides what to do with the new element.

Private Sub parseTag(ByVal s As String)
   Dim tagName As String
   Dim se As SimpleElement

   ' is this a CDATA tag?
   If isCDATA(s) Then
      ' add the text contents of this tag
      ' to the last element on the stack
      Call addCDATA(s)
   Else
      ' get the tag name
      tagName = getTagName(s)
      If tagName <> "" Then
         ' create a new SimpleElement
         Set se = New SimpleElement
         se.name = tagName
         ' get all the attributes for this tag
         Call getAttributes(se, s)
         ' is this a close tag?
         If isCloseTag(s) Then
            Call PopElement(se)
         Else
            ' it's a child tag or root tag
            Call pushElement(se)
         End If
      End If
   End If
End Sub

Parsing the tag name and attributes is relatively simple because almost all XML tags follow consistent patterns. The name always follows the opening "<" character (or "</" in a closing tag), and white space is not allowed in tag names. A few special tags are exceptions: the XML declaration and processing instructions begin with "<?", while DOCTYPE declarations, entity definitions, and comments begin with "<!". This implementation ignores those tags. CDATA sections also begin with "<!", but the SimpleDOMParser handles them separately.

Private Function getTagName(ByVal s As String) _
   As String
   Dim tagName As String
   Dim i As Integer

   For i = 1 To Len(s)
      Select Case Mid$(s, i, 1)
      Case "<", "/"
         ' ignore
      Case ">", "[", " "
         ' stop parsing
         getTagName = tagName
         Exit Function
      Case "!", "?"
         ' ignore this tag
         ' you can add additional checks for the
         ' xml declaration, DOCTYPE elements,
         ' and comments
         getTagName = ""   ' this is a CDATA tag
         Exit Function
      Case Else
         tagName = tagName & Mid$(s, i, 1)
      End Select
   Next

   Err.Raise 50000, "SimpleDOMParser.getTagName", _
      "The tag " & s & " is malformed."
End Function
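The parseTag routine relies on two tests, isCDATA and isCloseTag, that are not listed in this section. As a rough sketch only, and assuming each tag string starts at its "<" character, they might look something like the following. The names isCDATASketch and isCloseTagSketch are hypothetical, chosen so they are not mistaken for the article's own routines.

```vb
' Hypothetical stand-ins for isCDATA and isCloseTag; the article's
' actual implementations are not shown in this section.
' Assumption: s holds one complete tag beginning with "<".

Private Function isCDATASketch(ByVal s As String) As Boolean
    ' a CDATA section opens with the fixed nine-character marker
    isCDATASketch = (Left$(s, 9) = "<![CDATA[")
End Function

Private Function isCloseTagSketch(ByVal s As String) As Boolean
    ' an end tag such as </title> has "/" immediately after "<"
    isCloseTagSketch = (Left$(s, 2) = "</")
End Function
```

This sketch does not treat an empty-element tag such as <br/> as a close tag; whether the parser should push and immediately pop such an element is a design decision left open here.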

Similarly, all attributes follow the pattern name="value", where the attribute name is always preceded by white space, and the value is always quoted, although the quote character may be either a single or a double quote. After extracting the tag name, any remaining text within the tag must consist of attributes. To parse the attributes, you can simply search ahead for the next expected character, first the "=" sign, which must appear between the attribute name and its value, and then the single or double quote characters, which delimit the attribute value.
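The search-ahead approach described above could be sketched roughly as follows. This is not the article's getAttributes routine: parseAttributesSketch is a hypothetical helper that returns the attribute values in a Collection keyed by attribute name instead of storing them on a SimpleElement, and it assumes the tag name is separated from the first attribute by a space character rather than a tab or line break.

```vb
' Hypothetical helper, not the article's getAttributes: collects the
' attributes of one tag into a Collection keyed by attribute name.
Private Function parseAttributesSketch(ByVal s As String) As Collection
    Dim attrs As New Collection
    Dim pos As Long, eqPos As Long, closePos As Long, i As Long
    Dim attrName As String, quoteChar As String

    ' attributes begin after the first space, which ends the tag name
    pos = InStr(1, s, " ")
    Do While pos > 0
        ' the "=" sign separates the attribute name from its value
        eqPos = InStr(pos, s, "=")
        If eqPos = 0 Then Exit Do
        attrName = Trim$(Mid$(s, pos, eqPos - pos))

        ' skip any spaces between "=" and the opening quote
        i = eqPos + 1
        Do While Mid$(s, i, 1) = " "
            i = i + 1
        Loop

        ' the value is delimited by a matching pair of quotes
        quoteChar = Mid$(s, i, 1)
        If attrName = "" Or (quoteChar <> """" And quoteChar <> "'") Then
            Err.Raise 50001, "parseAttributesSketch", _
                "The tag " & s & " is malformed."
        End If
        closePos = InStr(i + 1, s, quoteChar)
        If closePos = 0 Then
            Err.Raise 50001, "parseAttributesSketch", _
                "The tag " & s & " is malformed."
        End If

        attrs.Add Mid$(s, i + 1, closePos - i - 1), attrName

        ' continue searching after the closing quote
        pos = closePos + 1
    Loop

    Set parseAttributesSketch = attrs
End Function
```

A call might look like this:

```vb
Dim attrs As Collection
Set attrs = parseAttributesSketch("<book id=""42"" lang='en'>")
Debug.Print attrs("id")     ' 42
Debug.Print attrs("lang")   ' en
```

Because the search resumes after each closing quote, an "=" character inside a value does not confuse the loop. One limitation of this sketch is that Collection keys are not case-sensitive, whereas XML attribute names are, so two attributes whose names differ only in case would raise a duplicate-key error.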

After pushing an element on the stack, the parser next looks for text content using the getTagContent() function, passing the main XML string and the current index position to begin searching. The function simply looks for text that occurs between the current index position and the next opening tag character (<), and, if found, adds that text to the last element on the stack.

Private Function getTagContent(s As String, i As Long)
   Dim endpos As Long
   Dim content As String

   endpos = InStr(i, s, "<")
   If endpos > 0 And endpos > i Then
      If endpos - i > 0 Then
         content = Trim$(Mid$(s, i, (endpos - i)))
         i = endpos
         Call setTextContent(content)
      End If
   End If
End Function
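Note that getTagContent receives its index argument by reference (the Visual Basic default), so the caller's position advances to the next "<" whenever text is found. To make that behavior easier to see in isolation, here is a hypothetical standalone variant, readTextSketch, that returns the text instead of calling setTextContent. It is an illustration only, not part of the parser.

```vb
' Hypothetical variant of getTagContent for illustration: returns the
' text between position i and the next "<", and advances i to that "<".
Private Function readTextSketch(ByVal s As String, ByRef i As Long) As String
    Dim endpos As Long

    endpos = InStr(i, s, "<")
    If endpos > i Then
        readTextSketch = Trim$(Mid$(s, i, endpos - i))
        i = endpos
    End If
End Function

Private Sub demoReadText()
    Dim pos As Long
    Dim text As String

    pos = 4                                  ' first character after "<a>"
    text = readTextSketch("<a>hello<b>", pos)
    Debug.Print text                         ' hello
    Debug.Print pos                          ' 9
End Sub
```

If the next character is already "<", the function returns an empty string and leaves the index unchanged, which mirrors the guard in the original listing.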
