
Creating an XML Document: The Way of Speed

If you search the Web, magazine articles and MSDN, you can find plenty of documentation about the fastest way to read an XML document. For Microsoft XML Parser release 3.0, one of the most commonly discussed issues is whether to use the DOM parser or the SAX one, with near-universal agreement that SAX is faster when you just need to read data, while DOM is the way to go if you need to modify and navigate it. The most obvious metaphor is that SAX is like a read-only, forward-only database cursor, while DOM is a fully updatable and navigable cursor. If you have experience with databases, you already know the performance differences between these two ways of accessing data.

When it comes to writing XML data, however, I was unable to find any references to this performance issue.

So I started experimenting, and I found some impressive numbers and some insights about how to create an XML document in the fastest way that are not clearly documented or widely understood.

All the samples in this article were run on a Windows 2000 Professional notebook with Visual Basic 6.0 SP5 and MS XML Parser 3.0 SP1 installed.

I started with a simple idea: simulate a long document (we’ll see that the performance differences become impressive when the number of nodes is large) with at least one father-child structure, and build it in several different ways.

The VBP project provided lets you enter three parameters: how many nodes are written as fathers (elements named LEVEL1), how many nodes are written as children of each father (elements named LEVEL2), and how long the text of each element must be.

First way: the long one

The first option is to build the XML string in VB with nothing but the & operator, a very slow and somewhat discouraged approach.

We all agree that this is the longest-running option: because of the way VB handles strings, it’s very slow. It’s so slow, in fact, that I did not test it; creating the XML document this way simply took too long.
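To make the cost concrete, here is a minimal sketch of what this first approach looks like. The function name and its three parameters are hypothetical (they are not taken from the VBP project), and the element names simply follow the structure used in the rest of the article.

```vb
' Illustrative only: BuildWithConcat, pFathers, pChildren and pText
' are hypothetical names, not part of the sample project.
Public Function BuildWithConcat(ByVal pFathers As Long, _
                                ByVal pChildren As Long, _
                                ByVal pText As String) As String
    Dim lXml As String
    Dim i As Long, j As Long

    lXml = "<ROOT>"
    For i = 1 To pFathers
        lXml = lXml & "<LEVEL1>"
        For j = 1 To pChildren
            ' Each & builds a new string and copies everything
            ' accumulated so far, so the cost grows with the document.
            lXml = lXml & "<LEVEL2><VALUE21>" & pText & "</VALUE21></LEVEL2>"
        Next j
        lXml = lXml & "</LEVEL1>"
    Next i

    BuildWithConcat = lXml & "</ROOT>"
End Function
```

Because the accumulated string is copied on every iteration, the total work grows much faster than the number of nodes, which is why this option becomes impractical for large documents.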

Second way: the DOM one

DOM is one of the two main APIs you can use when working with XML. It’s very elegant, very powerful and also (as we’ll see) efficient. The code to create a document like the one described above (slightly different from what is included in the VBP, to improve readability) looks like this:

    Set lDOMDoc = CreateObject("MSXML2.DOMDocument.3.0")

    ' Create root
    Set lDOMRootNode = lDOMDoc.createElement("ROOT")
    lDOMDoc.appendChild lDOMRootNode

        ' Create node
        Set lDOMLev1Node = lDOMDoc.createElement("LEVEL1")
        lDOMRootNode.appendChild lDOMLev1Node

            ' Create element
            Set lValueElement = lDOMDoc.createElement("VALUE11")
            lDOMLev1Node.appendChild lValueElement
            ' Set its value
            lDOMLev1Node.lastChild.Text = "11"

        ' Same for other nodes
        Set lDOMLev2Node = lDOMDoc.createElement("LEVEL2")
        lDOMLev1Node.appendChild lDOMLev2Node

            Set lValueElement = lDOMDoc.createElement("VALUE21")
            lDOMLev2Node.appendChild lValueElement
            lDOMLev2Node.lastChild.Text = "11"

As you can see, the code is very readable: you create your DOM document, append nodes to it, set their values and so on. It is clear and concise, it maps easily onto, for example, code that loops through one or more recordsets, and it can also be generalized.
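As a rough sketch of that generalization, the same calls can be wrapped in two nested loops driven by the three test parameters. The function and parameter names below are hypothetical, and the objects are late bound so the sketch needs no project reference; the project's actual code may differ.

```vb
' Hypothetical helper: builds ROOT > LEVEL1 > LEVEL2 > VALUE21 in a loop.
Public Function BuildWithDOM(ByVal pFathers As Long, _
                             ByVal pChildren As Long, _
                             ByVal pText As String) As String
    Dim lDOMDoc As Object
    Dim lRoot As Object, lLev1 As Object
    Dim lLev2 As Object, lValue As Object
    Dim i As Long, j As Long

    Set lDOMDoc = CreateObject("MSXML2.DOMDocument.3.0")
    Set lRoot = lDOMDoc.createElement("ROOT")
    lDOMDoc.appendChild lRoot

    For i = 1 To pFathers
        Set lLev1 = lDOMDoc.createElement("LEVEL1")
        lRoot.appendChild lLev1

        For j = 1 To pChildren
            Set lLev2 = lDOMDoc.createElement("LEVEL2")
            lLev1.appendChild lLev2

            Set lValue = lDOMDoc.createElement("VALUE21")
            ' Special characters in the text are escaped on serialization
            lValue.Text = pText
            lLev2.appendChild lValue
        Next j
    Next i

    ' The xml property returns the serialized document as a string
    BuildWithDOM = lDOMDoc.xml
End Function
```

In a recordset-driven version, the outer loop would typically walk the parent recordset and the inner loop the child rows.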

Third way: the SAX one

Although it is generally known as a good interface for reading data, SAX (at least in the Microsoft XML Parser 3.0) also has a set of interfaces for writing data. These are exposed through the MXXMLWriter object in the MSXML object model; if you search for it in the latest MSDN Library, you will find only 19 articles that mention it.

From a programming point of view, the same document created above with DOM is created with MXXMLWriter as follows:

    Dim lSAXReader As New SAXXMLReader
    Dim lSAXWriter As New MXXMLWriter
    ' We need this variable for typecasting the writer
    Dim lSAXContentHandler As IVBSAXContentHandler
    ' That is just a helper
    Dim lSAXAttributes As New SAXAttributes

    ' Set the handler variable to the writer so it exposes the interface
    Set lSAXContentHandler = lSAXWriter

    ' Do not include the XML declaration, so that the file is like the others
    lSAXWriter.omitXMLDeclaration = True

    ' Manually call the necessary events to generate the XML file
    lSAXContentHandler.startDocument

    lSAXAttributes.Clear

    ' Create root
    lSAXContentHandler.startElement "", "", "ROOT", lSAXAttributes

        ' Create node
        lSAXContentHandler.startElement "", "", "LEVEL1", lSAXAttributes

            ' Create and populate elements
            lSAXContentHandler.startElement "", "", "VALUE11", _
                lSAXAttributes
            lSAXContentHandler.characters "11"
            lSAXContentHandler.endElement "", "", "VALUE11"

            lSAXContentHandler.startElement "", "", "LEVEL2", lSAXAttributes

                lSAXContentHandler.startElement "", "", "VALUE21", _
                    lSAXAttributes
                lSAXContentHandler.characters "11"
                lSAXContentHandler.endElement "", "", "VALUE21"

            lSAXContentHandler.endElement "", "", "LEVEL2"

        lSAXContentHandler.endElement "", "", "LEVEL1"

    lSAXContentHandler.endElement "", "", "ROOT"

    ' Close the document opened by startDocument
    lSAXContentHandler.endDocument

It’s a little more verbose than DOM but is still pretty readable and, with some more work, can be easily generalized to render different data.
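One way to generalize it, sketched here under the assumption that the project references "Microsoft XML, v3.0", is to drive the same events from two loops and then read the result from the writer's output property, which holds the generated XML as a string when no other destination has been assigned. The function and parameter names are hypothetical.

```vb
' Hypothetical helper; requires a reference to "Microsoft XML, v3.0".
Public Function BuildWithSAX(ByVal pFathers As Long, _
                             ByVal pChildren As Long, _
                             ByVal pText As String) As String
    Dim lSAXWriter As New MXXMLWriter
    Dim lSAXAttributes As New SAXAttributes
    Dim lHandler As IVBSAXContentHandler
    Dim i As Long, j As Long

    ' Cast the writer to its content handler interface
    Set lHandler = lSAXWriter
    lSAXWriter.omitXMLDeclaration = True

    lHandler.startDocument
    lHandler.startElement "", "", "ROOT", lSAXAttributes
    For i = 1 To pFathers
        lHandler.startElement "", "", "LEVEL1", lSAXAttributes
        For j = 1 To pChildren
            lHandler.startElement "", "", "LEVEL2", lSAXAttributes
            lHandler.startElement "", "", "VALUE21", lSAXAttributes
            lHandler.characters pText
            lHandler.endElement "", "", "VALUE21"
            lHandler.endElement "", "", "LEVEL2"
        Next j
        lHandler.endElement "", "", "LEVEL1"
    Next i
    lHandler.endElement "", "", "ROOT"
    lHandler.endDocument

    ' With no explicit output target, the writer accumulates a string
    BuildWithSAX = lSAXWriter.output
End Function
```

Every startElement call must be matched by an endElement call in reverse order; the writer produces whatever sequence of events it receives, so keeping the nesting balanced is the caller's job.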

Fourth way: the fast string one

At this point in my exploration, I spent some time wondering whether an object model is really needed to create an XML file. After all, what we’re doing is creating a simple ASCII text file, so maybe the overhead of keeping a complex structure of nodes and elements in memory is not really needed. The problem is that string manipulation in Visual Basic is really slow. Then I remembered that our guest Francesco Balena wrote a very high-performance string class for Visual Basic Programmer’s Journal some time ago. After some digging around I found the CString.zip file containing the CString class, which uses some Windows API calls to improve on what is described in the tip bank.

Here is the code that creates a simple XML document like the one shown in the program picture. The only point to note is the SetBufferSize instruction, which allocates the space to store the string. Here it is set to a very high value (20 MB), but the best results are obtained if you set it to a size comparable to that of the document you’re creating (if you know the structure of your document, you can calculate its size).

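    Dim lCS As New CString

    lCS.SetBufferSize 20000000, True

    lCS.Append "<ROOT>"
        lCS.Append "<LEVEL1>"
            lCS.Append "<VALUE11>11</VALUE11>"
            lCS.Append "<LEVEL2>"
                lCS.Append "<VALUE21>11</VALUE21>"
            lCS.Append "</LEVEL2>"
        lCS.Append "</LEVEL1>"
    lCS.Append "</ROOT>"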

It’s very compact, but also difficult to generalize. Besides that, I have not considered all the features (such as character escaping or character set support) that would make creating a well-formed XML file a much more complex task.
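To give an idea of what character escaping involves, here is a minimal sketch of a helper that escapes element text before it is appended. EscapeXmlText is a hypothetical function, not part of the CString class, and it covers element content only; attribute values would also need their quote characters escaped.

```vb
' Hypothetical helper, not part of the CString class.
' Escapes the characters that are not allowed in element text.
Public Function EscapeXmlText(ByVal pText As String) As String
    ' Replace & first, so the entities added below are not escaped twice
    pText = Replace(pText, "&", "&amp;")
    pText = Replace(pText, "<", "&lt;")
    pText = Replace(pText, ">", "&gt;")
    EscapeXmlText = pText
End Function
```

It would be used as `lCS.Append "<VALUE11>" & EscapeXmlText(lValue) & "</VALUE11>"`, where lValue is an assumed variable holding the raw text. Each call adds string work of its own, so part of the speed advantage of this approach is lost as soon as the data cannot be trusted to be escape free.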

The results

At this point, if you’ve read this far, you are probably interested in what I discovered about performance, so let’s look at a table (sorry, no graphics):

 

    Nodes   Fathers x Children x Size   File size (bytes)   DOM (ms)   SAX (ms)   String class (ms)

    Nodes (10 chars, 100 children)
    10000   100 x 100 x 10              620215              490        290        287
    20000   200 x 100 x 10              1240415             950        565        531
    50000   500 x 100 x 10              3101015             2430       1390       1301

    Nodes (100 chars, 100 children)
    10000   100 x 100 x 100             2420215             530        490        345
    20000   200 x 100 x 100             4840415             1045       960        662
    50000   500 x 100 x 100             12101015            2665       2261       1712

    Nodes (10 chars, 10 children only)
    10000   1000 x 10 x 10              618015              515        315        381
    20000   2000 x 10 x 10              1236015             1045       630        752
    50000   5000 x 10 x 10              3090015             2645       1543       1877

    Nodes (10 chars, 1000 children)
    10000   10 x 1000 x 10              656075              515        290        271
    20000   20 x 1000 x 10              1312135             995        565        532
    50000   50 x 1000 x 10              3280315             2475       1372       1305

The previous chart should be read as follows:

The first two columns define how the XML file is formed: the total number of child nodes, followed by how many LEVEL1 nodes there are, how many LEVEL2 nodes each of them has, and how long the text of each element is. The third column is the size (in bytes) of the resulting XML file. The last three columns are the measured times in milliseconds, taken with the Stopwatch class that Karl Petersen has also included in the String Builder sample project. The figures are the average of 5 runs on a PIII 650 notebook with 256 MB of RAM.
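For readers who want to repeat the measurements without the Stopwatch class, averaging over several runs can be sketched as follows. This swaps in the Windows GetTickCount API, which is not what produced the figures above, and BuildDocument is a hypothetical placeholder for whichever routine is being measured.

```vb
' Place in a standard module. GetTickCount is used here instead of the
' Stopwatch class mentioned in the article.
Private Declare Function GetTickCount Lib "kernel32" () As Long

' Hypothetical placeholder: call the DOM, SAX or string routine here.
Private Sub BuildDocument()
End Sub

' Returns the average elapsed time, in milliseconds, over pRuns runs.
Public Function AverageMilliseconds(ByVal pRuns As Long) As Double
    Dim lStart As Long
    Dim lTotal As Long
    Dim i As Long

    For i = 1 To pRuns
        lStart = GetTickCount()
        BuildDocument
        lTotal = lTotal + (GetTickCount() - lStart)
    Next i

    AverageMilliseconds = lTotal / pRuns
End Function
```

Calling `AverageMilliseconds(5)` mirrors the five-run average used for the table. GetTickCount has a coarse resolution (on the order of 10 to 16 ms), so it is only meaningful for runs that take hundreds of milliseconds, as the ones in the table do.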

What can we learn from the previous chart?

First, if you just need the fastest way to create an XML document without processing it afterwards (for example, you only write it to disk or pass it to a Web browser) and, at the same time, you want someone else (Microsoft’s programmers) to worry about everything that makes an XML document well formed, it’s best to work through the really sparse documentation from Microsoft and learn MXXMLWriter.

Second, the DOM is not so slow. Before starting this experiment I believed that using DOM to create really large documents would perform poorly, because of the overhead the DOM object model incurs to keep in memory all the information that lets you navigate it. But that’s not the case. Considering the elegance of the DOM model and all the technologies that rely on it (XSLT, validation, XPath and so on), if you do not need the ultimate performance, the tradeoff between performance and usability leans heavily towards DOM. Besides that, it seems (from Microsoft’s figures for the current beta code) that the upcoming MS XML 4.0 parser will greatly improve performance.

Last but not least, if performance is your ultimate goal and the document you’re working with is very large, consider trying simple string concatenation. This suits only particular documents (for example, documents for which you’re sure no character escaping is needed, such as a document containing only numbers) and only languages where string operations can be performed really fast. But if you need the ultimate performance, just look at the chart.

Conclusions

From the tests I’ve made, and considering the long list of capabilities of DOM versus the poor documentation of SAX, I believe that using DOM whenever you need to write documents is the way to go. Only if you need the ultimate performance is it best to use SAX or investigate fast string manipulation.

It would be interesting to run more experiments (such as increasing the depth of the nodes, making it another variable of the test); the code is free, so you can do so. Consider also that the XML document created here is really simple so, as they say, your mileage may vary.

Besides that, you also need to consider what you will do with the XML document you’ve created. If, for example, you need to apply an XSL stylesheet to it, you’d better use DOM so that you do not spend time reloading the XML file into another DOM document. If you need to write the file to disk and you’re using Visual C++, you can investigate string functions and the file mapping functions to write the file as fast as possible.
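The stylesheet case can be sketched as follows, assuming the document was built in memory with DOM. The procedure name, its parameters and both file paths are hypothetical; transformNode, load, async and save are the standard MSXML DOM members.

```vb
' Hypothetical helper: pDOMDoc is a DOM document already built in
' memory; pXslPath and pOutPath are illustrative file paths.
Public Function TransformAndSave(ByVal pDOMDoc As Object, _
                                 ByVal pXslPath As String, _
                                 ByVal pOutPath As String) As String
    Dim lXSLDoc As Object

    Set lXSLDoc = CreateObject("MSXML2.DOMDocument.3.0")
    lXSLDoc.async = False          ' load synchronously

    If lXSLDoc.Load(pXslPath) Then
        ' Apply the stylesheet directly to the in-memory document,
        ' with no need to serialize and reload it first
        TransformAndSave = pDOMDoc.transformNode(lXSLDoc)
    End If

    ' Write the original document to disk
    pDOMDoc.save pOutPath
End Function
```

With the SAX or string approaches, the generated text would first have to be parsed into a DOM document (for example with loadXML) before a stylesheet could be applied, which is the reloading cost mentioned above.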

For any comments, you can email: [email protected]
