If you search the Web, magazine articles and MSDN, you can find plenty of documentation about the fastest way to read an XML document. With Microsoft XML Parser release 3.0, one of the most commonly discussed issues is whether to use the DOM parser or the SAX one. The almost universal consensus is that SAX is faster when you just need to read data, while DOM is the way to go if you need to modify and navigate it. The most obvious metaphor is that SAX is like a read-only, forward-only database cursor, while DOM is a fully updateable and navigable cursor. If you have experience with databases, you already know the performance difference between these two ways of accessing data.
When it comes to writing XML data, however, I was unable to find any references on this performance issue.
So I started experimenting, and I found some impressive figures and some insights, not clearly documented or widely understood, about how to create an XML document in the fastest way.
All the samples provided in this document are made using a Windows 2000 Professional notebook, with Visual Basic 6.0 SP5 and MS XML Parser 3.0 SP1 installed.
I started with a simple idea: simulate a long document (we’ll see that the performance differences start to become impressive when the number of nodes becomes large) with at least one father-child structure, and try several different ways of producing the data.
The VBP project provided allows you to input three parameters: how many nodes are written as fathers (with elements named LEVEL1), how many nodes are written as children (with elements named LEVEL2) and how long the text of the element must be.
First way: the long one
The first option is to build the XML string in VB using nothing but the & operator, a very slow and widely discouraged approach.
We can all agree that this is the slowest option: because of the way VB handles strings, every concatenation allocates a new string and copies the existing contents into it. It was so slow that I did not include it in the measurements; creating the XML document this way simply took too long.
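For reference, the naive approach looks roughly like the sketch below. The function name BuildWithConcat and its parameters are hypothetical and not taken from the VBP project; the element names follow the structure used in the rest of the article.

```vb
' Sketch only: naive XML building with the & operator.
' BuildWithConcat is a hypothetical name, not part of the article's project.
Public Function BuildWithConcat(ByVal fathers As Long, _
                                ByVal children As Long, _
                                ByVal textLen As Long) As String
    Dim xml As String
    Dim payload As String
    Dim i As Long, j As Long

    payload = String$(textLen, "1")   ' numeric text, so no escaping is needed
    xml = "<ROOT>"
    For i = 1 To fathers
        xml = xml & "<LEVEL1>"
        For j = 1 To children
            ' Each assignment builds a brand new string and copies
            ' everything accumulated so far, so cost grows with document size.
            xml = xml & "<LEVEL2><VALUE21>" & payload & "</VALUE21></LEVEL2>"
        Next j
        xml = xml & "</LEVEL1>"
    Next i
    BuildWithConcat = xml & "</ROOT>"
End Function
```

Because the whole accumulated string is copied on every pass, the total work grows much faster than the document itself, which is why this option drops out as the node count rises.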
Second way: the DOM one
DOM is one of the two main APIs that you can use when working with XML. It’s very elegant, very powerful and also (as we’ll see) efficient. The code to create a document like the one described above (slightly different from what is included in the VBP, to improve readability) looks like this:
Set lDOMDoc = CreateObject("MSXML2.DOMDocument.3.0")
' Create root
Set lDOMRootNode = lDOMDoc.createElement("ROOT")
lDOMDoc.appendChild lDOMRootNode
' Create node
Set lDOMLev1Node = lDOMDoc.createElement("LEVEL1")
lDOMRootNode.appendChild lDOMLev1Node
' Create element
Set lValueElement = lDOMDoc.createElement("VALUE11")
lDOMLev1Node.appendChild lValueElement
' Set its value
lDOMLev1Node.lastChild.Text = "11"
' Same for other nodes
Set lDOMLev2Node = lDOMDoc.createElement("LEVEL2")
lDOMLev1Node.appendChild lDOMLev2Node
Set lValueElement = lDOMDoc.createElement("VALUE21")
lDOMLev2Node.appendChild lValueElement
lDOMLev2Node.lastChild.Text = "11"
As you can see, the code is very readable: you create your DOM document, append nodes to it, set their values and so on. It is clear and concise, it maps easily onto code that loops through one or more recordsets, and it can also be generalized.
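One way the DOM code might be generalized into the fathers-by-children test document is sketched below. BuildWithDOM and its parameters are hypothetical names; the actual VBP project may be organized differently. Only standard MSXML DOM calls (createElement, appendChild, the Text and xml properties) are used.

```vb
' Sketch only: nested loops over the DOM API.
' BuildWithDOM is a hypothetical name, not part of the article's project.
Public Function BuildWithDOM(ByVal fathers As Long, _
                             ByVal children As Long, _
                             ByVal textLen As Long) As String
    Dim doc As Object, root As Object
    Dim lev1 As Object, lev2 As Object, valueEl As Object
    Dim payload As String
    Dim i As Long, j As Long

    payload = String$(textLen, "1")
    Set doc = CreateObject("MSXML2.DOMDocument.3.0")
    Set root = doc.createElement("ROOT")
    doc.appendChild root

    For i = 1 To fathers
        Set lev1 = doc.createElement("LEVEL1")
        root.appendChild lev1
        For j = 1 To children
            Set lev2 = doc.createElement("LEVEL2")
            lev1.appendChild lev2
            Set valueEl = doc.createElement("VALUE21")
            valueEl.Text = payload          ' DOM escapes the text when serializing
            lev2.appendChild valueEl
        Next j
    Next i

    ' The xml property serializes the whole tree to a string.
    BuildWithDOM = doc.xml
End Function
```

In a recordset-driven version, the outer loop would walk the parent recordset and the inner loop the child rows, with the payload taken from the field values.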
Third way: the SAX one
Although it is generally known as a good interface for reading data, SAX (at least in the Microsoft XML Parser 3.0) also has a set of interfaces for writing data. Writing is done through the MXXMLWriter object in the MSXML object model; if you search for it in the latest MSDN Library, you will find only 19 articles that mention it.
From a programming point of view, the same document built above with DOM is created with MXXMLWriter using the following code:
Dim lSAXReader As New SAXXMLReader
Dim lSAXWriter As New MXXMLWriter
'We need these variables for typecasting the writer
Dim lSAXContentHandler As IVBSAXContentHandler
'That is just a helper
Dim lSAXAttributes As New SAXAttributes
'Set handler variables to the writer so it implements the interfaces
Set lSAXContentHandler = lSAXWriter
'do not include XML declaration, so that the file is like the others
lSAXWriter.omitXMLDeclaration = True
' Manually call necessary events to generate the XML file
lSAXContentHandler.startDocument
lSAXAttributes.Clear
'Create Root
lSAXContentHandler.startElement "", "", "ROOT", lSAXAttributes
'Create node
lSAXContentHandler.startElement "", "", "LEVEL1", lSAXAttributes
' Create and populate elements
lSAXContentHandler.startElement "", "", "VALUE11", _
lSAXAttributes
lSAXContentHandler.characters "11"
lSAXContentHandler.endElement "", "", "VALUE11"
lSAXContentHandler.startElement "", "", "LEVEL2", lSAXAttributes
lSAXContentHandler.startElement "", "", "VALUE21", _
lSAXAttributes
lSAXContentHandler.characters "11"
lSAXContentHandler.endElement "", "", "VALUE21"
lSAXContentHandler.endElement "", "", "LEVEL2"
lSAXContentHandler.endElement "", "", "LEVEL1"
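lSAXContentHandler.endElement "", "", "ROOT"
'Close the document; the generated XML is then available in lSAXWriter.output
lSAXContentHandler.endDocument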
It’s a little more verbose than DOM but is still pretty readable and, with some more work, can be easily generalized to render different data.
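A possible generalization of the SAX writer code is sketched here. BuildWithSAX and its parameters are hypothetical names, and the sketch assumes the project references Microsoft XML v3.0 so that MXXMLWriter, SAXAttributes and IVBSAXContentHandler are available as early-bound types.

```vb
' Sketch only: nested loops driving MXXMLWriter through its content handler.
' BuildWithSAX is a hypothetical name; requires a reference to Microsoft XML v3.0.
Public Function BuildWithSAX(ByVal fathers As Long, _
                             ByVal children As Long, _
                             ByVal textLen As Long) As String
    Dim writer As New MXXMLWriter
    Dim handler As IVBSAXContentHandler
    Dim attrs As New SAXAttributes      ' empty attribute list, reused for every element
    Dim payload As String
    Dim i As Long, j As Long

    payload = String$(textLen, "1")
    Set handler = writer                ' typecast the writer to its handler interface
    writer.omitXMLDeclaration = True

    handler.startDocument
    handler.startElement "", "", "ROOT", attrs
    For i = 1 To fathers
        handler.startElement "", "", "LEVEL1", attrs
        For j = 1 To children
            handler.startElement "", "", "LEVEL2", attrs
            handler.startElement "", "", "VALUE21", attrs
            handler.characters payload  ' the writer takes care of escaping
            handler.endElement "", "", "VALUE21"
            handler.endElement "", "", "LEVEL2"
        Next j
        handler.endElement "", "", "LEVEL1"
    Next i
    handler.endElement "", "", "ROOT"
    handler.endDocument

    ' With default settings the output property holds the XML as a string.
    BuildWithSAX = writer.output
End Function
```

Every startElement needs a matching endElement in the right order; the writer emits whatever events it receives, so keeping the calls paired inside the loops is what keeps the result well formed.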
Fourth way: The Fast String one
At this point in my exploration I wondered for a while whether any kind of object model is really needed when creating an XML file. After all, what we’re creating is a simple text file, so maybe the overhead of keeping a complex structure of nodes and elements in memory is not really needed. The problem is that string manipulation in Visual Basic is really slow. Then I remembered that our guest Francesco Balena wrote a very fast string class for Visual Basic Programmer’s Journal some time ago. After some digging around I found the CString.zip file containing the CString class, which uses some Windows API calls to improve on the technique described in the tip bank.
Here is the code that creates a simple XML document like the one shown in the program picture. The only point to note is the SetBufferSize call, which allocates the space used to store the string. Here it is set to a very high value (20 MB), but the best results are obtained if you set it to a size comparable to that of the document you’re creating (if you know the structure of your document, it is possible to calculate its size).
Dim lCS As New CString
lCS.SetBufferSize 20000000, True
lCS.Append "<ROOT>"
lCS.Append "<LEVEL1>"
lCS.Append "<VALUE11>11</VALUE11>"
lCS.Append "<LEVEL2>"
lCS.Append "<VALUE21>11</VALUE21>"
lCS.Append "</LEVEL2>"
lCS.Append "</LEVEL1>"
lCS.Append "</ROOT>"
It’s very compact, but also difficult to generalize. Besides, I have not considered all the features (such as character escaping or character set support) that would make creating a well-formed XML file a much more complex task.
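To give an idea of the extra work involved, here is a minimal sketch of the character escaping that the string approach would have to do by hand. EscapeXml is a hypothetical helper, not part of the CString class or the article's project, and it covers only the five predefined XML entities, not character set issues.

```vb
' Sketch only: escape the five predefined XML entities in element text
' or attribute values. EscapeXml is a hypothetical helper.
Public Function EscapeXml(ByVal value As String) As String
    ' The ampersand must be replaced first; otherwise the ampersands
    ' introduced by the later replacements would be escaped again.
    value = Replace(value, "&", "&amp;")
    value = Replace(value, "<", "&lt;")
    value = Replace(value, ">", "&gt;")
    value = Replace(value, """", "&quot;")
    value = Replace(value, "'", "&apos;")
    EscapeXml = value
End Function
```

Each value would then be wrapped before being appended, for example "<VALUE11>" & EscapeXml(someText) & "</VALUE11>". These Replace calls add exactly the kind of string work that the string approach was meant to avoid, so the timings for escaped data would presumably be less favorable than the ones measured here.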
The results
If you have read this far, you are probably interested in what I discovered about performance, so let’s look at a table (sorry, no graphics):
| Nodes | Fathers x Children x Size | File Size (bytes) | DOM (ms) | SAX (ms) | String Class (ms) |
| Nodes (10 chars, 100 children) |
| 10000 | 100 x 100 x 10 | 620215 | 490 | 290 | 287 |
| 20000 | 200 x 100 x 10 | 1240415 | 950 | 565 | 531 |
| 50000 | 500 x 100 x 10 | 3101015 | 2430 | 1390 | 1301 |
| Nodes (100 chars, 100 children) |
| 10000 | 100 x 100 x 100 | 2420215 | 530 | 490 | 345 |
| 20000 | 200 x 100 x 100 | 4840415 | 1045 | 960 | 662 |
| 50000 | 500 x 100 x 100 | 12101015 | 2665 | 2261 | 1712 |
| Nodes (10 chars, 10 children only) |
| 10000 | 1000 x 10 x 10 | 618015 | 515 | 315 | 381 |
| 20000 | 2000 x 10 x 10 | 1236015 | 1045 | 630 | 752 |
| 50000 | 5000 x 10 x 10 | 3090015 | 2645 | 1543 | 1877 |
| Nodes (10 chars, 1000 children) |
| 10000 | 10 x 1000 x 10 | 656075 | 515 | 290 | 271 |
| 20000 | 20 x 1000 x 10 | 1312135 | 995 | 565 | 532 |
| 50000 | 50 x 1000 x 10 | 3280315 | 2475 | 1372 | 1305 |
The table above must be read as follows:
The first two columns define how the XML file is formed: the total number of nodes (fathers times children), followed by how many LEVEL1 nodes there are, how many LEVEL2 nodes each one contains, and how long the element text is. The third column is the size (in bytes) of the resulting XML file. The last three columns are the measured times in milliseconds, taken with the Stopwatch class that Karl Petersen also included in the String Builder sample project. Each figure is the average of 5 runs on a PIII 650 notebook with 256 MB of RAM.
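The averaging over several runs can be sketched as follows. This is not the Stopwatch class used for the published figures: it swaps in the Win32 GetTickCount function, whose resolution is coarser, and BuildDocument is a hypothetical stand-in for whichever builder is being measured.

```vb
Option Explicit

' Win32 API: milliseconds elapsed since the system was started.
Private Declare Function GetTickCount Lib "kernel32" () As Long

' Hypothetical workload standing in for one of the document builders.
Private Sub BuildDocument()
    Dim i As Long
    Dim s As String
    For i = 1 To 1000
        s = s & "<LEVEL2/>"
    Next i
End Sub

' Runs the workload several times and returns the mean elapsed time in ms.
' Assumes the tick counter does not wrap around during the measurement.
Public Function AverageMilliseconds(ByVal runs As Long) As Double
    Dim i As Long
    Dim started As Long
    Dim total As Long

    For i = 1 To runs
        started = GetTickCount()
        BuildDocument
        total = total + (GetTickCount() - started)
    Next i
    AverageMilliseconds = total / runs
End Function
```

Calling AverageMilliseconds(5) mirrors the five-run average used for the table; averaging smooths out one-off effects such as the first-time creation of the MSXML objects.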
What can we learn from the table?
First, if you just need the fastest way to create an XML document that you do not have to process afterwards (for example, you only write it to disk or pass it to a Web browser) and, at the same time, you want someone else (Microsoft’s programmers) to worry about everything that makes an XML document well formed, your best option is to study the sparse documentation from Microsoft and learn MXXMLWriter.
Second, DOM is not so slow. Before starting this experiment I believed that using DOM to create really large documents would perform poorly, because of the overhead of keeping in memory all the information that allows you to navigate the document. But that is not the case: in these tests DOM took roughly 1.1 to 1.8 times as long as SAX. Considering the elegance of the DOM model and all the technologies that rely on it (XSLT, validation, XPath and so on), if you do not need the ultimate performance, the tradeoff between speed and usability tilts heavily towards DOM. Besides, it seems (from Microsoft’s figures for the current beta code) that the upcoming MS XML 4.0 parser will greatly improve performance.
Last but not least, if performance is your ultimate goal and the document you’re working with is very large, consider plain string concatenation with a fast string class. This suits only particular documents (for example, documents for which you are sure that no character escaping is needed, such as one containing only numbers) and only languages in which string operations are really fast. Note also that in the runs with many fathers and only 10 children each, SAX was actually faster than the string class. Still, if you need the ultimate performance, just look at the table.
Conclusions
From the tests I’ve made, and considering the long list of capabilities of DOM versus the poor documentation of SAX, I believe that using DOM is the way to go whenever you need to write documents. Only if you need the ultimate performance is it best to use SAX or investigate fast string manipulation.
It would be interesting to run more experiments (for example, making the depth of the nodes another variable of the test); the code is free, so you can do it yourself. Consider also that the XML document created here is really simple so, as they say, your mileage may vary.
Besides, you also need to consider what you will do with the XML document once you’ve created it. If, for example, you need to apply an XSL stylesheet to it, you’d better use DOM so that you do not spend time reloading the XML file into another DOM document. If you need to write the file to disk and you’re using Visual C++, you can investigate string functions and the file mapping functions to write the file in the fastest way possible.
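The stylesheet case can be illustrated with a small sketch: a document built in memory with DOM is transformed directly with transformNode, with no intermediate string or file. The function name and the tiny inline stylesheet (which just counts LEVEL1 elements) are made up for illustration.

```vb
' Sketch only: transform a DOM document that is already in memory.
' CountLevel1WithXslt and the inline stylesheet are hypothetical examples.
Public Function CountLevel1WithXslt() As String
    Dim xmlDoc As Object, xslDoc As Object, root As Object

    ' Build a tiny document directly in DOM.
    Set xmlDoc = CreateObject("MSXML2.DOMDocument.3.0")
    Set root = xmlDoc.createElement("ROOT")
    xmlDoc.appendChild root
    root.appendChild xmlDoc.createElement("LEVEL1")

    ' Load the stylesheet; a real one would normally come from a file.
    Set xslDoc = CreateObject("MSXML2.DOMDocument.3.0")
    xslDoc.async = False
    xslDoc.loadXML _
        "<xsl:stylesheet version='1.0' " & _
        "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" & _
        "<xsl:output method='text'/>" & _
        "<xsl:template match='/'>" & _
        "<xsl:value-of select='count(//LEVEL1)'/>" & _
        "</xsl:template></xsl:stylesheet>"

    ' transformNode applies the stylesheet to the in-memory tree
    ' and returns the result as a string.
    CountLevel1WithXslt = xmlDoc.transformNode(xslDoc)
End Function
```

Had the document been produced with MXXMLWriter or a string class, it would first have to be parsed into a DOM document before transformNode could be called, which is the reloading cost mentioned above.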