Creating an XML Document: The Way of Speed

If you search the Web, magazine articles and MSDN, you can find a lot of documentation about the fastest way to read an XML document. Speaking of Microsoft XML Parser 3.0, one of the most commonly discussed issues is whether to use the DOM parser or the SAX one, and there is an almost universal consensus that SAX is faster when you just need to read data, while DOM is the way to go if you need to modify and navigate it. The most obvious metaphor is that SAX is like a read-only/forward-only database cursor, while DOM is a fully updatable and navigable cursor. If you're experienced with databases, you already know the performance differences between these two ways of accessing data.

But when it comes to writing XML data, I was unable to find any references to this performance issue.

So I started experimenting, and I found some impressive data and some insights into how to create XML documents as fast as possible that are not clearly documented or widely understood.

All the samples in this document were run on a Windows 2000 Professional notebook with Visual Basic 6.0 SP5 and MS XML Parser 3.0 SP1 installed.

I started with a simple idea: simulate a long document (we'll see that the performance gains become impressive when the number of nodes grows large) with at least one father-child structure, and build it in several different ways.

The VBP project provided lets you enter three parameters: how many nodes are written as fathers (elements named LEVEL1), how many nodes are written as children of each father (elements named LEVEL2), and how long the text of each element is.

First way: the long one

The first option is to build the XML string in VB using just the & operator, a very slow and somewhat deprecated approach.

We all agree that this is the slowest option: because of the way VB handles strings, every & creates a new string and copies everything built so far. It's so slow that I did not test it at all; it simply took too long!
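To make the cost concrete, here is a minimal sketch of the & approach. The function name BuildXmlNaive, the dummy text made of "x" characters and the exact ROOT/LEVEL1/LEVEL2 layout are assumptions for illustration, not the code in the VBP.

```vb
' Hypothetical sketch of the "first way": plain & concatenation.
' Assumed layout: ROOT > LEVEL1 (VALUE11 + N x LEVEL2 > VALUE21).
Public Function BuildXmlNaive(ByVal fathers As Long, ByVal children As Long, _
                              ByVal textLen As Long) As String
    Dim s As String
    Dim txt As String
    Dim i As Long, j As Long

    txt = String$(textLen, "x")          ' dummy element text
    s = "<ROOT>"
    For i = 1 To fathers
        ' Each & below allocates a new string and copies all of s into it,
        ' so the total copying grows roughly with the square of the output size.
        s = s & "<LEVEL1><VALUE11>" & txt & "</VALUE11>"
        For j = 1 To children
            s = s & "<LEVEL2><VALUE21>" & txt & "</VALUE21></LEVEL2>"
        Next j
        s = s & "</LEVEL1>"
    Next i
    BuildXmlNaive = s & "</ROOT>"
End Function
```

For example, BuildXmlNaive(1, 1, 2) returns `<ROOT><LEVEL1><VALUE11>xx</VALUE11><LEVEL2><VALUE21>xx</VALUE21></LEVEL2></LEVEL1></ROOT>`. Note that nothing is escaped, so the text must not contain characters such as < or &.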

Second way: the DOM one

DOM is one of the two main APIs you can use when working with XML. It's very elegant, very powerful and also (as we'll see) efficient. The code to create a document like the one described above (slightly different from the code included in the VBP, to improve readability) looks like this:

    ' Late bound: MSXML 3.0 must be installed, but no project reference is needed
    Dim lDOMDoc As Object, lDOMRootNode As Object
    Dim lDOMLev1Node As Object, lDOMLev2Node As Object
    Dim lValueElement As Object

    Set lDOMDoc = CreateObject("MSXML2.DOMDocument.3.0")
    ' Create root
    Set lDOMRootNode = lDOMDoc.createElement("ROOT")
    lDOMDoc.appendChild lDOMRootNode
        ' Create node
        Set lDOMLev1Node = lDOMDoc.createElement("LEVEL1")
        lDOMRootNode.appendChild lDOMLev1Node
            ' Create element
            Set lValueElement = lDOMDoc.createElement("VALUE11")
            lDOMLev1Node.appendChild lValueElement
            ' Set its value (lastChild is the VALUE11 element just appended)
            lDOMLev1Node.lastChild.Text = "11"
        ' Same for other nodes
        Set lDOMLev2Node = lDOMDoc.createElement("LEVEL2")
        lDOMLev1Node.appendChild lDOMLev2Node
            Set lValueElement = lDOMDoc.createElement("VALUE21")
            lDOMLev2Node.appendChild lValueElement
            lDOMLev2Node.lastChild.Text = "11"

As you can see, the code is very readable: you create your DOM document, append nodes to it, set their values, and so on. It is clear and concise, it maps easily onto, for example, code that loops through one or more recordsets, and it can also be generalized.
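A generalized version that loops over the three parameters of the test might look like the sketch below. BuildDomXml is a hypothetical name, and the layout (one VALUE11 per LEVEL1, one VALUE21 per LEVEL2, dummy "x" text) is an assumption, not necessarily what the VBP generates.

```vb
' Hypothetical generalized DOM builder (late bound, MSXML 3.0 assumed installed).
Public Function BuildDomXml(ByVal fathers As Long, ByVal children As Long, _
                            ByVal textLen As Long) As String
    Dim lDOMDoc As Object, lDOMRootNode As Object
    Dim lDOMLev1Node As Object, lDOMLev2Node As Object
    Dim lValueElement As Object
    Dim txt As String
    Dim i As Long, j As Long

    txt = String$(textLen, "x")                       ' dummy element text
    Set lDOMDoc = CreateObject("MSXML2.DOMDocument.3.0")
    ' appendChild returns the node it appended, so create-and-append is one line
    Set lDOMRootNode = lDOMDoc.appendChild(lDOMDoc.createElement("ROOT"))

    For i = 1 To fathers
        Set lDOMLev1Node = lDOMRootNode.appendChild(lDOMDoc.createElement("LEVEL1"))
        Set lValueElement = lDOMLev1Node.appendChild(lDOMDoc.createElement("VALUE11"))
        lValueElement.Text = txt                       ' escaped by the DOM on output
        For j = 1 To children
            Set lDOMLev2Node = lDOMLev1Node.appendChild(lDOMDoc.createElement("LEVEL2"))
            Set lValueElement = lDOMLev2Node.appendChild(lDOMDoc.createElement("VALUE21"))
            lValueElement.Text = txt
        Next j
    Next i

    BuildDomXml = lDOMDoc.xml                          ' serialize the whole tree
End Function
```

Because the document stays in memory, the same lDOMDoc could also be queried or transformed before serializing it.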

Third way: the SAX one

Although SAX is generally known as a good interface for reading data, SAX (at least in Microsoft XML Parser 3.0) also offers a set of interfaces for writing data. These are implemented by the MXXMLWriter object in the MSXML object model and, if you search for it in the latest MSDN Library, you will find only 19 articles that mention it.

From a programming point of view, the same document created above with DOM is produced with MXXMLWriter by the following code:

    ' Requires a project reference to "Microsoft XML, v3.0"
    Dim lSAXWriter As New MXXMLWriter
    'We need this variable to use the writer through its SAX interface
    Dim lSAXContentHandler As IVBSAXContentHandler
    'That is just a helper (no attributes are written)
    Dim lSAXAttributes As New SAXAttributes
    'Point the handler variable at the writer, which implements the interface
    Set lSAXContentHandler = lSAXWriter
    'Do not include the XML declaration, so that the output is like the others
    lSAXWriter.omitXMLDeclaration = True
    ' Manually call the events needed to generate the XML
    lSAXContentHandler.startDocument
    lSAXAttributes.Clear
    'Create Root
    lSAXContentHandler.startElement "", "", "ROOT", lSAXAttributes
        'Create node
        lSAXContentHandler.startElement "", "", "LEVEL1", lSAXAttributes
            ' Create and populate elements
            lSAXContentHandler.startElement "", "", "VALUE11", lSAXAttributes
            lSAXContentHandler.characters "11"
            lSAXContentHandler.endElement "", "", "VALUE11"
        lSAXContentHandler.startElement "", "", "LEVEL2", lSAXAttributes
            lSAXContentHandler.startElement "", "", "VALUE21", lSAXAttributes
            lSAXContentHandler.characters "11"
            lSAXContentHandler.endElement "", "", "VALUE21"
        lSAXContentHandler.endElement "", "", "LEVEL2"
        lSAXContentHandler.endElement "", "", "LEVEL1"
    lSAXContentHandler.endElement "", "", "ROOT"
    lSAXContentHandler.endDocument
    ' The generated XML is now available as a string in lSAXWriter.output

It's a little more verbose than DOM, but it is still pretty readable and, with some more work, it can easily be generalized to render different data.
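As a sketch of that generalization, the function below drives the same MXXMLWriter calls from the three test parameters and returns the writer's string output. BuildSaxXml is a hypothetical name, the layout mirrors the DOM sketch (an assumption), and a project reference to "Microsoft XML, v3.0" is required for the early-bound types.

```vb
' Hypothetical generalized SAX builder. Requires a reference to "Microsoft XML, v3.0".
Public Function BuildSaxXml(ByVal fathers As Long, ByVal children As Long, _
                            ByVal textLen As Long) As String
    Dim lSAXWriter As New MXXMLWriter
    Dim lSAXContentHandler As IVBSAXContentHandler
    Dim lSAXAttributes As New SAXAttributes        ' stays empty: no attributes
    Dim txt As String
    Dim i As Long, j As Long

    txt = String$(textLen, "x")                     ' dummy element text
    Set lSAXContentHandler = lSAXWriter             ' use the writer via its SAX interface
    lSAXWriter.omitXMLDeclaration = True

    lSAXContentHandler.startDocument
    lSAXContentHandler.startElement "", "", "ROOT", lSAXAttributes
    For i = 1 To fathers
        lSAXContentHandler.startElement "", "", "LEVEL1", lSAXAttributes
        lSAXContentHandler.startElement "", "", "VALUE11", lSAXAttributes
        lSAXContentHandler.characters txt           ' the writer escapes markup characters
        lSAXContentHandler.endElement "", "", "VALUE11"
        For j = 1 To children
            lSAXContentHandler.startElement "", "", "LEVEL2", lSAXAttributes
            lSAXContentHandler.startElement "", "", "VALUE21", lSAXAttributes
            lSAXContentHandler.characters txt
            lSAXContentHandler.endElement "", "", "VALUE21"
            lSAXContentHandler.endElement "", "", "LEVEL2"
        Next j
        lSAXContentHandler.endElement "", "", "LEVEL1"
    Next i
    lSAXContentHandler.endElement "", "", "ROOT"
    lSAXContentHandler.endDocument

    ' With no output destination assigned, the writer accumulates a string
    BuildSaxXml = lSAXWriter.output
End Function
```

Unlike the DOM version, nothing is kept in memory except the text being written, which is where the speed advantage comes from.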

Fourth way: The Fast String one

At this point in my exploration I wondered whether some kind of object model is really needed to create an XML file. After all, what we're doing is creating a simple text file, so maybe all the overhead of keeping a complex structure of nodes and elements in memory is unnecessary. The problem is that string manipulation in Visual Basic is really slow, and then I remembered that our guest Francesco Balena wrote a high-performance string class for Visual Basic Programmer's Journal some time ago. After some digging around I found the CString.zip file that includes the CString class, a class that uses some Windows API calls to improve on the technique described in the Tip Bank.
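Balena's CString source is not reproduced here. Purely as an illustration of the general idea, the sketch below preallocates a buffer with Space$ and fills it in place with the Mid$ statement, so most appends copy only the new piece instead of the whole string. BufInit, BufAppend and BufToString are hypothetical names; this is a simpler stand-in that uses no Windows API calls, not the CString implementation, and it keeps its state in module-level variables for brevity.

```vb
' Hypothetical preallocated string buffer (standard module).
' Module-level variables must sit in the declarations section.
Private mBuffer As String      ' preallocated storage, padded with spaces
Private mUsed As Long          ' number of characters written so far

Public Sub BufInit(ByVal capacity As Long)
    mBuffer = Space$(capacity)
    mUsed = 0
End Sub

Public Sub BufAppend(ByVal piece As String)
    Dim n As Long
    n = Len(piece)
    If n = 0 Then Exit Sub
    If mUsed + n > Len(mBuffer) Then
        ' Grow by more than doubling: one copy now instead of one per append
        mBuffer = mBuffer & Space$(Len(mBuffer) + n)
    End If
    ' The Mid$ statement overwrites characters in place, with no new allocation
    Mid$(mBuffer, mUsed + 1, n) = piece
    mUsed = mUsed + n
End Sub

Public Function BufToString() As String
    BufToString = Left$(mBuffer, mUsed)
End Function
```

For example, after `BufInit 4`, `BufAppend "<ROOT>"` and `BufAppend "</ROOT>"`, BufToString returns `<ROOT></ROOT>`; the buffer grew once, on the first append. The closer the initial capacity is to the final size, the fewer growth copies are needed, which is the same reasoning behind SetBufferSize.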

Here is the code that creates a simple XML document like the one shown in the program picture. The only point to note is the SetBufferSize call, which allocates the space used to store the string. Here it is set to a very high value (20 MB), but the best results are obtained if you set it to a size comparable to that of the document you're creating (if you know the structure of your document, you can calculate its size).

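    Dim lCS As New CString
    ' Preallocate the buffer (sub call: no parentheses around the arguments)
    lCS.SetBufferSize 20000000, True
    lCS.Append "<ROOT>"
        lCS.Append "<LEVEL1>"
            lCS.Append "<VALUE11>11</VALUE11>"
        lCS.Append "<LEVEL2>"
            lCS.Append "<VALUE21>11</VALUE21>"
        lCS.Append "</LEVEL2></LEVEL1>"
    lCS.Append "</ROOT>"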
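When the structure is known in advance, the buffer size can be computed exactly rather than guessed. The sketch below assumes the hypothetical layout used in the other examples here (no whitespace, no escaping, one value element per LEVEL1 and per LEVEL2); EstimateXmlSize is an illustrative name. It counts characters, so check the CString source to see which unit SetBufferSize expects.

```vb
' Hypothetical helper: exact character count for the assumed layout
' ROOT > LEVEL1 (VALUE11 + N x LEVEL2 > VALUE21), no whitespace, no escaping.
Public Function EstimateXmlSize(ByVal fathers As Long, ByVal children As Long, _
                                ByVal textLen As Long) As Long
    Dim rootLen As Long, fatherLen As Long, childLen As Long

    ' Using Len() on the literal tags avoids counting characters by hand
    rootLen = Len("<ROOT></ROOT>")
    fatherLen = Len("<LEVEL1></LEVEL1><VALUE11></VALUE11>") + textLen
    childLen = Len("<LEVEL2></LEVEL2><VALUE21></VALUE21>") + textLen

    EstimateXmlSize = rootLen + fathers * (fatherLen + children * childLen)
End Function
```

For instance, EstimateXmlSize(1, 1, 2) returns 89: 13 characters for ROOT plus 38 for one LEVEL1 with its value and 38 for one LEVEL2 with its value.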

It's very compact, but also difficult to generalize. Besides that, I haven't handled any of the features (like character escaping or character set support) that make creating a well-formed XML file a much more complex task.
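Character escaping is the first of those features you would need to add. A minimal sketch for element text and attribute values, using only the built-in Replace function, might look like this; XmlEscape is a hypothetical name, and the sketch does not deal with character sets or with control characters that XML 1.0 does not allow at all.

```vb
' Hypothetical helper: escape the five characters with predefined XML entities.
Public Function XmlEscape(ByVal s As String) As String
    s = Replace(s, "&", "&amp;")     ' must run first, or it would re-escape the others
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")
    s = Replace(s, """", "&quot;")   ' only strictly needed inside attribute values
    XmlEscape = Replace(s, "'", "&apos;")
End Function
```

Each Replace call scans and copies the whole string, so escaping every value adds cost that the plain string approach otherwise avoids; this is part of why it suits documents known to need no escaping.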

The results

If you've read this far, you're probably interested in what I discovered about performance, so let's look at a table (sorry, no graphics):

 

| Nodes | Fathers x Children x Size | File size (bytes) | DOM (ms) | SAX (ms) | String Class (ms) |
|---|---|---|---|---|---|
| 10000 | 100 x 100 x 10 | 620215 | 490 | 290 | 287 |
| 20000 | 200 x 100 x 10 | 1240415 | 950 | 565 | 531 |
| 50000 | 500 x 100 x 10 | 3101015 | 2430 | 1390 | 1301 |
| Nodes (100 chars, 100 children) | | | | | |
| 10000 | 100 x 100 x 100 | 2420215 | 530 | 490 | 345 |
| 20000 | 200 x 100 x 100 | 4840415 | 1045 | 960 | 662 |
| 50000 | 500 x 100 x 100 | 12101015 | 2665 | 2261 | 1712 |
| Nodes (10 chars, 10 children only) | | | | | |
| 10000 | 1000 x 10 x 100 | 618015 | 515 | 315 | 381 |
| 20000 | 2000 x 10 x 100 | 1236015 | 1045 | 630 | 752 |
| 50000 | 5000 x 10 x 100 | 3090015 | 2645 | 1543 | 1877 |
| Nodes (10 chars, 1000 children) | | | | | |
| 10000 | 10 x 1000 x 10 | 656075 | 515 | 290 | 271 |
| 20000 | 20 x 1000 x 10 | 1312135 | 995 | 565 | 532 |
| 50000 | 50 x 1000 x 10 | 3280315 | 2475 | 1372 | 1305 |

The table above should be read as follows:

The first two columns describe the XML file: the number of nodes (fathers x children) and how the file is formed, that is, how many LEVEL1 nodes, how many LEVEL2 nodes under each, and how long the element text is. The third column is the size (in bytes) of the resulting XML file. The last three columns are the measured times in milliseconds, taken with the Stopwatch class by Karl Peterson that is also included in the String Builder sample project. Each measure is the average of 5 runs on a PIII 650 notebook with 256 MB of RAM.
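The Stopwatch class itself is not shown here. A rough stand-in that averages several runs, assuming only the Win32 GetTickCount function (whose resolution is typically 10 to 16 ms, so very short runs will be noisy), could look like the sketch below; WorkToMeasure and AverageMs are hypothetical names, and the placeholder workload should be replaced with the builder under test.

```vb
' Standard module; the Declare must sit in the declarations section.
Private Declare Function GetTickCount Lib "kernel32" () As Long

' Hypothetical placeholder: replace the body with the DOM, SAX or
' string-based routine being measured.
Private Sub WorkToMeasure()
    Dim s As String
    s = String$(1000000, "x")
End Sub

' Returns the average elapsed time in milliseconds over the given number of runs.
Public Function AverageMs(Optional ByVal runs As Long = 5) As Double
    Dim startTick As Long
    Dim total As Long
    Dim i As Long

    If runs < 1 Then runs = 1
    For i = 1 To runs
        startTick = GetTickCount()
        WorkToMeasure
        total = total + (GetTickCount() - startTick)
    Next i
    AverageMs = total / runs
End Function
```

Averaging over several runs, as in the table, smooths out both the timer granularity and one-off costs such as loading the MSXML DLL on the first call.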

What can we learn from the table?

First, if you just need the fastest way to create an XML document without any need to process it afterwards (for example, you just write it to disk or pass it to a Web browser) and, at the same time, you want someone else (Microsoft's programmers) to worry about all the features that make an XML document well formed, it's worth studying Microsoft's rather sparse documentation and learning MXXMLWriter.

Second, DOM is not so slow. Before starting this experiment I believed that using DOM to create really large documents would perform poorly, because of the overhead the DOM object model requires to keep in memory all the information that allows you to navigate it. But that is not the case: in these tests DOM took between about 1.1 and 1.8 times as long as SAX. Considering the elegance of the DOM model and all the technologies that rely on it (like XSLT, validation, XPath and so on), if you do not need the ultimate performance, the tradeoff between performance and usability tilts heavily toward DOM. Besides, Microsoft's figures for the current beta code suggest that the upcoming MSXML 4.0 parser will greatly improve performance.

Last but not least, if performance is your ultimate goal and the document you're working with is very large, consider trying fast string concatenation. It is suitable only for particular documents (for example, documents for which you're sure no character escaping is needed, like a document containing only numbers) and only if you work with a language where string operations can be performed really fast. The table shows the largest gains with longer element text (100 characters), although with many fathers and only 10 children each, SAX was actually faster.

Conclusions

From the tests I've made, and considering DOM's long list of capabilities versus SAX's poor documentation, I believe that using DOM to write documents whenever you need it is the way to go. Only if you need the ultimate performance is it worth using SAX or investigating fast string manipulation.

It would be interesting to run more experiments (for example, making the depth of the nodes another variable of the test); the code is free, so you can do it yourself. Consider also that the XML document created here is really simple so, as they say, your mileage may vary.

Besides that, you also need to consider what you will do with the XML document once it's created. If, for example, you need to apply an XSL stylesheet to it, you'd better use DOM so that you don't spend time reloading the XML file into another DOM document. If you need to write the file to disk and you're using Visual C++, you can investigate string functions and the file-mapping functions to write the file as fast as possible.
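When the document is already held in a DOM, both XPath queries and XSLT transformations can run on it directly, with no reparsing. The sketch below assumes an MSXML 3.0 DOMDocument built as in the DOM example; the function names and the inline stylesheet (which just counts LEVEL1 nodes) are illustrative. Since MSXML 3.0 defaults to the older XSLPattern selection language, the code switches it to XPath explicitly.

```vb
' Hypothetical helper: count LEVEL2 nodes with an XPath query on an in-memory DOM.
Public Function CountLevel2(ByVal lDOMDoc As Object) As Long
    ' MSXML 3.0 defaults to XSLPattern; switch to standard XPath
    lDOMDoc.setProperty "SelectionLanguage", "XPath"
    CountLevel2 = lDOMDoc.selectNodes("/ROOT/LEVEL1/LEVEL2").length
End Function

' Hypothetical helper: apply an inline XSLT stylesheet to the same DOM.
' The stylesheet outputs plain text: the number of LEVEL1 nodes.
Public Function CountFathersWithXsl(ByVal lDOMDoc As Object) As String
    Dim lXslDoc As Object
    Set lXslDoc = CreateObject("MSXML2.DOMDocument.3.0")
    lXslDoc.async = False
    If Not lXslDoc.loadXML("<xsl:stylesheet version='1.0' " & _
        "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" & _
        "<xsl:output method='text'/>" & _
        "<xsl:template match='/'>" & _
        "<xsl:value-of select='count(/ROOT/LEVEL1)'/>" & _
        "</xsl:template></xsl:stylesheet>") Then
        Err.Raise vbObjectError + 513, , "Stylesheet is not well formed"
    End If
    ' transformNode runs the stylesheet against the existing tree
    CountFathersWithXsl = lDOMDoc.transformNode(lXslDoc)
End Function
```

With SAX or the string approach, the same operations would first require loading the generated text into a new DOMDocument, which is exactly the extra cost the paragraph above warns about.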

For comments, you can email: [email protected]

devx-admin
