Is XML too big? Does anyone care?

The Extensible Markup Language, or XML, is big. But is it too big? And if so, should we do anything about it?

The World Wide Web Consortium says that XML “is a simple, very flexible text format,” but in reality, non-trivial XML documents can be quite complex. Parsing an XML document takes a lot of code and a lot of CPU horsepower. It is actually harder to parse a large document than to create one.

If an XML document is damaged or malformed, software can become very confused, and even a trivial error or a small amount of corruption can stop processing entirely. Working with schema extensions can be difficult, and older documents that rely on DTDs (Document Type Definitions), along with code written against the Document Object Model (DOM), can be incomprehensible.
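To make the point about trivial errors concrete, here is a minimal sketch using the `xml.etree.ElementTree` module from Python's standard library. The sample documents and the `try_parse` helper are made up for illustration; the only difference between the two inputs is one misspelled closing tag.

```python
import xml.etree.ElementTree as ET

well_formed = "<order><item>widget</item></order>"
# Same document with one trivial slip: the closing tag is misspelled.
malformed = "<order><item>widget</itm></order>"


def try_parse(text):
    """Return (root tag, None) on success or (None, error message) on failure."""
    try:
        return ET.fromstring(text).tag, None
    except ET.ParseError as err:
        # The parser gives up on the whole document, not just the bad element.
        return None, str(err)


ok_tag, ok_error = try_parse(well_formed)
bad_tag, bad_error = try_parse(malformed)

print("well-formed:", ok_tag, ok_error)
print("malformed:  ", bad_tag, bad_error)
```

The second call yields no usable tree at all. A conforming XML parser is required to stop at the first well-formedness error rather than guess at what the author meant.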

XML, however, is crucial for exchanging data, such as documents. Modern file formats, such as Microsoft’s DOCX and XLSX, are XML-based successors to the old binary Microsoft Word and Excel formats. Similarly, the OpenDocument Format used by much of the non-Microsoft world is also XML-based.

Still, XML is complex: hard to understand, difficult to validate, and resource-intensive to parse and to create. That has led to suggestions for a simplified version of the spec, such as MicroXML, proposed by James Clark and others.

Clark’s thoughts about MicroXML, published on his blog in December 2010, lay out a solid set of requirements, ditching “problematic” parts of XML like the DOCTYPE declaration, namespaces, character encodings other than UTF-8, XML declarations, attribute value normalization, and CDATA sections.
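As a rough illustration of what that list means in practice, the sketch below scans a document's text for four of the constructs named above. It is not a MicroXML parser and is not taken from Clark's post: the `DROPPED_FEATURES` table and the `find_dropped_features` helper are hypothetical names, the checks are plain text patterns that could also match inside comments or text content, and encodings and attribute value normalization are not covered.

```python
import re

# Hypothetical, deliberately rough text patterns for constructs that the
# MicroXML proposal leaves out. This is a sketch, not a conformance checker.
DROPPED_FEATURES = {
    "XML declaration": re.compile(r"<\?xml\s"),
    "DOCTYPE declaration": re.compile(r"<!DOCTYPE"),
    "CDATA section": re.compile(r"<!\[CDATA\["),
    "namespace declaration": re.compile(r"\sxmlns(:[\w.-]+)?\s*="),
}


def find_dropped_features(text):
    """Return the names of the listed constructs that appear in the text."""
    return [name for name, pattern in DROPPED_FEATURES.items()
            if pattern.search(text)]


full_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE note SYSTEM "note.dtd">\n'
    '<n:note xmlns:n="http://example.com/note"><![CDATA[hi]]></n:note>'
)
simple_xml = '<note lang="en"><to>Reader</to></note>'

print(find_dropped_features(full_xml))
print(find_dropped_features(simple_xml))
```

The first sample uses all four constructs, while the second uses none of them and is the kind of plain element-and-attribute document the proposal has in mind.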

What has happened since then? In mid-2011, John Cowan built on Clark’s requirements with a draft spec for MicroXML.

What prompted today’s musing is a two-part set of articles by Uche Ogbuji, published on IBM developerWorks in mid-June 2012: Explore the Basic Principles of MicroXML and Process MicroXML with MicroLark.

What do you think about XML and MicroXML? Would you welcome a subset?

Charlie Frank

Charlie has over a decade of experience in website administration and technology management. As the site admin, he oversees all technical aspects of running a high-traffic online platform, ensuring optimal performance, security, and user experience.
