Wednesday evening, Microsoft announced that the default file format for the next version of its Office suite would be XML. That’s default as in: unless you tell it otherwise, Office will save files in its Office Open XML format, not in the ubiquitous binary (and proprietary) form that’s been the hallmark of Office until now. While you may find it hard to get excited over a change in, of all things, file format, this one matters: it is the beginning of the end for proprietary, closed Office file formats.
The Office Open XML format is public and royalty-free, and the XML-formatted documents can be read and processed with any XML-processing language or tool. Even better, Microsoft compresses Office output by zipping the XML, which Microsoft claims can make Office files up to 75 percent smaller. They had to: uncompressed, the verbose XML files would be larger than the older binary files, not smaller.
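Because the new files are ZIP archives of XML parts, a language’s standard library is enough to get at their content. The following is a minimal Python sketch, not a Microsoft API. It assumes the package layout Word uses for .docx files (main text in a `word/document.xml` part, using the WordprocessingML namespace as later standardized in ECMA-376), and `docx_paragraphs` and `package_sizes` are hypothetical helper names chosen for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace (ECMA-376), in ElementTree's "{uri}tag" form.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file):
    """Return the plain text of each paragraph in a .docx package.

    Assumption: the main document part is stored at word/document.xml,
    where Word places it. Text boxes nested inside paragraphs are not
    treated specially; this is a sketch, not a full parser.
    """
    with zipfile.ZipFile(path_or_file) as pkg:
        root = ET.fromstring(pkg.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):  # each <w:p> element is one paragraph
        # A paragraph's text is split across runs; join every <w:t> inside it.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs


def package_sizes(path_or_file):
    """Return (uncompressed, compressed) byte totals over all parts in the ZIP."""
    with zipfile.ZipFile(path_or_file) as pkg:
        infos = pkg.infolist()
    return (sum(i.file_size for i in infos),
            sum(i.compress_size for i in infos))
```

For example, `docx_paragraphs("report.docx")` (the file name is a placeholder) would return one string per paragraph, and comparing the two numbers from `package_sizes` shows how much the ZIP compression saves on a given file.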
The Mac version of Office 12 also gains support for the new XML file format, which will simplify sharing files between the Mac and Windows versions of Office.
Earlier Office versions (2000, XP, and 2003) offered increasingly powerful options for saving and reading XML-formatted files, but XML isn’t the default format in any of them. Word gained XML capability first, then Excel, and then InfoPath, in Office 2003, added powerful new XML-based information-gathering capabilities. The new version continues and expands the XML trend: PowerPoint and Visio also gain the ability to read and write their files in XML.
In the press release, Steven Sinofsky, Microsoft’s Senior Vice President, Office, explains:
“XML enables companies to capture information so it can be repurposed and reused however and whenever the organization needs to use it, regardless of platform. Building on XML support in Microsoft Office, customers can improve data flow throughout their organizations. They can build customized business process and productivity solutions that help information workers make a greater impact on their business.”
The last time Microsoft changed the Office file format significantly was with the rollout of Office 97, which broke file compatibility with earlier versions. Customers, of course, complained vociferously. Determined to avoid a repeat, Microsoft has this time not only given Office 12 the ability to save files in the binary format used by earlier Office versions, but has also promised a free patch for Office 2000, XP, and 2003 that will let those versions read and write the new XML format. In addition, Microsoft will provide tools to bulk-convert archives of older Office files to the new XML format. Users can choose which format (XML or binary) they want as the default, and out of the box, the applications will save a file in the same format it was opened in, which will prevent problems for existing automated processing systems.
I applaud the change for several reasons. It shows the power of XML. It makes delivering malevolent macros via Office documents far more difficult. It brings Microsoft Office into line with OpenOffice, which already uses an XML-based file format. And it opens up the content of millions upon millions of Office documents to retrieval and manipulation with standard programming tools, bringing Office output into the mainstream and making Office integrate far more easily with the tools developers already use to build applications.
To distinguish the new file formats from older versions, Microsoft has created new “x” extensions: .docx, .xlsx, and so on, which will be associated with the new Office applications automatically when you install Office.
Obviously, some types of content, such as bitmapped images, aren’t particularly amenable to being stored as XML, so the new format stores them as separate parts alongside the XML portions of the document. The result is a more complex package, but one that’s far more resistant to corruption (damage to any one part doesn’t affect the rest of the document), easier to repair (the new Office applications will attempt to fix corrupted files), and far more accessible to external applications (you don’t need to read an entire document to extract its images, for example).
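The separation of parts is easy to see from code. This Python sketch pulls every image out of a package without parsing any of its XML. It assumes, for illustration, that media parts sit in a folder named `media` (for example `word/media/image1.png`), which is where Office conventionally writes them; strictly speaking, the package format locates parts through relationship files rather than folder names, so a robust tool would follow those instead. `media_parts` is a hypothetical helper name.

```python
import zipfile
from pathlib import PurePosixPath


def media_parts(path_or_file):
    """Map each media part name in an Office package to its raw bytes.

    Assumption: images live in a folder named "media" (e.g.
    word/media/image1.png), as Office conventionally writes them.
    No XML part is read or parsed, so a damaged document body
    does not stop the images from being extracted.
    """
    with zipfile.ZipFile(path_or_file) as pkg:
        return {
            name: pkg.read(name)
            for name in pkg.namelist()
            # ZIP entry names always use "/", so treat them as POSIX paths.
            if PurePosixPath(name).parent.name == "media"
        }
```

Because only the ZIP directory and the media entries are touched, this keeps working even when the XML body part is malformed, as long as the ZIP container itself is intact.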
Most business users won’t care about this news, and indeed, many of them will probably never even know any change has occurred, but truly, this is good news for developers.