Developers have worked with Office documents for years, primarily word-processing documents, presentations, and spreadsheets. While the various Microsoft Office applications have provided built-in support for creating and modifying Office documents, the creators and consumers of these documents have primarily been humans.
Now, however, the push to improve business workflows means that software applications must also be able to operate on Office documents. The main requirements arising from this need are:
- Document interoperability: The ability to work on documents using tools other than Office.
- Document manipulation and generation: The ability to create and modify documents programmatically.
Microsoft Office documents based on the older binary formats supported the preceding requirements to some extent through the COM object model; however, that approach was complex and, because it required automating the single-threaded Office client applications, not scalable. Microsoft's Office Open XML (OOXML) format, the default format for Office 2007 documents, overcomes both issues.
What is OOXML?
OOXML is an XML file format specification for representing word-processing documents, presentations, and spreadsheets. Microsoft created the original specification, which was later approved as the ECMA-376 and ISO/IEC 29500 standards. OOXML uses familiar technologies such as XML and ZIP. Document content resides in a file package that conforms to the Open Packaging Conventions (OPC). An OOXML file package contains a set of XML files as well as other required resources such as image files, video files, and so forth.
The key concepts of OOXML are:
- An OOXML document is a ZIP-based package of files.
- A package is composed of various parts, such as a Main Document part, Image parts, and Video, Slide, Workbook, and Document Properties parts. Each part is represented by a file in the zipped package.
- Relationships determine how the collection of parts comes together to form a document.
- Content types define the types of parts that can be stored in a package.
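The concepts above can be illustrated with a minimal sketch that uses only the Python standard library. It builds a tiny word-processing package in memory and then reads back its parts, its content types, and its root relationships. The helper names (build_package, describe_package) are hypothetical and chosen for illustration. The package holds only the bare minimum of parts, so treat it as a teaching aid rather than a complete document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the packaging layer.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Content types: declares what kind of data each part holds.
CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# Root relationships: points from the package to its Main Document part.
ROOT_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

# Main Document part: a single paragraph of text.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p><w:r><w:t>Hello, OOXML</w:t></w:r></w:p></w:body>
</w:document>"""


def build_package() -> bytes:
    """Zip the three parts into an in-memory package (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        zf.writestr("word/document.xml", DOCUMENT)
    return buf.getvalue()


def describe_package(data: bytes) -> dict:
    """List the parts, content-type overrides, and root relationship targets."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        parts = zf.namelist()
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        rels = ET.fromstring(zf.read("_rels/.rels"))
    overrides = {
        o.get("PartName"): o.get("ContentType")
        for o in types.findall(f"{{{CT_NS}}}Override")
    }
    targets = [r.get("Target") for r in rels.findall(f"{{{REL_NS}}}Relationship")]
    return {"parts": parts, "overrides": overrides, "targets": targets}


if __name__ == "__main__":
    info = describe_package(build_package())
    for name in info["parts"]:
        print("part:", name)
    print("main document:", info["targets"][0])
```

The same describe_package function should also work on the bytes of a real .docx, .xlsx, or .pptx file, since each of them carries a [Content_Types].xml part and a _rels/.rels part at the package root.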
Benefits of OOXML
Because OOXML documents are based on standard, open, platform-independent formats, the ability to interoperate with Office documents has increased significantly compared to the earlier binary formats. Here are some of the benefits of the OOXML format:
- Document Assembly: It's easy to create documents, because individual parts can be created separately, and assembled when required.
- Document Archiving: You can save space by storing a single instance of the common parts from a large number of similar documents as well as the unique content for each document, assembling the individual documents in their entirety again when required.
- Searching Documents: Because content is stored in XML format, it's much easier to search using common tools.
- Business Process Efficiency: Using an XML-based document format helps when automating decisions based on document content.
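As a rough illustration of the searching benefit, the sketch below finds paragraphs containing a given string using nothing but a ZIP reader and an XML parser. The function names (make_sample, find_paragraphs) are hypothetical. The sample package holds only a main document part, which is enough for this demonstration but is not a complete document that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace of the WordprocessingML main document markup.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_sample(paragraphs):
    """Build a stand-in package with one text run per paragraph (illustrative only)."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


def find_paragraphs(data, needle):
    """Return the text of every paragraph that contains needle (case-insensitive)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    hits = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text may be split across several runs; join them first.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if needle.lower() in text.lower():
            hits.append(text)
    return hits


if __name__ == "__main__":
    sample = make_sample(["Invoice total: 120 USD", "Thank you & goodbye"])
    print(find_paragraphs(sample, "invoice"))
```

Joining the runs before matching matters in practice, because editors often split a single visible word across several runs when formatting changes mid-word.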
The OOXML SDK
The Open XML SDK is a .NET class library that exposes standard XML and Packaging APIs for working with OOXML documents. The SDK provides typed access to both the Open Packaging Conventions packages and the XML content in those packages. Currently, two versions of the SDK are available:
- Version 1: Provides strongly typed access to packages.
- Version 2: Provides strongly typed access to both packages and their contents (currently available as an April 2009 CTP release).