Generating Microsoft Office Documents with the Open XML SDK

Developers have worked with Office documents for years, primarily word-processing documents, presentations, and spreadsheets. While the various Microsoft Office applications have provided built-in support for creating and modifying Office documents, the creators and consumers of these documents have primarily been humans.

Now, however, efforts to improve business workflows have created a need for software applications that can operate on Office documents. The main requirements arising from this need are:

  1. Document interoperability: The ability to work on documents using tools other than Office.
  2. Document manipulation and generation: The ability to create and modify documents programmatically.

Microsoft Office documents based on the older binary formats supported these requirements to some extent through the COM object model; however, that approach was complex and, because it required automating the single-threaded Office client applications, not scalable. Microsoft’s Office Open XML (OOXML) format, the default format for Office 2007 documents, overcomes both issues.

What is OOXML?

OOXML is an XML-based file format specification for representing word-processing documents, presentations, and spreadsheets. Microsoft created the original specification, which was later approved as the ECMA-376 and ISO/IEC 29500 standards. OOXML uses familiar technologies such as XML and ZIP. Document content resides in a file package that conforms to the Open Packaging Conventions (OPC). An OOXML package contains several XML files along with any other resources the document requires, such as image and video files.

The key concepts of OOXML are:

  • An OOXML document is a ZIP-based package of files.
  • A package is composed of various parts, including a Main Document part, Image parts, Video, Slide, Workbook, and Document Properties parts. Each part is stored as a file in the zipped package.
  • Relationships determine how the collection of parts comes together to form a document.
  • Content types define the types of parts that can be stored in a package.
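Because a package is an ordinary ZIP archive, these concepts can be observed without the Open XML SDK at all. The sketch below is a hypothetical PackageInspector helper (not an SDK class) that assumes .NET Framework 4.5 or later with a reference to System.IO.Compression. It lists the name of every item in a package, such as word/document.xml or _rels/.rels (where the package-level relationships live), and reads [Content_Types].xml, the special item that declares the content type of each part.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: inspects an OOXML package as a plain ZIP archive.
public static class PackageInspector
{
    // Returns the name of every item stored in the package,
    // e.g. "word/document.xml" or "_rels/.rels".
    public static List<string> ListItems(Stream package)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            return zip.Entries.Select(entry => entry.FullName).ToList();
        }
    }

    // Returns the text of [Content_Types].xml, which maps parts to content types,
    // or null if the archive does not contain that item.
    public static string ReadContentTypes(Stream package)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("[Content_Types].xml");
            if (entry == null)
            {
                return null;
            }
            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

For instance, you could pass ListItems a FileStream opened on any .docx file to see how its parts are laid out inside the package.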

Benefits of OOXML

Because OOXML documents are based on standard, open, platform-independent formats, it is far easier to interoperate with Office documents than it was with the earlier binary formats. Here are some of the benefits of the OOXML format:

  • Document Assembly: Creating documents is easy, because individual parts can be created separately and assembled when required.
  • Document Archiving: You can save space by storing a single instance of the common parts from a large number of similar documents as well as the unique content for each document, assembling the individual documents in their entirety again when required.
  • Searching Documents: Because content is stored in XML format, it’s much easier to search using common tools.
  • Business Process Efficiency: Using an XML-based document format helps when automating decisions based on document content.

The Open XML SDK

The Open XML SDK is a .NET class library that exposes standard XML and Packaging APIs for working with OOXML documents. The SDK provides typed access to both the Open Package Convention packages and the XML content in those packages. Currently, two versions of the SDK are available:

  • Version 1: Provides strongly typed access to packages.
  • Version 2: Provides strongly typed access to both packages and their contents (currently available as an April 2009 CTP release).

Developing Applications with the Open XML SDK

This article uses the Open XML SDK version 2.0 CTP release. Download and install the SDK by following the instructions on its download page.

Set Up the Environment

After installing the SDK on your development machine, create a new project and add a reference to the SDK’s DocumentFormat.OpenXml assembly: right-click References, select Add Reference, click the .NET tab, and then select the DocumentFormat.OpenXml assembly. If you don’t see the assembly in the .NET tab, you can browse to it in your installation folder.

The Open XML SDK uses the .NET packaging APIs (the System.IO.Packaging namespace) internally, so you also need to add a reference to the WindowsBase assembly, which contains those APIs.

In your code, add using directives for these three main namespaces:

  • DocumentFormat.OpenXml: General Open XML related functions.
  • DocumentFormat.OpenXml.Packaging: APIs related to packaging.
  • DocumentFormat.OpenXml.Wordprocessing: APIs for working on Microsoft Word (.docx) documents. The assembly also exposes namespaces for other Office client applications such as Excel and PowerPoint.

Creating a Basic Word Document

To get started, you’ll create a basic Word document that shows the fundamental structure of a Word package. Remember that an Open XML package is composed of parts (Main Document part, Image part, Video part, Document Properties part, and so on), each stored as a file in the package.

The WordprocessingDocument class represents a Word package. You can use the class to create a new package or open an existing package:

WordprocessingDocument doc = WordprocessingDocument.Create(
    @"BasicWordDoc.docx", WordprocessingDocumentType.Document);

The Main Document part contains the text of the document, and it’s the only required part. You can add a main document part to a package with the following code:

MainDocumentPart mainPart = doc.AddMainDocumentPart();
mainPart.Document = new Document();

The elements inside the main document part are arranged hierarchically as follows:

MainDocumentPart
  Document
    Body
      Paragraph
        Run
          Text

The SDK exposes classes for each of these components, which makes it simple to generate a Word document programmatically. For example, to create a document containing the text “Hello World,” you walk up the hierarchy, creating the appropriate objects at each step, passing each object to the constructor of its parent object:

Text textFirstLine = new Text("Hello World");
Run run = new Run(textFirstLine);
Paragraph para = new Paragraph(run);
Body body = new Body(para);

After creating the object hierarchy, you can append the Body object to the document created earlier:

mainPart.Document.Append(body);

At this point, you can save the main document part and close the package:

mainPart.Document.Save();
doc.Close();

Here’s the complete code for a C# console application that generates the “Hello World” example:

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class Program
{
    static void Main(string[] args)
    {
        /* Create the package and main document part */
        WordprocessingDocument doc =
            WordprocessingDocument.Create(@"BasicWordDoc.docx",
            WordprocessingDocumentType.Document);
        MainDocumentPart mainPart = doc.AddMainDocumentPart();
        mainPart.Document = new Document();

        /* Create the contents */
        Text textFirstLine = new Text("Hello World");
        Run run = new Run(textFirstLine);
        Paragraph para = new Paragraph(run);
        Body body = new Body(para);
        mainPart.Document.Append(body);

        /* Save the results and close */
        mainPart.Document.Save();
        doc.Close();
    }
}

If you execute the application, it will create a new document. If you open that with Word, you’ll see the document shown in Figure 1.

Figure 1. Basic Document: This basic Word document was created programmatically.
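Each class in this hierarchy corresponds to a WordprocessingML element in the main document part (word/document.xml): Document to w:document, Body to w:body, Paragraph to w:p, Run to w:r, and Text to w:t. Purely as an illustration of that mapping, the sketch below builds equivalent Hello World markup with LINQ to XML instead of the SDK. HelloWorldMarkup is a hypothetical class, and the file the SDK actually writes may differ in details such as namespace declarations.

```csharp
using System.Xml.Linq;

// Hypothetical helper that emits the WordprocessingML the SDK classes stand for.
public static class HelloWorldMarkup
{
    // Namespace used by the elements of the main document part.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds document -> body -> p (paragraph) -> r (run) -> t (text).
    public static XDocument Build(string text)
    {
        return new XDocument(
            new XElement(W + "document",
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                new XElement(W + "body",
                    new XElement(W + "p",
                        new XElement(W + "r",
                            new XElement(W + "t", text))))));
    }
}
```

Calling HelloWorldMarkup.Build("Hello World").ToString() shows the same nesting as the object tree that the console application assembles.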

As you can see, the process to create a basic document is straightforward; however, most documents are more complicated, containing styled text and other formatting.

Creating Styled Documents

This next example generates a slightly more complex Word document with two paragraphs, each containing text in a different font. Here’s the process:

  1. Create the word-processing document package and add the main document part to it, as in the previous example.
  2. Create the first paragraph containing the text “Hello World,” but don’t add it to the body yet:

       Text textFirstLine1 = new Text("Hello World");
       Run run1 = new Run(textFirstLine1);
       Paragraph para1 = new Paragraph(run1);

  3. Create a second paragraph containing the text “Hello Open XML Community”:

       Text textFirstLine2 = new Text("Hello Open XML Community");
       Run run2 = new Run(textFirstLine2);
       Paragraph para2 = new Paragraph(run2);

  4. Font formatting is applied to a Run. A Run can contain a RunProperties object, which controls the formatting applied to the text in that Run; Runs without RunProperties use Word’s default font. You can create RunProperties objects independently and then apply them to any Run. The following code creates a RunProperties object that causes a Run to display its text in the Arial Black font:

       RunProperties runProp = new RunProperties();
       RunFonts runFont = new RunFonts();
       runFont.Ascii = "Arial Black";
       runProp.Append(runFont);

  5. Apply the RunProperties object you just created to run2, which you created earlier:

       run2.PrependChild(runProp);

  6. Create a new Body instance and append both paragraphs:

       Body body = new Body();
       body.Append(para1);
       body.Append(para2);

  7. Add the body to the document, then save and close the package:

       mainPart.Document.Append(body);
       mainPart.Document.Save();
       doc.Close();

When you execute this program, it will create a document containing two paragraphs with different fonts (see Figure 2).

Figure 2. Styled Text: This two-paragraph document contains formatted text.
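The formatting in this example lives in the run’s properties: RunProperties is serialized as w:rPr, and RunFonts as w:rFonts, whose w:ascii attribute names the font. The WordprocessingML schema requires w:rPr to be the first child of w:r, which is why the code above uses PrependChild rather than Append when attaching properties to a run that already contains text. The hypothetical StyledRunMarkup helper below, written with LINQ to XML rather than the SDK, is a sketch that shows this ordering.

```csharp
using System.Xml.Linq;

// Hypothetical helper that emits WordprocessingML run markup directly.
public static class StyledRunMarkup
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds <w:r>; when a font is supplied, <w:rPr> is added before <w:t>
    // because the schema expects run properties to come first.
    public static XElement Run(string text, string asciiFont = null)
    {
        var run = new XElement(W + "r");
        if (asciiFont != null)
        {
            run.Add(new XElement(W + "rPr",
                new XElement(W + "rFonts", new XAttribute(W + "ascii", asciiFont))));
        }
        run.Add(new XElement(W + "t", text));
        return run;
    }
}
```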

Search and Replace Text in a Word Document

Creating new documents addresses only one aspect of working with OOXML documents. This next example opens an existing document, searches for some text in it, and replaces that text with other text. This is typical of scenarios in which you generate a large number of documents from a small template: you read the template, replace some portion of it with custom content, and then save the altered document. (You’ll see a large-template scenario in the next section.)

First, open the document with the WordprocessingDocument class and get its MainDocumentPart. The document in this example is named SearchAndReplace.docx; passing true as the second argument opens it for editing:

WordprocessingDocument doc =
    WordprocessingDocument.Open(@"SearchAndReplace.docx", true);
MainDocumentPart mainPart = doc.MainDocumentPart;

Read the entire document contents using the GetStream method:

// Requires: using System.IO;
string docText = null;
using (StreamReader sr = new StreamReader(doc.MainDocumentPart.GetStream()))
{
    docText = sr.ReadToEnd();
}

At the end of this process, the docText variable contains all the XML for the document text. Next, replace contents in the docText variable as needed. For this example the template contains the text "The current version of [sdk] is [VersionNumber]."

The task is to replace the [sdk] and [VersionNumber] placeholders with actual values. You can use standard .NET string-manipulation code to make the replacement, so I won’t show it here. After replacing the text, write the complete text back to the Main Document part using the following code:

using (StreamWriter sw = new StreamWriter(
    doc.MainDocumentPart.GetStream(FileMode.Create)))
{
    sw.Write(docText);
}
// Close the package so the modified part is saved
doc.Close();
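The replacement step itself is ordinary string work. One possible sketch, assuming a hypothetical PlaceholderReplacer class and a dictionary that maps each placeholder name (such as sdk or VersionNumber) to its value, follows. Because the values are inserted into raw WordprocessingML, they are XML-escaped first.

```csharp
using System.Collections.Generic;
using System.Security;

// Hypothetical helper for the search-and-replace step on the main document XML.
public static class PlaceholderReplacer
{
    // Replaces every "[name]" placeholder in docText with its value from the map.
    public static string Replace(string docText, IDictionary<string, string> values)
    {
        foreach (KeyValuePair<string, string> pair in values)
        {
            // Escape <, >, &, and quotes so the value cannot break the surrounding XML.
            string safeValue = SecurityElement.Escape(pair.Value);
            docText = docText.Replace("[" + pair.Key + "]", safeValue);
        }
        return docText;
    }
}
```

This only works when each placeholder is stored as contiguous text in the XML. Word can split text such as [sdk] across several w:r runs (for example, after the text has been edited), in which case a plain string replacement will not find it.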

Template-Driven Document Generation using Word Content Controls

The example in the last section read the entire contents of a short document template into memory and then performed a search-and-replace operation. That’s fine for small templates, but with large multi-page templates that approach can cause memory and performance problems. Instead, you can use content controls, which help you create templates, support structured editing, and provide placeholders for various kinds of content in documents.

The primary content controls available are:

  • Plain Text
  • Rich Text
  • Picture
  • Calendar
  • Combo Box
  • Drop-Down List

Apart from the intrinsic benefits that structured documents and content-type restrictions offer, you also benefit from the way OOXML stores data rendered in content controls.

OOXML stores content control data in a custom XML file in the document package. Individual controls are mapped to elements in the custom XML file. When you open such a document, it late-binds to the content control data in the file. While the document is open, any changes you make to the content in the controls are reflected in the XML data, and vice versa.

The fact that content control data is stored separately and mapped to controls at runtime makes it a good candidate for generating template-based documents.

This example covers three main topics:

  • Creating a template based on content controls
  • Using the Word 2007 Content Control Toolkit to map controls to custom XML elements
  • Updating the custom XML data programmatically, and generating documents based on the template

The next sections explain each topic in more detail.

Creating a Template

Open Word, create a new document, and switch to the Developer tab on the ribbon.

Author’s Note: If the Developer tab is not visible (it’s not by default), you can enable it by opening Word Options. To do that, click the Office button at the top left of your Word window and click the Word Options button at the bottom. In the “Popular” group, click the “Show Developer tab in the ribbon” option. Close the Word Options dialog, and the Developer tab will appear.

The Developer ribbon has a button group called “Controls” that lets you insert various kinds of controls into the document. This example uses the same template as the “Search and Replace” section earlier in this article. This time, however, you’ll create it using content controls. Again, the template contains the text "The current version of [sdk] is [VersionNumber]."

Add two plain text controls for the SDK name and version number. After adding the controls, the template will look similar to Figure 3, depending on the control names you provided.

Figure 3. Template with Content Controls: In Word, the sample document containing the content controls should look similar to this.

Word 2007 Content Control Toolkit

The Word 2007 Content Control Toolkit provides a visual interface that helps when mapping custom XML elements to content controls, a process much easier than writing XPath queries by hand. Download the Word 2007 Content Control Toolkit from CodePlex and install it, then start the tool and open the document template you created in the preceding section.

In the “Content Controls” pane on the left, you will see the details of the two controls in the template, including their names and types. In the “Custom XML parts” pane on the right, click on the link “Click here to create a new one” to create a new custom XML file. Switch to Edit View and add two elements that will store data for the two controls. The element names do not have to match the control names. For example, my XML file looks like this:

Switch to “Bind View” and drag the elements you created to the left pane and drop them on the controls. The drag/drop process establishes the bindings. After you’ve established the bindings, the left pane will look similar to Figure 4.

Figure 4. Bound Content Controls: Here’s how the Content Controls pane in the Word 2007 Content Control Toolkit looks after binding the SDKName and VersionNumber controls to specific XML elements.
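Word records each of these bindings in word/document.xml as a w:dataBinding element inside the content control’s w:sdtPr properties: the w:xpath attribute selects the bound element in the custom XML, and w:storeItemID identifies the custom XML part. To check what the toolkit produced, a small LINQ to XML sketch such as the hypothetical BindingLister below can list every binding found in the document XML (read, for example, the same way as in the search-and-replace section).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reports the data bindings of content controls.
public static class BindingLister
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one (xpath, storeItemID) pair per w:dataBinding element found.
    public static List<KeyValuePair<string, string>> ListBindings(string documentXml)
    {
        XDocument doc = XDocument.Parse(documentXml);
        return doc.Descendants(W + "dataBinding")
            .Select(binding => new KeyValuePair<string, string>(
                (string)binding.Attribute(W + "xpath"),
                (string)binding.Attribute(W + "storeItemID")))
            .ToList();
    }
}
```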

Save the template, and then inspect the document package by changing the extension from docx to zip. You will find a new customXml folder containing your custom XML file with the data. If you change the contents of this XML file and then reopen the document (remembering to change the zip extension back to docx), you will find that the content controls now display the updated content. Similarly, if you change the control content in Word, save the file, rename it, and re-inspect the custom XML file, you’ll see that the changes have been persisted there.
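The same inspection can be done in code instead of renaming the file. The hypothetical CustomXmlReader sketch below opens the package with System.IO.Compression (not the Open XML SDK, and assuming .NET Framework 4.5 or later) and returns the content of each customXml/item*.xml entry. That naming pattern is what Word typically uses, but it is an assumption here; the accompanying customXml/itemProps*.xml entries and the customXml/_rels folder are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: reads custom XML data items straight from the ZIP package.
public static class CustomXmlReader
{
    // Maps entry names such as "customXml/item1.xml" to their XML text.
    public static Dictionary<string, string> ReadItems(Stream package)
    {
        var items = new Dictionary<string, string>();
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // Keep customXml/itemN.xml; skip itemPropsN.xml and the _rels folder.
                bool isItem =
                    entry.FullName.StartsWith("customXml/item", StringComparison.Ordinal)
                    && !entry.FullName.StartsWith("customXml/itemProps", StringComparison.Ordinal)
                    && entry.FullName.EndsWith(".xml", StringComparison.Ordinal);
                if (!isItem)
                {
                    continue;
                }
                using (var reader = new StreamReader(entry.Open()))
                {
                    items[entry.FullName] = reader.ReadToEnd();
                }
            }
        }
        return items;
    }
}
```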

With the template and bindings in place, you now have the opportunity to generate a large number of documents based on the template.

Update Custom XML Data Programmatically

Open the package using the WordprocessingDocument class’s Open method:

WordprocessingDocument wordDoc =
    WordprocessingDocument.Open(fileName, true);

Next, create the custom XML file containing the data for this document, and store it in the package.

Store the custom XML in memory as a template string, with placeholders where the actual data belongs. In this sample, the element bound to the SDK name contains the placeholder text !Name!, and the element bound to the version number contains !Version!.

You’ll replace the !Name! and !Version! placeholders with actual data for each document. This example uses the Regex class, but you can use any code or technology you like to create your custom XML. When your custom XML is ready, replace the existing custom XML part by deleting it and adding a new part, using the following code:

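// Requires: using System.IO;
MainDocumentPart mainPart = wordDoc.MainDocumentPart;
// Remove the existing custom XML part(s)
mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);
// Add a new custom XML part and write the generated XML into it
CustomXmlPart customXmlPart =
    mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (StreamWriter ts = new StreamWriter(customXmlPart.GetStream()))
{
    ts.Write(customXML);
}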
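The Regex-based step that produces the customXML string might look like the following sketch. The CustomXmlTemplate class and the element names in its template (root, SDKName, VersionNumber) are hypothetical stand-ins for whatever you created in the Content Control Toolkit. The values are XML-escaped before substitution, and a MatchEvaluator is used so that characters such as $ in the data are not interpreted as substitution patterns.

```csharp
using System.Security;
using System.Text.RegularExpressions;

// Hypothetical helper that fills the !Name! and !Version! placeholders.
public static class CustomXmlTemplate
{
    // Hypothetical element names; they must match the elements bound to the controls.
    public const string Template =
        "<root><SDKName>!Name!</SDKName><VersionNumber>!Version!</VersionNumber></root>";

    public static string Fill(string name, string version)
    {
        // Escape the data so characters like & or < cannot break the XML.
        string safeName = SecurityElement.Escape(name);
        string safeVersion = SecurityElement.Escape(version);

        // The evaluator returns the value literally, so "$" is not treated as a substitution.
        string xml = Regex.Replace(Template, Regex.Escape("!Name!"), match => safeName);
        return Regex.Replace(xml, Regex.Escape("!Version!"), match => safeVersion);
    }
}
```

The returned string plays the role of customXML in the code above; calling Fill once per document with different values yields a fresh data part each time.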

You can now save this document under a different name, and repeat the process as needed, using different data for each document, giving you a fast way to create large numbers of custom documents.

The main difference between this and the earlier Search and Replace approach is that this technique focuses only on the dynamic data, while the other approach required parsing the entire document.

This approach is also much more efficient than Word’s Mail Merge feature, which is commonly used to create large numbers of small documents from a template and a data source.

This article used only a small fraction of the Open XML SDK, but if you’ve ever tried to create or manipulate Word files programmatically using earlier technologies, you can probably already tell that this is a much simpler and more robust approach. Beyond the scenarios shown here, the Open XML SDK also lets you work with comments and tracked changes stored in Word documents, and it contains APIs for Microsoft Excel and PowerPoint documents.

devx-admin
