Export Customized XML from Microsoft Word with VB.NET
Learn to use Word automation from .NET to turn hard-to-process Word documents into customizable XML
by A. Russell Jones, Executive Editor
Sep 18, 2003
Word automation has traditionally been the province of VB Classic developers, but it's alive and well in VB.NET; it's just a little different. Word automation is the process of using the classes and methods exposed by Word to create new Word documents or to alter or manipulate existing ones. In this article, you'll see how to get started with Word automation in VB.NET by exploring a process for transforming Word documents into customizable XML. The technique shown here doesn't rely on Word 2003's XML capabilities, so you can use it with any version of Word that supports automation. Most of the techniques you'll see apply generally to any application that needs to automate Word from within .NET.
Here's the process in a nutshell: You add a reference to the Word automation library to your project and use that reference to create a Word application object that can open Word files and export the document's contents to an XML document. For this project, the application exports the content in such a way that each Word style gets translated to an appropriate XML element. By default, the application uses the names of Word styles (sometimes in slightly modified form) applied to paragraphs as the element names for the document. As written, the application follows the sequence of page breaks in the Word document itself. It doesn't take section breaks within a document into account, but that would be relatively easy to add. The application preserves style formatting, but ignores empty paragraphs (those containing only whitespace such as spaces, tabs, carriage returns, and linefeeds).
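To make the style-to-element mapping concrete, here is a minimal sketch of that one step. It is not the article's actual code, and it deliberately leaves Word out so that it runs on its own: the (style name, text) pairs stand in for what you would read from each paragraph of an open Word document through automation. The names StyleToElementName and BuildXml, the Document root element, and the Paragraph fallback name are all assumptions made for illustration.

```vb
Imports System
Imports System.Text
Imports System.Xml
Imports Microsoft.VisualBasic

Module StyleExportSketch

    ' Hypothetical helper: derive an XML element name from a Word style name.
    ' Whitespace is dropped ("Heading 1" becomes "Heading1"); any remaining
    ' character that is illegal in an XML name is escaped by
    ' XmlConvert.EncodeLocalName.
    Function StyleToElementName(ByVal styleName As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In styleName
            If Not Char.IsWhiteSpace(c) Then sb.Append(c)
        Next
        If sb.Length = 0 Then Return "Paragraph" ' assumed fallback name
        Return XmlConvert.EncodeLocalName(sb.ToString())
    End Function

    ' Builds an XML document from (style name, text) pairs. In a real
    ' exporter the pairs would come from Word; here they are plain strings.
    Function BuildXml(ByVal paragraphs As String(,)) As XmlDocument
        Dim doc As New XmlDocument()
        Dim root As XmlElement = doc.CreateElement("Document") ' assumed root name
        doc.AppendChild(root)
        For i As Integer = 0 To paragraphs.GetUpperBound(0)
            ' Trim removes spaces, tabs, carriage returns, and linefeeds,
            ' so whitespace-only paragraphs end up empty and are skipped.
            Dim text As String = paragraphs(i, 1).Trim()
            If text.Length > 0 Then
                Dim el As XmlElement = _
                    doc.CreateElement(StyleToElementName(paragraphs(i, 0)))
                el.InnerText = text
                root.AppendChild(el)
            End If
        Next
        Return doc
    End Function

    Sub Main()
        Dim sample As String(,) = { _
            {"Heading 1", "Introduction"}, _
            {"Normal", "   " & vbTab & vbCrLf}, _
            {"Body Text", "Word automation from .NET."}}
        ' Prints: <Document><Heading1>Introduction</Heading1><BodyText>Word automation from .NET.</BodyText></Document>
        Console.WriteLine(BuildXml(sample).OuterXml)
    End Sub

End Module
```

The whitespace-only Normal paragraph produces no element, which matches the rule about ignoring empty paragraphs. Dropping spaces is only one way to get the "slightly modified" names; a real exporter might prefer a lookup table that maps each style to a hand-picked element name.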
Here's an example. Suppose you have a Word document that looks like the sample.doc file shown in Figure 1.