

Export Customized XML from Microsoft Word with VB.NET : Page 2

Learn to use Word automation from .NET to turn hard-to-process Word documents into customizable XML





Reference the Appropriate Word Library
Start a new Windows Application project in Visual Studio .NET and name it WordAutomation. After creating the project, right-click the References item in Solution Explorer and select Add Reference. On the COM tab of the Add Reference dialog, select the Microsoft Word x.y Object Library item (where x.y is the major and minor version number of the Word release you're targeting; for example, 11.0 for Word 2003), and click OK. Doing that creates several references, called Word, VBIDE, stdole, and Microsoft.Office.Core. The only one you need to worry about for this project is the Word reference.
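If the machine has the Office primary interop assemblies (PIAs) installed, Visual Studio may reference Microsoft.Office.Interop.Word instead of generating its own wrapper named Word, and the Word types then live in the Microsoft.Office.Interop.Word namespace. Assuming that is your setup, a namespace alias at the top of each code file is one way to keep the Word. prefixes used in this article's code compiling unchanged:

```vbnet
' Assumes the PIA was referenced: maps the Word. prefix used in this
' article (Word.Application, Word.Document, ...) onto the PIA namespace
Imports Word = Microsoft.Office.Interop.Word
```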

Getting Word Content
The first step is to use Word automation to open and close Word documents and iterate through their contents. As an example, you'll see how to load the sample.doc file that accompanies the downloadable code for this article and display the text of each paragraph, formatted as XML, in a TextBox on the default form. The sample form, Form1 (see Figure 1), has a TextBox to hold the name of the Word file to process, a Browse button that lets you select a file, a Process button that processes the selected Word file into a valid XML document, and a multi-line TextBox that displays the results.
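The XML-building half of that process can be sketched on its own, assuming the paragraph text has already been pulled out of Word as an array of plain strings. The module and function names (ParagraphXml, ParagraphsToXml) and the element names document and paragraph are hypothetical choices for illustration, not the article's actual schema. Using XmlTextWriter rather than string concatenation means characters such as < and & in the document text are escaped automatically, so the output stays well-formed.

```vbnet
Imports System.IO
Imports System.Xml

Module ParagraphXml
    ' Builds an XML string with one <paragraph> element per input string.
    ' The element names "document" and "paragraph" are hypothetical.
    Public Function ParagraphsToXml(ByVal paragraphs() As String) As String
        Dim sw As New StringWriter()
        Dim writer As New XmlTextWriter(sw)
        ' Indent the output so it reads well in a multi-line TextBox
        writer.Formatting = Formatting.Indented
        writer.WriteStartDocument()
        writer.WriteStartElement("document")
        For Each paraText As String In paragraphs
            ' WriteElementString escapes <, > and & in the paragraph text
            writer.WriteElementString("paragraph", paraText)
        Next
        writer.WriteEndElement()
        writer.WriteEndDocument()
        writer.Flush()
        Dim result As String = sw.ToString()
        writer.Close()
        Return result
    End Function
End Module
```

The returned string can be assigned directly to the results TextBox's Text property.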

Figure 1: The Sample Form for the WordAutomation Project. The form lets you select a Word file, processes the file into XML, and displays the results in the multi-line TextBox.

To open the Word file, you first need a Word.Application object. The sample project creates one by declaring a class-level variable with New, so the object is created when the form class is instantiated:

Private wordApp As New Word.Application

When the user selects a file and clicks the Process button, the Click event-handler code opens the selected file by calling the Word application's Documents.Open method.

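' Arguments: FileName, ConfirmConversions (1 = True),
' ReadOnly (1 = True), AddToRecentFiles (0 = False)
doc = wordApp.Documents.Open( _
    CType(Me.txtFilename.Text, Object), 1, 1, 0)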

The method returns a Word.Document object, which represents the opened Word document.

Note that you don't create a Word.DocumentClass instance directly. Instead, you get the reference indirectly through the Word.Application object's Documents collection; in this case, by calling the Open method to open the selected file. You then use the returned Word.Document object to manipulate the document's content.
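Once Open has returned the document, its Paragraphs collection can be walked to collect each paragraph's text. The following is a minimal sketch, assuming the Word reference described earlier; the module and function names (ParagraphReader, GetParagraphTexts, CleanParagraphText) are hypothetical. Word includes the paragraph mark, a carriage return (Chr(13)), at the end of each paragraph's Range.Text, and paragraphs that end a table cell also carry the end-of-cell marker Chr(7), so the helper trims both before the text is turned into XML.

```vbnet
Imports System.Collections

Module ParagraphReader
    ' Collects the text of every paragraph in an open Word document.
    Public Function GetParagraphTexts(ByVal doc As Word.Document) As String()
        Dim texts As New ArrayList()
        ' For Each avoids the 1-based index arithmetic of Word collections
        For Each para As Word.Paragraph In doc.Paragraphs
            texts.Add(CleanParagraphText(para.Range.Text))
        Next
        Return CType(texts.ToArray(GetType(String)), String())
    End Function

    ' Removes the trailing paragraph mark (Chr(13)) and any
    ' end-of-cell marker (Chr(7)) that Word leaves in range text.
    Public Function CleanParagraphText(ByVal raw As String) As String
        If raw Is Nothing Then Return String.Empty
        Return raw.TrimEnd(Chr(13), Chr(7))
    End Function
End Module
```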

Author's Note: In earlier Word versions, using the Application and Document types directly caused problems, so you may need to use the ApplicationClass and DocumentClass classes instead. For example, you may run into an irritating conflict over Close(), because Word exposes both a Close method and a Close event. Calling Close on a Word.Document instance produces the error "'Close' is ambiguous across the inherited interfaces 'Word._Document' and 'Word.DocumentEvents_Event'." One solution is to declare your document references as DocumentClass rather than Document. Another workaround is to cast the Document or Application reference to the more specific DocumentClass or ApplicationClass type before making the ambiguous call. For example, assuming that myDoc is a Document object, you could call Close like this:

CType(myDoc, Word.DocumentClass).Close()

The line of code above casts the Document reference myDoc to a DocumentClass reference, which avoids any ambiguity with the call to Close().
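Application.Quit can hit the same kind of ambiguity, since Word's application events interface also declares a Quit event. The following is a minimal cleanup sketch that applies the cast from the note to both calls, assuming the Word reference described earlier; the module and procedure names (WordCleanup, CloseDocument, QuitWord) are hypothetical. It closes a processed document without saving (Word.WdSaveOptions.wdDoNotSaveChanges), and, when the form shuts down, quits Word and releases the COM objects with Marshal.ReleaseComObject so that no WINWORD.EXE process is left running.

```vbnet
Imports System.Runtime.InteropServices

Module WordCleanup
    ' Closes a document without saving changes and releases its COM reference.
    Public Sub CloseDocument(ByRef doc As Word.Document)
        If doc Is Nothing Then Return
        Try
            ' Casting to DocumentClass avoids the ambiguous Close()
            CType(doc, Word.DocumentClass).Close( _
                Word.WdSaveOptions.wdDoNotSaveChanges)
        Finally
            Marshal.ReleaseComObject(doc)
            doc = Nothing
        End Try
    End Sub

    ' Quits Word (for example, when the form closes) and releases the
    ' Application object so that Word does not keep running in the background.
    Public Sub QuitWord(ByRef wordApp As Word.Application)
        If wordApp Is Nothing Then Return
        Try
            ' Casting to ApplicationClass avoids the ambiguous Quit()
            CType(wordApp, Word.ApplicationClass).Quit( _
                Word.WdSaveOptions.wdDoNotSaveChanges)
        Finally
            Marshal.ReleaseComObject(wordApp)
            wordApp = Nothing
        End Try
    End Sub
End Module
```

Keeping CloseDocument and QuitWord separate fits the sample's design, where a single class-level Word.Application is reused for every file the user processes and only needs to be shut down once.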
