Do you automate Office applications such as MS Word from Visual Basic? Are the techniques shown in this article useful? Had you already seen an example of Word 2003's XML capabilities? What do you think of them? Given that you could achieve the same result by applying an XSLT transform to the XML that Word 2003 generates, do you think Word automation that simply manipulates output is still useful? Let us know in the xml.general discussion group.
Export Customized XML from Microsoft Word with VB.NET

Learn to use Word automation from .NET to turn hard-to-process Word documents into customizable XML 


Word automation has traditionally been the province of VB Classic developers, but it's alive and well in VB.NET; it's just a little different. Word automation is the process of using the classes and methods exposed by Word to create new Word documents or to alter or manipulate existing ones. In this article, you'll see how to get started with Word automation in VB.NET by exploring a process for transforming Word documents into customizable XML. The technique shown here doesn't rely on Word 2003's XML capabilities, so you can use it with any version of Word that supports automation. Most of the techniques you'll see apply generally to any application that needs to automate Word from within .NET.


Here's the process in a nutshell: You add a reference to the Word automation library to your project and use that reference to create a Word application object that can open Word files and export the document's contents to an XML document. For this project, the application exports the content in such a way that each Word style gets translated to an appropriate XML element. By default, the application uses the names of Word styles (sometimes in slightly modified form) applied to paragraphs as the element names for the document. As written, the application follows the sequence of page breaks in the Word document itself. It doesn't take section breaks within a document into account, but that would be relatively easy to add. The application preserves style formatting, but ignores empty paragraphs (those containing only whitespace such as spaces, tabs, carriage returns, and linefeeds).
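The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the sample application's actual code: it assumes a project reference to the Word interop library (imported here as Microsoft.Office.Interop.Word; the library you reference depends on your Word version), takes the document path as its first command-line argument, and uses a hypothetical ToElementName helper to derive element names from style names. It ignores page breaks and character styles, and its element names will not match the sample application's own style-to-element mapping.

```vbnet
Imports System
Imports System.Text
Imports System.Xml
Imports Word = Microsoft.Office.Interop.Word

Module DocToXmlSketch

    ' Hypothetical helper (not from the article's download): turns a Word
    ' style name such as "Heading 1" into an element name such as "heading1".
    Function ToElementName(ByVal styleName As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In styleName.ToLowerInvariant()
            If Char.IsLetterOrDigit(c) Then sb.Append(c)
        Next
        ' An XML element name cannot be empty or start with a digit.
        If sb.Length = 0 OrElse Char.IsDigit(sb(0)) Then sb.Insert(0, "s")
        Return sb.ToString()
    End Function

    Sub Main(ByVal args As String())
        ' The interop methods take ByRef Object parameters, so box the values.
        Dim fileName As Object = args(0)
        Dim noSave As Object = False

        Dim app As New Word.Application()
        Dim doc As Word.Document = Nothing
        Try
            doc = app.Documents.Open(fileName)

            Dim writer As New XmlTextWriter(Console.Out)
            writer.Formatting = Formatting.Indented
            writer.WriteStartElement("document")

            For Each para As Word.Paragraph In doc.Paragraphs
                ' Range.Text includes the trailing paragraph mark; Trim removes it.
                Dim text As String = para.Range.Text.Trim()
                If text.Length = 0 Then
                    Continue For ' skip paragraphs that contain only whitespace
                End If

                Dim style As Word.Style = CType(para.Style, Word.Style)
                writer.WriteElementString(ToElementName(style.NameLocal), text)
            Next

            writer.WriteEndElement()
            writer.Flush()
        Finally
            ' Always release Word, or a hidden WINWORD.EXE process lingers.
            If Not doc Is Nothing Then doc.Close(noSave)
            app.Quit()
        End Try
    End Sub

End Module
```

Closing the document and quitting Word in a Finally block matters in practice, because an unhandled exception would otherwise leave an invisible Word instance running.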

Here's an example. Suppose you have a Word document that looks like the sample.doc file shown in Figure 1.

 
Figure 1: A Simple Word Document. The simple document shown contains several different paragraph and character styles as well as a hard page break.
If you save this simple document using Word 2003's "Save As XML" capabilities, the resulting file looks like Listing 1. Microsoft's XML is complicated—far too complicated for many purposes—because it needs to handle every possible variation. In contrast, the default output from the sample application you'll build looks like this:

<document>
   <page id="1">
      <h1>This is a title </h1>
      <p>This is a normal paragraph with some <b>bold</b> and <i>italic</i> text in the middle.</p>
      <p>This is a</p>
      <p>bulleted list</p>
      <inset>This is a paragraph with a custom style named inset.</inset>
   </page>
   <page id="2">
      <p>This is another normal paragraph on page 2--the close of the document.</p>
   </page>
</document>
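To illustrate why this simplified format is easier to work with, here is a short sketch that reads output of this shape using the standard System.Xml classes. The embedded XML is an abbreviated copy of the output above, and the module name is made up for the example; it is not part of the sample application.

```vbnet
Imports System
Imports System.Xml

Module ReadSimplifiedXml

    Sub Main()
        ' Abbreviated copy of the sample output shown above.
        Dim xml As String = _
            "<document>" & _
            "<page id=""1""><h1>This is a title</h1>" & _
            "<p>This is a normal paragraph.</p></page>" & _
            "<page id=""2""><p>This is another normal paragraph.</p></page>" & _
            "</document>"

        Dim doc As New XmlDocument()
        doc.LoadXml(xml)

        ' One XPath query finds every page; each child element is named
        ' after the paragraph style it came from.
        For Each page As XmlElement In doc.SelectNodes("/document/page")
            Console.WriteLine("Page " & page.GetAttribute("id"))
            For Each child As XmlNode In page.ChildNodes
                Console.WriteLine("  " & child.Name & ": " & child.InnerText)
            Next
        Next
    End Sub

End Module
```

Because element names carry the style information directly, a consumer needs no knowledge of Word's own schema to find headings or paragraphs on a given page.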

To get started, you need to create a reference to the Microsoft Word library, which gives you access to the Word automation classes.
