
Export Customized XML from Microsoft Word with VB.NET

Learn to use Word automation from .NET to turn hard-to-process Word documents into customizable XML 


Word automation has traditionally been the province of VB Classic developers, but it's alive and well in VB.NET; it's just a little different. Word automation is the process of using the classes and methods exposed by Word to create new Word documents or to alter or manipulate existing ones. In this article, you'll see how to get started with Word automation in VB.NET by exploring a process for transforming Word documents into customizable XML. The technique shown here doesn't rely on Word 2003's XML capabilities, so you can use it with any version of Word that supports automation. Most of the techniques you'll see apply generally to any application that needs to automate Word from within .NET.


Here's the process in a nutshell: You add a reference to the Word automation library to your project and use that reference to create a Word application object that can open Word files and export the document's contents to an XML document. For this project, the application exports the content in such a way that each Word style gets translated to an appropriate XML element. By default, the application uses the names of Word styles (sometimes in slightly modified form) applied to paragraphs as the element names for the document. As written, the application follows the sequence of page breaks in the Word document itself. It doesn't take section breaks within a document into account, but that would be relatively easy to add. The application preserves style formatting, but ignores empty paragraphs (those containing only whitespace such as spaces, tabs, carriage returns, and linefeeds).
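The style-to-element mapping described above can be sketched without Word itself. The following is a minimal illustration, not the article's downloadable code: the module name, the helper names (`ToElementName`, `IsEmptyParagraph`, `BuildXml`), the `document` root element, and the `Paragraph` fallback name are all assumptions made for this example. In a real application the style names and paragraph text would come from the Word automation library rather than from hard-coded arrays, and page handling is omitted here.

```vbnet
Imports System
Imports System.Xml

Module StyleToXmlSketch

    ' Hypothetical helper: derive an XML element name from a Word style name.
    ' Spaces are dropped ("Heading 1" becomes "Heading1"); XmlConvert.EncodeLocalName
    ' then escapes any character that is still illegal in an XML name.
    Function ToElementName(ByVal styleName As String) As String
        Dim compact As String = styleName.Trim().Replace(" ", "")
        If compact.Length = 0 Then
            Return "Paragraph" ' assumed fallback for an unnamed style
        End If
        Return XmlConvert.EncodeLocalName(compact)
    End Function

    ' A paragraph is treated as empty when it contains only whitespace
    ' (spaces, tabs, carriage returns, linefeeds).
    Function IsEmptyParagraph(ByVal text As String) As Boolean
        Return text Is Nothing OrElse text.Trim().Length = 0
    End Function

    ' Build an XML document with one element per non-empty paragraph.
    ' styleNames(i) is the style applied to the paragraph whose text is texts(i).
    Function BuildXml(ByVal styleNames() As String, ByVal texts() As String) As XmlDocument
        Dim doc As New XmlDocument()
        Dim root As XmlElement = doc.CreateElement("document") ' assumed root name
        doc.AppendChild(root)

        For i As Integer = 0 To styleNames.Length - 1
            If Not IsEmptyParagraph(texts(i)) Then
                Dim el As XmlElement = doc.CreateElement(ToElementName(styleNames(i)))
                el.InnerText = texts(i).Trim() ' drop the trailing paragraph mark
                root.AppendChild(el)
            End If
        Next

        Return doc
    End Function

    Sub Main()
        ' Stand-in data; Word ends each paragraph with a carriage return.
        Dim styles() As String = {"Heading 1", "Normal", "Normal"}
        Dim texts() As String = {"Introduction" & vbCr, "  " & vbTab & vbCr, "First paragraph." & vbCr}

        Console.WriteLine(BuildXml(styles, texts).OuterXml)
        ' Prints: <document><Heading1>Introduction</Heading1><Normal>First paragraph.</Normal></document>
    End Sub

End Module
```

The whitespace-only second paragraph is skipped, and each remaining paragraph becomes an element named after its style. Keeping the name mapping in a single function makes it straightforward to substitute custom element names for particular styles later.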
