The .doc file format that is still present in Word 2003 is essentially a proprietary binary format, which makes it difficult to extract information from .doc files. By saving documents in the new XML format, you can easily retrieve information trapped inside Word 2003 documents by using little more than XPath queries.
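As a minimal sketch of that kind of XPath-style retrieval, the following Python script pulls the text of each paragraph out of a small Word 2003 XML fragment. The sample document and the `paragraph_texts` helper are illustrative assumptions, not part of the article; the `w` namespace URI is the one Word 2003 uses for WordprocessingML.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by Word 2003 XML documents.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W_NS}

# Hypothetical, heavily trimmed document; a real file carries far more markup.
SAMPLE = f"""
<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Dear </w:t></w:r><w:r><w:t>Ms. Example</w:t></w:r></w:p>
    <w:p><w:r><w:t>Thank you for your order.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraph_texts(xml_text):
    """Return the plain text of every w:p paragraph in the document."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.findall(".//w:p", NS):
        # A paragraph's text is spread over one or more w:r/w:t runs.
        runs = p.findall(".//w:t", NS)
        paragraphs.append("".join(t.text or "" for t in runs))
    return paragraphs


if __name__ == "__main__":
    for line in paragraph_texts(SAMPLE):
        print(line)
```

ElementTree supports only a subset of XPath, which is enough for simple lookups like this one; a full XPath engine would be needed for more elaborate queries.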
New features in Word 2003 also let you have users enter data into an XML document without their even knowing it! Essentially, you can annotate a document with an XML schema and then protect the document, allowing the user to add or edit information only in specific locations throughout the document. This way, when the user saves the document, the data is written directly to an XML document, where it can be easily consumed by another application or a database.
Another cool idea for using XML with Word 2003 documents is the ability to transform XML into other formats. As of this writing, there is an XSLT provided by Microsoft that takes a Word 2003 XML document and transforms it into an HTML document for viewing in a Web browser. Of course, my first reaction to this was "What good is that? I can save a document as HTML, right?" Then I realized that I have complete control over this transformation by designing my own XSLT, unlike the "Save as HTML" functionality from previous versions.
But these ideas are outside the topic of this article, which is focused on the ability to manipulate a Word 2003 document (saved as XML) from within code. Before Word 2003, all you could effectively do was to either use automation or to be really handy with the RTF format (and open the RTF using Word). With the ability of Word 2003 to both save as and read from XML, you can create sophisticated Word 2003 documents by processing and manipulating XML.
If you're not sure why you might try something like this, here are a few ideas:
- You can create documents from data within an application, such as form letters.
- You can send Word 2003 documents to a client workstation over the Internet as XML and have it correctly interpreted at the client workstation as a Word 2003 document.
- You can return Word 2003 documents from Web services.
So, to get a better feel for how this may benefit your own applications, let's walk through creating a Word 2003 template, saving it as XML, and then manipulating the document (using data provided by a user) to produce a final document for use in the application.
Creating a Schema
The first step in this process is to create a schema for the data that you can insert into the Word 2003 document template. Although you don't actually need to have a schema, it's a bit easier to work with the document if you apply a schema to it. Without the schema, you'd have to use a feature like bookmarks, which are rendered like the following XML snippet:
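As a rough illustration of bookmark markup, the Python sketch below embeds a small paragraph containing a bookmark and lists the annotation elements that delimit it. The placeholder text and the `bookmark_markers` helper are assumptions made for this example; the `aml` and `w` namespace URIs are the ones Word 2003 uses.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
AML_NS = "http://schemas.microsoft.com/aml/2001/core"

# Illustrative paragraph: a bookmark wraps one run of placeholder text.
BOOKMARK_SNIPPET = f"""
<w:p xmlns:w="{W_NS}" xmlns:aml="{AML_NS}">
  <aml:annotation aml:id="0" w:type="Word.Bookmark.Start" w:name="ContactName"/>
  <w:r><w:t>Contact name goes here</w:t></w:r>
  <aml:annotation aml:id="0" w:type="Word.Bookmark.End"/>
</w:p>"""


def bookmark_markers(xml_text):
    """Return (type, name) pairs for every aml:annotation element, in order."""
    root = ET.fromstring(xml_text)
    markers = []
    for node in root.iter(f"{{{AML_NS}}}annotation"):
        # Attributes are namespace-qualified, so look them up by full name.
        markers.append((node.get(f"{{{W_NS}}}type"), node.get(f"{{{W_NS}}}name")))
    return markers


if __name__ == "__main__":
    for marker in bookmark_markers(BOOKMARK_SNIPPET):
        print(marker)
```

Because the bookmarked content sits between two sibling elements rather than inside one, code that fills in a bookmark has to find the start marker and then walk forward to the matching end marker.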
Notice how the bookmark, named ContactName in this example, is delimited by two empty annotation elements. The only things that distinguish these elements are the type attribute values of Word.Bookmark.Start and Word.Bookmark.End. This is slightly more complex than applying a schema to the document, which produces the XML in the following snippet:
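As a sketch of why schema-tagged markup is easier to work with, the Python example below fills in a ContactName element that wraps its content directly. The `urn:example:contact` namespace, the `ns0` prefix, the placeholder text, and the `fill_element` helper are all hypothetical; a real document would use the namespace of whatever schema was attached to it.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
DATA_NS = "urn:example:contact"  # hypothetical namespace for the custom schema

# Illustrative paragraph: the schema element encloses the run it describes.
SCHEMA_SNIPPET = f"""
<w:p xmlns:w="{W_NS}" xmlns:ns0="{DATA_NS}">
  <ns0:ContactName><w:r><w:t>placeholder</w:t></w:r></ns0:ContactName>
</w:p>"""


def fill_element(xml_text, name, value):
    """Set the text inside the schema element `name` and return the new XML."""
    root = ET.fromstring(xml_text)
    target = root.find(f".//{{{DATA_NS}}}{name}")
    if target is None:
        raise KeyError(f"no schema element named {name!r}")
    text_node = target.find(f".//{{{W_NS}}}t")
    if text_node is None:
        raise ValueError(f"element {name!r} contains no w:t text node")
    text_node.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(fill_element(SCHEMA_SNIPPET, "ContactName", "Jane Doe"))
```

With the content enclosed by a single named element, one lookup is enough to locate the spot to update, which is the simplification the schema approach offers over paired bookmark markers.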