
The .doc file format that is still present in Word 2003 is essentially a proprietary binary format; sadly, it is difficult to extract information from .doc files. By saving documents in the new XML format, you can easily retrieve information trapped inside Word 2003 documents by using little more than XPath queries.
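To make the XPath idea concrete, here is a minimal Python sketch that pulls the text of each paragraph out of a document saved as Word 2003 XML. The sample document and the paragraph_texts helper are made up for illustration; the standard library's ElementTree supports only a subset of XPath, which happens to be enough here.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Word 2003 XML (WordprocessingML) vocabulary.
NS = {"w": "http://schemas.microsoft.com/office/word/2003/wordml"}

# A tiny hand-written sample; a real document saved by Word contains
# many more elements (styles, fonts, document properties, and so on).
SAMPLE = """<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Dear Maria,</w:t></w:r></w:p>
    <w:p><w:r><w:t>Thank you </w:t></w:r><w:r><w:t>for your order.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraph_texts(xml_text):
    """Return the plain text of every w:p paragraph, in document order."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.findall(".//w:p", NS):
        # A paragraph may be split into several runs; join their text.
        pieces = [t.text or "" for t in p.findall(".//w:t", NS)]
        paragraphs.append("".join(pieces))
    return paragraphs


print(paragraph_texts(SAMPLE))
```

The same two path expressions (.//w:p and .//w:t) should work in any XPath-capable tool, as long as the w prefix is bound to the namespace shown.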
New features included in Word 2003 also allow you to force users into entering data into an XML document without their knowledge! Essentially, you can annotate a document with an XML schema and then protect the document, only allowing the user to add or edit information in specific locations throughout the document. This way, when the user saves the document, the data is written directly to an XML document, allowing it to be easily consumed by another application or a database.
Another cool idea for using XML with Word 2003 documents is the ability to transform XML into other formats. As of this writing, there is an XSLT provided by Microsoft that takes a Word 2003 XML document and transforms it into an HTML document for viewing in a Web browser. Of course, my first reaction to this was "What good is that? I can save a document as HTML, right?" Then I realized that I have complete control over this transformation by designing my own XSLT, unlike the "Save as HTML" functionality from previous versions.
But these ideas are outside the topic of this article, which focuses on manipulating a Word 2003 document (saved as XML) from within code. Before Word 2003, your only practical options were to use automation or to be really handy with the RTF format (and open the RTF using Word). With the ability of Word 2003 to both save as and read from XML, you can create sophisticated Word 2003 documents by processing and manipulating XML.
If you're not sure why you might try something like this, here are a few ideas:
- You can create documents from data within an application, such as form letters.
- You can send Word 2003 documents to a client workstation over the Internet as XML and have it correctly interpreted at the client workstation as a Word 2003 document.
- You can return Word 2003 documents from Web services.
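As a rough illustration of the first idea, the following Python sketch builds a very small Word 2003 XML document from a list of strings. The build_letter function and the sample text are hypothetical. The mso-application processing instruction is what tells Windows to open the file with Word rather than with a generic XML viewer; a document that Word saves itself carries far more markup than this.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W)  # serialize with the familiar w: prefix


def build_letter(paragraphs):
    """Return a minimal Word 2003 XML document, one w:p per string."""
    root = ET.Element(f"{{{W}}}wordDocument")
    body = ET.SubElement(root, f"{{{W}}}body")
    for text in paragraphs:
        p = ET.SubElement(body, f"{{{W}}}p")   # paragraph
        r = ET.SubElement(p, f"{{{W}}}r")      # run
        t = ET.SubElement(r, f"{{{W}}}t")      # text
        t.text = text
    xml_body = ET.tostring(root, encoding="unicode")
    # The processing instruction associates the file with Word.
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n' + xml_body
    )


letter = build_letter(["Dear Maria Anders,", "Your order has shipped."])
print(letter)
```

The returned string can be written to a file as UTF-8 or returned from a Web service as the response body.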
So, to get a better feel for how this may benefit your own applications, let's walk through creating a Word 2003 template, saving it as XML, and then manipulating the document (using data provided by a user) to produce a final document for use in the application.
Creating a Schema
The first step in this process is to create a schema for the data that you can insert into the Word 2003 document template. Although you don't actually need to have a schema, it's a bit easier to work with the document if you apply a schema to it. Without the schema, you'd have to use a feature like bookmarks, which are rendered like the following XML snippet:
<aml:annotation aml:id="0"
    w:type="Word.Bookmark.Start"
    w:name="ContactName"/>
<w:p>
  <w:r>
    <w:t>[ContactName]</w:t>
  </w:r>
  <aml:annotation aml:id="0"
      w:type="Word.Bookmark.End"/>
</w:p>
Notice how the bookmark, named ContactName in this example, is delimited by two empty annotation elements. The only things that distinguish these elements are the type attribute values of Word.Bookmark.Start and Word.Bookmark.End. This is slightly more complex than applying a schema to the document, which produces the XML in the following snippet:
<ns0:ContactName>
  <w:p>
    <w:r>
      <w:t>[ContactName]</w:t>
    </w:r>
  </w:p>
</ns0:ContactName>
Because I'm starting from scratch, the schema approach seems to be a slightly easier way to go. But I can imagine a situation where you are migrating from an earlier version of Word and your documents are already marked up with bookmarks. As you can see, it's still possible to use the bookmarks; it's just a tiny bit more work than using an attached schema.
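To show why the bookmark route is "a tiny bit more work," here is a Python sketch that fills in the [ContactName] placeholder under both markups. Everything in it is illustrative: the function names are invented, the ns0 namespace URI (urn:example:customers) is a stand-in for whatever target namespace your schema declares, and the contact name is sample data. With a schema element, a single path lookup finds the placeholder. With a bookmark, the code has to walk the document in order and track whether it is currently between the start and end annotations.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
AML = "http://schemas.microsoft.com/aml/2001/core"
NS0 = "urn:example:customers"  # assumption: stands in for the schema's namespace

SCHEMA_DOC = f"""<w:body xmlns:w="{W}" xmlns:ns0="{NS0}">
  <ns0:ContactName><w:p><w:r><w:t>[ContactName]</w:t></w:r></w:p></ns0:ContactName>
</w:body>"""

BOOKMARK_DOC = f"""<w:body xmlns:w="{W}" xmlns:aml="{AML}">
  <aml:annotation aml:id="0" w:type="Word.Bookmark.Start" w:name="ContactName"/>
  <w:p><w:r><w:t>[ContactName]</w:t></w:r>
    <aml:annotation aml:id="0" w:type="Word.Bookmark.End"/></w:p>
</w:body>"""


def fill_schema_element(root, name, value):
    """Schema markup: the placeholder lives inside an element named `name`."""
    target = root.find(f".//{{{NS0}}}{name}")
    text_node = target.find(f".//{{{W}}}t") if target is not None else None
    if text_node is None:
        return False
    text_node.text = value
    return True


def fill_bookmark(root, name, value):
    """Bookmark markup: set the first w:t between the start and end markers."""
    open_id = None  # aml:id of the bookmark we are currently inside, if any
    for el in root.iter():  # document order
        if el.tag == f"{{{AML}}}annotation":
            kind = el.get(f"{{{W}}}type")
            if kind == "Word.Bookmark.Start" and el.get(f"{{{W}}}name") == name:
                open_id = el.get(f"{{{AML}}}id")
            elif (kind == "Word.Bookmark.End" and open_id is not None
                  and el.get(f"{{{AML}}}id") == open_id):
                return False  # bookmark ended before any text was found
        elif open_id is not None and el.tag == f"{{{W}}}t":
            el.text = value
            return True
    return False


schema_root = ET.fromstring(SCHEMA_DOC)
bookmark_root = ET.fromstring(BOOKMARK_DOC)
fill_schema_element(schema_root, "ContactName", "Maria Anders")
fill_bookmark(bookmark_root, "ContactName", "Maria Anders")
```

Note that the end annotation carries no name, only the same aml:id as the start annotation, which is why the bookmark version has to remember the id.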
For example, using the Northwind Customers table from SQL Server, I've created a very simple schema that is listed in its entirety in Listing 1.
This simple schema points out another advantage to using a schema-based approach: Word 2003 enforces the restrictions defined in the schema for the document. Any violations appear as errors in Word 2003's task pane feature, but you can also validate the document against the schema with any XML validation tool.
The schema that you create can be as simple or as complex as you like. What is important is how to mark up the Word 2003 document with this schema so that you get the desired XML output from your application.