CoDe Magazine
Working with Microsoft Office Word 2003's XML

One of Microsoft Office 2003's most significant new features is the integration of XML technology. This article focuses on taking advantage of Word 2003's XML features from within your applications. 


The .doc file format that is still present in Word 2003 is essentially a proprietary binary format, and extracting information from .doc files is difficult. By saving documents in the new XML format, you can easily retrieve information trapped inside Word 2003 documents by using little more than XPath queries.
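
As a rough illustration of how little code such a query needs, here is a minimal Python sketch that pulls the text of each paragraph out of a document saved as Word 2003 XML. It uses the limited XPath support in the standard library's ElementTree. The SAMPLE document and the paragraph_texts helper are made up for this example, and a real saved document contains many more elements than shown here.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003's XML format (WordprocessingML).
NS = {"w": "http://schemas.microsoft.com/office/word/2003/wordml"}

# Hypothetical, heavily trimmed stand-in for a document saved as XML.
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Dear customer,</w:t></w:r></w:p>
    <w:p><w:r><w:t>Thank you </w:t></w:r><w:r><w:t>for your order.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraph_texts(xml_text):
    """Return the text of every w:p paragraph, joining the runs inside it."""
    root = ET.fromstring(xml_text)
    texts = []
    for paragraph in root.iterfind(".//w:p", NS):
        # A paragraph can hold several runs (w:r), each with its own w:t.
        pieces = [t.text or "" for t in paragraph.iterfind(".//w:t", NS)]
        texts.append("".join(pieces))
    return texts


print(paragraph_texts(SAMPLE))
```

The same two path expressions (.//w:p and .//w:t) should work in any XPath-capable tool, as long as the w prefix is bound to the namespace shown.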

New features included in Word 2003 also allow you to force users into entering data into an XML document without their knowledge! Essentially, you can annotate a document with an XML schema and then protect the document, only allowing the user to add or edit information in specific locations throughout the document. This way, when the user saves the document, the data is written directly to an XML document, allowing it to be easily consumed by another application or a database.

Another cool idea for using XML with Word 2003 documents is the ability to transform XML into other formats. As of this writing, there is an XSLT provided by Microsoft that takes a Word 2003 XML document and transforms it into an HTML document for viewing in a Web browser. Of course, my first reaction to this was "What good is that? I can save a document as HTML, right?" Then I realized that I have complete control over this transformation by designing my own XSLT, unlike the "Save as HTML" functionality from previous versions.

But these ideas are outside the topic of this article, which focuses on manipulating a Word 2003 document (saved as XML) from within code. Before Word 2003, all you could effectively do was either use automation or be really handy with the RTF format (and open the RTF using Word). Because Word 2003 can both save as and read from XML, you can create sophisticated Word 2003 documents by processing and manipulating XML.

If you're not sure why you might try something like this, here are a few ideas:

  • You can create documents from data within an application, such as form letters.
  • You can send Word 2003 documents to a client workstation over the Internet as XML and have it correctly interpreted at the client workstation as a Word 2003 document.
  • You can return Word 2003 documents from Web services.
So, to get a better feel for how this may benefit your own applications, let's walk through the creation of a Word 2003 template, save it as XML, and then manipulate the document (using data provided by a user) to produce a final document for use in the application.

Creating a Schema
The first step in this process is to create a schema for the data that you can insert into the Word 2003 document template. Although you don't actually need to have a schema, it's a bit easier to work with the document if you apply a schema to it. Without the schema, you'd have to use a feature like bookmarks, which are rendered like the following XML snippet:
   <aml:annotation aml:id="0" 
       w:type="Word.Bookmark.Start" 
       w:name="ContactName"/>
   <w:p>
     <w:r>
       <w:t>[ContactName]</w:t>
     </w:r>
     <aml:annotation aml:id="0" 
         w:type="Word.Bookmark.End"/>
   </w:p>
Notice how the bookmark, named ContactName in this example, is delimited by two empty annotation elements. The only things that distinguish these elements are the type attribute values of Word.Bookmark.Start and Word.Bookmark.End. This is slightly more complex than applying a schema to the document, which produces the XML in the following snippet:
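
To make the extra work concrete, the following Python sketch reads the text that sits between a bookmark's start and end annotations. It is only an illustration under a few assumptions: the SAMPLE fragment is hypothetical (it wraps the snippet above in a w:body element), bookmark_text is a made-up helper name, and the start and end annotations are matched by their shared aml:id value.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
AML = "http://schemas.microsoft.com/aml/2001/core"

# Hypothetical fragment built around the bookmark snippet from the article.
SAMPLE = f"""<w:body xmlns:w="{W}" xmlns:aml="{AML}">
  <w:p><w:r><w:t>Dear </w:t></w:r></w:p>
  <aml:annotation aml:id="0" w:type="Word.Bookmark.Start" w:name="ContactName"/>
  <w:p>
    <w:r><w:t>[ContactName]</w:t></w:r>
    <aml:annotation aml:id="0" w:type="Word.Bookmark.End"/>
  </w:p>
  <w:p><w:r><w:t>Regards</w:t></w:r></w:p>
</w:body>"""


def bookmark_text(xml_text, name):
    """Return the text between a bookmark's start and end, or None if absent."""
    root = ET.fromstring(xml_text)
    open_id = None  # aml:id of the bookmark once its start has been seen
    parts = []
    # iter() walks the tree in document order, which is what the
    # start/end annotation pair relies on.
    for element in root.iter():
        if element.tag == f"{{{AML}}}annotation":
            kind = element.get(f"{{{W}}}type")
            if kind == "Word.Bookmark.Start" and element.get(f"{{{W}}}name") == name:
                open_id = element.get(f"{{{AML}}}id")
            elif (kind == "Word.Bookmark.End" and open_id is not None
                  and element.get(f"{{{AML}}}id") == open_id):
                return "".join(parts)
        elif open_id is not None and element.tag == f"{{{W}}}t":
            parts.append(element.text or "")
    return None


print(bookmark_text(SAMPLE, "ContactName"))
```

Because the two annotations are siblings of the content rather than its parent, the code has to track state while walking the document instead of selecting a single element.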
   <ns0:ContactName>
     <w:p>
       <w:r>
         <w:t>[ContactName]</w:t>
       </w:r>
     </w:p>
   </ns0:ContactName>
Because I'm starting from scratch, the schema approach seems to be a slightly easier way to go. But I can imagine a situation where you are migrating from an earlier version of Word and your documents are already marked up with bookmarks. As you can see, it's still possible to use the bookmarks; it's just a tiny bit more work than using an attached schema.
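
By comparison, a schema-tagged placeholder can be located by element name alone. The Python sketch below replaces the placeholder text inside such an element. Treat it as an assumption-laden example: the urn:example:customer namespace, the TEMPLATE fragment, the fill_fields helper, and the sample contact name are all invented here, and the real namespace is whatever the attached schema declares.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
DATA = "urn:example:customer"  # hypothetical namespace of the attached schema

# Hypothetical template fragment built around the schema snippet above.
TEMPLATE = f"""<w:body xmlns:w="{W}" xmlns:ns0="{DATA}">
  <ns0:ContactName>
    <w:p>
      <w:r>
        <w:t>[ContactName]</w:t>
      </w:r>
    </w:p>
  </ns0:ContactName>
</w:body>"""


def fill_fields(xml_text, values):
    """Replace the text inside each schema element named in `values`."""
    root = ET.fromstring(xml_text)
    for field, value in values.items():
        for node in root.iter(f"{{{DATA}}}{field}"):
            text_nodes = list(node.iter(f"{{{W}}}t"))
            if not text_nodes:
                continue
            # Put the value in the first w:t and blank any others so the
            # placeholder text does not survive in later runs.
            text_nodes[0].text = value
            for extra in text_nodes[1:]:
                extra.text = ""
    return root


filled = fill_fields(TEMPLATE, {"ContactName": "Maria Anders"})
print(ET.tostring(filled, encoding="unicode"))
```

Here the element name does the work that the paired annotations did in the bookmark version, so no state has to be tracked while walking the document.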

For example, using the Northwind Customers table from SQL Server, I've created a very simple schema that is listed in its entirety in Listing 1.

This simple schema points out another advantage to using a schema-based approach: Word 2003 enforces the restrictions defined in the schema for the document. Any violations appear as errors in Word 2003's task pane feature, but you can also validate the document against the schema with any XML validation tool.

The schema that you create can be as simple or as complex as you like. What is important is how to mark up the Word 2003 document with this schema so that you get the desired XML output from your application.

© Copyright Component Developer Magazine and EPS Software Corp., 2009