Working with Microsoft Office Word 2003's XML : Page 2

One of Microsoft Office 2003's most significant new features is the integration of XML technology. This article focuses on taking advantage of Word 2003's XML features from within your applications.





Making a Word 2003 Template
With the schema out of the way, let's see how to apply it to a Word 2003 document. Start by creating or opening a document in Word 2003 with the desired boilerplate text. You may wish to highlight or otherwise mark the locations for XML placeholders in your document so you can find them easily when it comes time to edit the document. My convention is to write the node names into the text of the document and surround them with square brackets (e.g., [ContactName]). These become the placeholders for the schema elements in the document.

Because these are XML documents, you can pass them over the Internet from a Web service or Web site to a client.
To apply a schema, open the Tools menu and select the Templates and Add-ins... option. This opens the dialog box where you can manage the XML schemas that can be applied to Word 2003 documents. Select the XML Schema page to view the current list of attached schemas. If the list is blank, or the desired schema is not listed, click the Add Schema... button. After you add a schema, Word 2003 prompts you for an alias. The alias simply makes the schema easier to reference, because the namespace is usually long and difficult to read. Once you've added your schema and provided the alias, it appears in the list on the XML Schema page. Enable the checkbox next to the desired schema, and then click OK to close the dialog box.

Once you press the OK button on the Templates dialog box, Word 2003 automatically displays the XML Structure task pane. If it doesn't, you can press Ctrl+F1 to make the task pane appear, and then select the XML Structure page from the drop-down list at the top of the pane.

Now that a schema has been attached, you can apply the elements from the schema to the document. Depending upon how your schema is constructed, you may or may not see any elements in the lower part of the XML Structure task pane. In the example schema, because there is no parent element, all of the nodes initially appear in the list.

Figure 1: A Word 2003 document that has been marked up with the schema from Listing 1 looks like this.
To apply the elements, select an area of your document (it doesn't have to contain any text) and then choose one of the available elements to apply. When you apply the first element, Word 2003 prompts you to choose whether to apply it to the entire document or only to the selection. I've gotten into the habit of always applying the elements to the selection, as that is what I want in most situations anyway. Continue to highlight text and apply the elements as desired.

After making your selections and applying the schema, you may or may not see much of a difference in your document. This depends on whether or not you have selected the Show XML tags in the document checkbox in the XML Structure task pane. With this option selected, you'll see the start and end tags graphically represented in your Word 2003 document, as shown in Figure 1.

Now that you've applied the schema to your document, save it as an XML file so that you can parse it with your application code. To do this, start by choosing the Save As... option from the File menu. In the Save As Type drop-down list, choose XML Document (*.xml). You will then see some additional controls to the right of the drop-down list that are specific to the XML format, as shown in Figure 2.

None of the checkboxes should be selected for this example, as you do not want to apply a transform or save only the data without the tags. This ensures that all of the information you have entered into your document is written out to XML.

Figure 2: Choose the XML format in the lower portion of the Save As... dialog box to display the XML options.

Tips for Saving as XML
To make things a little cleaner in the XML output, you will want to ensure that you either spell everything correctly (not very likely if you use my naming convention for the placeholder text) or that you ignore any spelling errors flagged by Word 2003. If you leave in something that the Word 2003 spelling checker doesn't like, the resultant XML looks similar to the following snippet:

<ns0:ContactName>
  <w:p>
    <w:r>
      <w:t>[</w:t>
    </w:r>
    <w:proofErr w:type="spellStart" />
    <w:r>
      <w:t>ContactName</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd" />
    <w:r>
      <w:t>]</w:t>
    </w:r>
  </w:p>
</ns0:ContactName>

As you can see, the proofing errors change the expected XML: Word 2003 has embedded proofErr elements and split the placeholder text across several runs. Once you handle the spelling errors (e.g., right-click the error in the document and choose "Ignore All"), the XML appears as shown in this snippet:

<ns0:ContactName>
  <w:p>
    <w:r>
      <w:t>[ContactName]</w:t>
    </w:r>
  </w:p>
</ns0:ContactName>
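
If you cannot guarantee that every spelling error has been ignored before the document is saved, the parsing code can be made tolerant of the extra markup instead. The following is a minimal Python sketch, not the article's own code: it joins the text of every w:t descendant, so the proofErr elements and split runs stop mattering. The NS0 namespace is a hypothetical stand-in for the schema's real namespace, and the wrap helper exists only to build test fragments.

```python
import xml.etree.ElementTree as ET

# WordprocessingML 2003 namespace bound to the w: prefix.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
# Hypothetical namespace; substitute the one declared by your own schema.
NS0 = "urn:example:contact"


def element_text(node):
    """Join the text of every w:t descendant; proofErr markers carry no text."""
    return "".join(t.text or "" for t in node.iter(f"{{{W_NS}}}t"))


def wrap(body):
    """Build a standalone ContactName fragment for this demonstration."""
    return (
        f'<ns0:ContactName xmlns:ns0="{NS0}" xmlns:w="{W_NS}">'
        f"{body}</ns0:ContactName>"
    )


# Placeholder as saved while the spelling checker still flags it.
flagged = ET.fromstring(wrap(
    "<w:p><w:r><w:t>[</w:t></w:r>"
    '<w:proofErr w:type="spellStart"/>'
    "<w:r><w:t>ContactName</w:t></w:r>"
    '<w:proofErr w:type="spellEnd"/>'
    "<w:r><w:t>]</w:t></w:r></w:p>"
))
# Placeholder as saved after choosing "Ignore All".
clean = ET.fromstring(wrap("<w:p><w:r><w:t>[ContactName]</w:t></w:r></w:p>"))

flagged_text = element_text(flagged)
clean_text = element_text(clean)
print(flagged_text == clean_text)
```

Both fragments yield the same placeholder text, so code that matches on the joined string does not need to care whether the proofing markup is present.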

Also, be aware of where your paragraph marks appear in relation to your applied schema elements. In both ContactName snippets shown above, the [ContactName] text appears on a line all by itself. This places a paragraph element (the w:p element) completely within the ContactName element.

If, on the other hand, you placed ContactName on the same line as some other text or another element, the paragraph element won't appear within the ContactName element but outside of it. Because my document contains both of these examples, the code will have to handle both situations appropriately.
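
One way to handle both situations with a single code path is to ignore where the w:p element sits and work only with the w:t elements beneath each schema element. The sketch below is an illustration under assumptions rather than the code this article develops: the NS0 namespace and the FirstName element are hypothetical, and the inline layout in SAMPLE is my reading of the description above (the schema element nested inside the paragraph).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
NS0 = "urn:example:contact"  # hypothetical; use your schema's namespace
W_T = f"{{{W_NS}}}t"
W_P = f"{{{W_NS}}}p"
NS0_PREFIX = f"{{{NS0}}}"


def schema_elements(root):
    """Yield (local name, node) for every element in the schema namespace."""
    for node in root.iter():
        if isinstance(node.tag, str) and node.tag.startswith(NS0_PREFIX):
            yield node.tag[len(NS0_PREFIX):], node


def wraps_paragraph(node):
    """True when the w:p sits inside the schema element (own-line placeholder)."""
    return node.find(W_P) is not None


def fill(root, values):
    """Write each value into the first w:t of its element and blank the rest."""
    for name, node in schema_elements(root):
        texts = list(node.iter(W_T))
        if name not in values or not texts:
            continue
        texts[0].text = values[name]
        for extra in texts[1:]:
            extra.text = ""
    return root


# ContactName wraps its paragraph; the hypothetical FirstName shares a
# paragraph with other text, so the w:p is outside of it.
SAMPLE = f"""<w:body xmlns:w="{W_NS}" xmlns:ns0="{NS0}">
  <ns0:ContactName><w:p><w:r><w:t>[ContactName]</w:t></w:r></w:p></ns0:ContactName>
  <w:p><w:r><w:t>Dear </w:t></w:r><ns0:FirstName><w:r><w:t>[FirstName]</w:t></w:r></ns0:FirstName></w:p>
</w:body>"""

root = ET.fromstring(SAMPLE)
layouts = {name: wraps_paragraph(node) for name, node in schema_elements(root)}
fill(root, {"ContactName": "Ada Lovelace", "FirstName": "Ada"})
all_text = "".join(t.text or "" for t in root.iter(W_T))
print(layouts)
print(all_text)
```

Because fill searches descendants rather than direct children, it does not need to branch on the layout; wraps_paragraph is included only to show how the two cases can be told apart when that distinction matters.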

Opening the XML File
Now that you've saved the document as XML, you can see the document on your hard drive with its XML extension. When you double-click it, it opens up within Word 2003, not in your associated program for XML files (which is, by default, Microsoft Internet Explorer). This is because there is a processing instruction at the top of the XML document that declares the ProgID to use when opening this XML file, as shown in this snippet:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<?mso-application progid="Word.Document"?>
<w:wordDocument . . .

If you comment out the second line of this document and then save it, you no longer launch Word 2003 when double-clicking the XML file. I found this useful during testing so that I could quickly view the XML produced by saving the Word 2003 document as XML.
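
If you find yourself making this change repeatedly during testing, it can be scripted. The following is a rough Python sketch that wraps the mso-application processing instruction in an XML comment. The function name and the sample string are my own assumptions, and the sketch treats the file as plain text rather than parsing it.

```python
import re

# Match the mso-application processing instruction unless it already sits
# directly after "<!-- ", the comment opener this script itself writes.
PI_PATTERN = re.compile(r"(?<!<!-- )<\?mso-application\b.*?\?>", re.DOTALL)


def comment_out_progid(xml_text):
    """Wrap the first mso-application instruction in an XML comment."""
    return PI_PATTERN.sub(lambda m: f"<!-- {m.group(0)} -->", xml_text, count=1)


# Shortened stand-in for the top of a document saved by Word 2003.
sample = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>\n'
    '<?mso-application progid="Word.Document"?>\n'
    '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"/>'
)

patched = comment_out_progid(sample)
print(patched.splitlines()[1])
```

To apply this to a real file, read it as UTF-8 text, pass the contents through comment_out_progid, and write the result to a copy, leaving the original untouched so it still opens in Word 2003.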
