Export Customized XML from Microsoft Word with VB.NET : Page 5

Learn to use Word automation from .NET to turn hard-to-process Word documents into customizable XML





Mapping Style Names to Element Names
Each Word Paragraph object has a Style property that identifies the paragraph's style. As you iterate through the paragraphs, you want to obtain the Style object and retrieve its name. Style objects don't have a Name property; they have a NameLocal property instead, which corresponds to the name you see when you select a style from Word's dropdown style list, and that's exactly what you need. Because the Style property is typed as Object, you must cast its value to a Word.Style object to use the NameLocal property in your code.

   Dim stylename As String = CType(p.Style, _
      Word.Style).NameLocal

Now that you have the style name for this paragraph, you want to map it to an XML element. There are two considerations. First, the Word style names can contain spaces, while XML element names cannot; therefore, you must either remove or replace the spaces before applying the name to an XML element.
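Spaces are not the only characters that can make a name invalid; an XML element name also cannot start with a digit, for example. The following is a minimal sketch of a more defensive conversion, not code from the sample application. The ToElementName helper and the style name "1 Column" are hypothetical; XmlConvert.VerifyNCName and XmlConvert.EncodeLocalName are standard System.Xml methods.

```vbnet
Imports System
Imports System.Xml

Module ElementNameSketch
    ' Hypothetical helper, not part of the sample application.
    Function ToElementName(ByVal styleName As String) As String
        ' Same idea as the article: replace spaces with underscores.
        Dim candidate As String = styleName.Replace(" "c, "_"c)
        Try
            ' VerifyNCName returns the name unchanged if it is a valid,
            ' colon-free XML name; otherwise it throws an XmlException.
            Return XmlConvert.VerifyNCName(candidate)
        Catch ex As XmlException
            ' Fall back to escaping the offending characters
            ' as _xHHHH_ sequences.
            Return XmlConvert.EncodeLocalName(candidate)
        End Try
    End Function

    Sub Main()
        Console.WriteLine(ToElementName("Heading 1")) ' Heading_1
        ' Starts with a digit, so the escaped fallback is used.
        Console.WriteLine(ToElementName("1 Column"))
    End Sub
End Module
```

Whether the escaped form is acceptable depends on who consumes the XML; an explicit entry in a lookup table gives you more control over the resulting names.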

Second, you may not want to map the Word style names directly to XML element names. For example, you might want to map Word's Normal style to a <p> element in the XML document. To do that, you need to write a bit of lookup code to map style names to element names. The sample application contains a StyleMapping class that performs the lookup (see Listing 3). For convenience, the StyleMapping class also contains a fixupName method that handles replacing any spaces in the Word style name with underscores.

To create an instance of the StyleMapping class, pass its constructor the name of an XML-formatted map file. Map files consist of a root <mapping> tag, which contains any number of <item> tags. Each <item> tag has style and tag attributes that hold the Word style name and the name of the XML tag that will hold a paragraph of that style.

   <?xml version="1.0" encoding="utf-8" ?>
   <mapping>
      <item style="Heading 1" tag="h1"></item>
      <item style="Normal" tag="p"></item>
   </mapping>

For example, the preceding map file instructs the application to map the Word style "Heading 1" to an <h1> element and to map the Normal style to a <p> element.
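To make the map file format concrete, here is a rough sketch that reads the same <item> entries into a dictionary. It is an illustration only, not the StyleMapping class from the sample application; the LoadMap helper is hypothetical, and the map is loaded from an inline string rather than from a file so the example stands alone.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module MapFileSketch
    ' Hypothetical helper: reads <item style="..." tag="..."> pairs
    ' from a map document into a dictionary keyed by style name.
    Function LoadMap(ByVal mapXml As String) As Dictionary(Of String, String)
        Dim doc As New XmlDocument()
        doc.LoadXml(mapXml)
        Dim map As New Dictionary(Of String, String)()
        For Each node As XmlNode In doc.SelectNodes("/mapping/item")
            Dim item As XmlElement = CType(node, XmlElement)
            ' Skip entries that are missing either attribute.
            If item.HasAttribute("style") AndAlso item.HasAttribute("tag") Then
                map(item.GetAttribute("style")) = item.GetAttribute("tag")
            End If
        Next
        Return map
    End Function

    Sub Main()
        Dim mapXml As String = "<mapping>" & _
            "<item style=""Heading 1"" tag=""h1""></item>" & _
            "<item style=""Normal"" tag=""p""></item>" & _
            "</mapping>"
        Dim map As Dictionary(Of String, String) = LoadMap(mapXml)
        Console.WriteLine(map("Heading 1")) ' h1
        Console.WriteLine(map("Normal"))    ' p
    End Sub
End Module
```

Reading the map once into a dictionary trades a little memory for lookups that do not need an XPath query per paragraph.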

As written, the application always attempts to look up the style name for every paragraph by calling the StyleMapping.GetStyleToElementMapping method. If that method finds an <item> element with a matching style attribute, it returns the value of the tag attribute; otherwise it "fixes up" the Word style name by calling the private fixupName method and returns the result.

   ' definition in docToXml method
   Dim styleMapper As New StyleMapping( _
      Application.StartupPath & "\stylemapping.xml")

   ' for each paragraph, map the Word style to
   ' an XML element name
   Dim elementName As String = _
      styleMapper.GetStyleToElementMapping(stylename)

   ' In the StyleMapping class
   Public Function GetStyleToElementMapping( _
      ByVal aStylename As String) As String
      Dim el As XmlElement = getMapNode(aStylename)
      Dim tagname As String = String.Empty
      If Not el Is Nothing Then
         If el.HasAttribute("tag") Then
            tagname = el.GetAttribute("tag")
         End If
      End If
      ' no mapping found: fall back to the fixed-up style name
      If tagname = String.Empty Then
         tagname = fixupName(aStylename)
      End If
      Return tagname
   End Function

   Private Function getMapNode( _
      ByVal aStylename As String) As XmlElement
      Dim n As XmlNode = _
         xml.SelectSingleNode("//item[@style='" + _
         aStylename + "']")
      If Not n Is Nothing Then
         Return CType(n, XmlElement)
      Else
         Return Nothing
      End If
   End Function

   Private Function fixupName(ByVal aStylename _
      As String) As String
      Return aStylename.Replace(" "c, "_"c)
   End Function
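One caveat with building the XPath query by string concatenation, as getMapNode does: if a style name ever contained an apostrophe, the resulting expression would be malformed. The sketch below shows one possible way to quote such values. The XPathLiteral helper and the "Editor's Note" style name are hypothetical and are not part of the sample application.

```vbnet
Imports System
Imports System.Xml

Module XPathQuoteSketch
    ' Hypothetical helper: wraps a value as an XPath 1.0 string literal.
    Function XPathLiteral(ByVal value As String) As String
        ' No apostrophe: single quotes are safe delimiters.
        If Not value.Contains("'") Then Return "'" & value & "'"
        ' No double quote: use double quotes as delimiters instead.
        If Not value.Contains("""") Then Return """" & value & """"
        ' Both kinds present: split on apostrophes and rejoin with concat().
        Return "concat('" & value.Replace("'", "',""'"",'") & "')"
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<mapping><item style=""Editor's Note"" tag=""note""/></mapping>")
        Dim xpath As String = "//item[@style=" & XPathLiteral("Editor's Note") & "]"
        Dim n As XmlNode = doc.SelectSingleNode(xpath)
        Console.WriteLine(CType(n, XmlElement).GetAttribute("tag")) ' note
    End Sub
End Module
```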

After obtaining a mapped name, you can create a new XmlElement and append it to the most recent page element.

   Dim N As XmlElement = _
      xmlDoc.CreateElement(elementName)
   N.InnerText = s
   pageNode.AppendChild(N)
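To see the append step in isolation, the following sketch runs the same three lines over a small in-memory list instead of a Word document. The "document" root element name and the sample paragraph data are assumptions made for illustration; only the CreateElement, InnerText, and AppendChild calls mirror the code above.

```vbnet
Imports System
Imports System.Xml

Module AppendSketch
    Sub Main()
        Dim xmlDoc As New XmlDocument()
        ' Assumed container structure: a root element holding one page element.
        Dim root As XmlElement = xmlDoc.CreateElement("document")
        xmlDoc.AppendChild(root)
        Dim pageNode As XmlElement = xmlDoc.CreateElement("page")
        root.AppendChild(pageNode)

        ' Stand-ins for (mapped element name, paragraph text) pairs.
        Dim paragraphs(,) As String = {{"h1", "Title"}, {"p", "Fish & chips"}}
        For i As Integer = 0 To paragraphs.GetUpperBound(0)
            Dim N As XmlElement = xmlDoc.CreateElement(paragraphs(i, 0))
            ' InnerText escapes markup characters such as & automatically.
            N.InnerText = paragraphs(i, 1)
            pageNode.AppendChild(N)
        Next

        Console.WriteLine(xmlDoc.OuterXml)
        ' <document><page><h1>Title</h1><p>Fish &amp; chips</p></page></document>
    End Sub
End Module
```

Assigning the paragraph text through InnerText rather than InnerXml means that any angle brackets or ampersands in the Word text end up escaped instead of being parsed as markup.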

When the docToXml function has processed all the paragraphs, it returns the completed XML document. The Process button Click event handler code then displays it in the multi-line TextBox (see Figure 3).

Figure 3: The Completed Transformation. After processing the simple sample.doc file, the multi-line TextBox displays the content transformed to XML.
