Unstructured data is the information locked away in a company’s e-mails, memos, notes from call centers and support operations, news releases, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations, and Web pages. Businesses have long realized that reusing this data could be a substantial competitive advantage. But only recently, with the advent of XML, has a practical solution emerged.
Vendors such as Sonic Software, Altova, and Microsoft have all developed technology that transforms unstructured data into XML; once this data is in XML, the sky’s the limit.
Much has already been made of the XML capabilities coming in Microsoft’s Office 11. But while MS Office is geared toward content creators and business users, Altova Marketing Director Larry Kim says the latest release of xmlspy does two things MS Office doesn’t. First, it transforms MS Word, PDF, and HTML files into XML format (Office 11 only transforms Word docs). Second, it delivers the ability to do something new with your transformed XML content.
As an example of this kind of supertransformation, CambridgeDocs last month released xDoc, a new xmlspy plug-in that lets users of the CambridgeDocs XML Content Backbone easily migrate unstructured data into that system.
xmlspy users can select “Open From Microsoft Word,” “Open From HTML,” “Open From PDF,” or any other supported format, which automatically converts the content into XML and generates an XSLT stylesheet for viewing the XML as a well-formatted document. The XML can then be manipulated, transformed using XSLT, or published in various formats, either programmatically or interactively, depending on customer need. This lets developers find new uses for existing information.
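To make the “convert, then view through XSLT” step concrete, here is a minimal Java sketch that uses the XSLT processor bundled with the JDK (JAXP’s `javax.xml.transform`), not xmlspy itself. The `memo` document, its stylesheet, and the `RenderAsHtml` class are hypothetical stand-ins for what a conversion might produce; the XML and XSLT that xmlspy actually generates will differ.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: render converted XML as HTML with an XSLT stylesheet,
// using the JDK's built-in JAXP XSLT processor (not an xmlspy API).
class RenderAsHtml {
    // Hypothetical XML of the kind a Word/HTML/PDF conversion step might produce.
    static final String XML =
        "<memo><subject>Support notes</subject>"
        + "<para>First point.</para><para>Second point.</para></memo>";

    // Hypothetical "view" stylesheet: turns the memo into a simple HTML page.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/memo'>"
        + "<html><body>"
        + "<h1><xsl:value-of select='subject'/></h1>"
        + "<xsl:for-each select='para'><p><xsl:value-of select='.'/></p></xsl:for-each>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet to an XML document and returns the result as text.
    static String transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSLT));
    }
}
```

Keeping the content (XML) and the presentation (XSLT) in separate files is what allows the same converted content to be restyled or republished later without touching the data.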
xmlspy can also take XML output from Office 11 and transform it into DocBook, HR-XML, RIXML, IRXML, FpML, DAS-XML, NewsML, and many other custom XML schemas and DTDs.
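XML-to-XML conversions like these are typically written as an XSLT stylesheet with one template per source element. The Java sketch below maps the same hypothetical `memo` vocabulary onto a few DocBook-style elements (`article`, `title`, `para`). It is a simplified illustration, not a complete or schema-valid DocBook mapping, and it uses the JDK’s built-in XSLT processor rather than xmlspy; the `ToDocBookStyle` class is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical XML-to-XML mapping from a "memo" vocabulary to DocBook-style markup.
class ToDocBookStyle {
    static final String XML =
        "<memo><subject>Support notes</subject>"
        + "<para>First point.</para><para>Second point.</para></memo>";

    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // One template per source element: emit the target element, then process children.
        + "<xsl:template match='memo'><article><xsl:apply-templates/></article></xsl:template>"
        + "<xsl:template match='subject'><title><xsl:apply-templates/></title></xsl:template>"
        + "<xsl:template match='para'><para><xsl:apply-templates/></para></xsl:template>"
        + "</xsl:stylesheet>";

    // Text nodes are copied by XSLT's built-in template, so only elements need rules.
    static String transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSLT));
    }
}
```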
But xmlspy can do more than transform XML into other XML vocabularies. As Kim says, “once business content is stored in XML format, you typically want to do something useful with [that] content.”
For example, you might want to transform (i.e., publish) an XML document into HTML, WML, PDF, or PostScript. Or you might want to save an XML document into an XML repository or an XML-enabled relational database, or syndicate your XML content via Web services. xmlspy 5 includes the XML tools, such as a schema editor and an XSLT debugger, needed for these additional kinds of transformations.
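Before XML goes into a repository or database, it is common to check it against an XML Schema such as one written in a schema editor. A minimal sketch using the JDK’s `javax.xml.validation` API follows; the schema, the `ValidateBeforeStore` class, and its `isValid` helper are hypothetical, and no repository or database code is shown.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical pre-storage check: accept a memo only if it matches the schema.
class ValidateBeforeStore {
    // Hypothetical schema: a memo is one subject followed by one or more paras.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='memo'><xs:complexType><xs:sequence>"
        + "<xs:element name='subject' type='xs:string'/>"
        + "<xs:element name='para' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<memo><subject>Hi</subject><para>x</para></memo>"));
        System.out.println(isValid("<memo><para>x</para></memo>")); // missing subject
    }
}
```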
Perhaps one of xmlspy’s most useful features for programmers is its ability to transform existing HTML documents into XML. Users can decompose existing HTML pages into stylesheets, XML Schemas, and XML content, which lets Web developers convert an existing Web site into an XML-enabled site.
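The content-extraction part of that decomposition can be sketched roughly as follows. This sketch assumes the page is already well-formed XHTML (much real-world HTML is not and would need cleanup first), and the `page`/`title`/`para` output vocabulary and the `ExtractContent` class are hypothetical. It does not generate stylesheets or schemas, and it is not how xmlspy performs the conversion internally.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: pull the content out of a well-formed XHTML page,
// leaving presentation (styles, classes, <b> markup) behind.
class ExtractContent {
    static String extract(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document page = builder.parse(new InputSource(new StringReader(xhtml)));

        Document content = builder.newDocument();
        Element root = content.createElement("page");
        content.appendChild(root);
        copyText(page, content, root, "h1", "title");
        copyText(page, content, root, "p", "para");

        // Serialize the new content document with an identity transform.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(content), new StreamResult(out));
        return out.toString();
    }

    // Copies the plain text of every <htmlTag> element into a new <contentTag> element.
    static void copyText(Document from, Document to, Element parent,
                         String htmlTag, String contentTag) {
        NodeList nodes = from.getElementsByTagName(htmlTag);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = to.createElement(contentTag);
            element.setTextContent(nodes.item(i).getTextContent().trim());
            parent.appendChild(element);
        }
    }

    static final String SAMPLE =
        "<html><head><title>Old site</title><style>p{color:red}</style></head>"
        + "<body><h1 class='banner'>Support notes</h1>"
        + "<p><b>First</b> point.</p><p>Second point.</p></body></html>";

    public static void main(String[] args) throws Exception {
        System.out.println(extract(SAMPLE));
    }
}
```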
xmlspy also helps with XSLT-based file transformations. “I think that Web developers are actually having the hardest time learning XSLT and trying to make XML-based Web sites, not because it is harder than any other type of transformation per se, but because their skill set is not as ‘technical’ as, say, a software engineer’s,” Kim says. He reports that tools such as the xmlspy XSLT debugger and XPath Analyzer are helping Web developers “get the hang of it.”
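Much of learning XSLT comes down to knowing which nodes an XPath expression selects, which is the kind of question an XPath analysis tool answers interactively. A rough programmatic equivalent, using the JDK’s `javax.xml.xpath` API with a hypothetical `TryXPath` class and `select` helper, looks like this:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: show the text of every node an XPath expression selects.
class TryXPath {
    static final String XML =
        "<memo><subject>Support notes</subject>"
        + "<para>First point.</para><para>Second point.</para></memo>";

    static String[] select(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        String[] texts = new String[hits.getLength()];
        for (int i = 0; i < texts.length; i++) {
            texts[i] = hits.item(i).getTextContent();
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        for (String expr : new String[] {"/memo/para", "/memo/para[2]", "//subject"}) {
            System.out.println(expr + " -> " + String.join(" | ", select(XML, expr)));
        }
    }
}
```

Trying expressions this way before placing them in `select` or `match` attributes of a stylesheet makes it easier to see why a template does or does not fire.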
Kim anticipates that programmers will need applications like xmlspy to work with XML documents throughout the XML content lifecycle.