n an ideal world, the process of communicating between XML and the database world would be completely seamless and transparentyou point an XML DOM at a database, say Update or Request, and you're done. In this all-too-messy real world where the "standards" change from day to day and no two database companies would dare agree on a common format for fear that it may give their competitor a marginal advantage, this utopian vision is not quite here yet.
However, that's not to say that there hasn't been progress in the XML to RDBMS (relational database management system) front. At the XML One Fall '99 conference in Santa Clara, Oracle debuted their superb Oracle 8i update, which provided a number of XML features. Microsoft's SQL Server 7 also incorporated a number of basic XML features into their Active Data Objects (ADO) 2.5 engine, which also provides some fixes for basic limitations in the ADO 2.1 provider.
The XML format that the Microsoft ADO 2.5 (and ADO 2.1) engine produces is written in Microsoft XML-Data Schema format, which is also known as the
XML Reduced Data Schema, or simply Reduced Data. The XML Reduced Data Schema works by specifying the datatypes and similar characteristics of the schema (default values, primary key information, and so forth) from the database and placing this information into the first half of the document. A similar process then extracts the data to place in XML row nodes. By separating these, you can cut down on the amount of spurious information that the XML document contains while still being able to get relevant datatype information.
For one recent project, I needed to get information about role-playing characters for an online Web game. I stored the characters table in an Access database (characters.mdb), then used ADO 2.5 to retrieve a general recordset query (see Listing 1).
The conversion from database to XML is done with the function rs.save(). Save basically takes the recordset and converts it into a stream format. If the first argument given is a string containing a URL, then the function save() outputs the data into its intrinsic binary format. However, by passing the adPersistXML flag, the stream is converted into an XML stream instead.
In ADO 2.1, you were forced to output the stream to a file, which proved a significant performance hit. The stream had to be converted into a Unicode formatted text string, spooled out to the hard drive through a standard file interface, then if you needed the resultant XML, the file had to be reloaded and reparsed back into XML stream. ADO 2.5 lets you write the result directly out to an XML DOM Document, bypassing both conversion and parsing, and making for much smoother code. You can see the output for the data provided in the characters.mdb file in Listing 2.
A Trip Through Namespace
There are four explicit namespaces defined in the Microsoft XML Reduced Data Schema:
- The XML Reduced Data Schema namespace "s", which contains the specific information about the structure of the internal XML format used by each rowset. (xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" )
- The Data Type Specification namespace "dt", which points to Microsoft's internal data typing mechanism.(xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" )
- The Rowset namespace "rs", which contains information about the database implementation for each row. (xmlns:rs="urn:schemas-microsoft-com:rowset" )
- The RowsetSchema namespace "z", which contains the actual data from the database. This utilizes the schema defined by the "s" namespace. (xmlns:z="#RowsetSchema")
This set of namespaces actually provides some interesting usage information. Both the Microsoft XML Reduced Data Schema namespace and the Data Type Specification namespace actually point to GUIDs (globally unique identifiers), which uniquely identify them to the system. However, these GUIDs (actually UUIDsuniversally unique IDswhich is the preferred term) don't explicitly point to any COM objects, but instead are used as identifying tags by the XML parser to map internally to the proper objects. By doing this, the parser treats the namespace UUID as a handle, or a pointer to a pointer, making it easier to upgrade the respective parsers in the future.
The Rowset namespace, on the other hand, points to a more traditional flavor of namespace: a URN (Uniform Resource Name) that the XML parser identifies with the Reduced Data rowset. The URN uses notation that may be less familiar if you are used to working with the URL notation of the Web, but urn:schemas-microsoft-com:rowset is more or less the same as the URL http://schemas.microsoft.com/rowset.
The final namespace, RowsetSchema or "z" namespace, retrieves the schema identified by the ID "RowsetSchema" earlier in the document. This schema is local, created on the fly by the ADO package. Because a typical database recordset will change based upon the SQL source string, this schema will change as well. Thus, you could control the way that the schema is generated by being more specific with your query. For example, this code will output just these four elements into the schema, rather than the entire table:
Rs.open "SELECT characterName, vocation, species, gender FROM Characters"
The Microsoft XML Reduced Data Schema format intrinsically associates datatypes to each of the attributes. Thus, if you make a request using the getAttribute method or nodeTypedValue property of the XML DOM, the value returned will be of the correct datatype. For example, to retrieve the strength attribute of the Elf Sheinara, you'd use these XML calls:
In either case, the value is returned as 0.96, which is the four digit approximation (with trailing zeroes not explicitly displayed) of the internal format "0.95999999999999996". Put another way, these two elements return a floating point number, not a string.
Converting to an Element Schema
Still, the z:row format has some serious limitations to it. In general, while attributes process a little faster than elements (they have their own specialized interface, and as such can be optimized for performance, whereas the more general elements cannot), they may not necessarily be in the most preferred format for your own use. For example, suppose that you wanted to turn the attributes into a more traditional element-based XML structure that looks like Listing 3.
You can perform the conversion in a specific case (for instance, with known container and object names, such as characters and character) with an XSL script as shown in Listing 4. One of the primary problems with such a script is that you would need to explicitly change the source code for the script every time you wanted to run it to take into account different object or group names. While you can parameterize the output, this undertaking is not trivial.
However, this is one of those places where the DOM is actually the more efficient way of handling the problem. The convertToElementTree function actually emulates the functionality of the XSL script, but because you can pass in the relevant group and object names as parameters into a function, it is both more concise and easier to manage (see Listing 5). It takes as an argument a Reduced Data XML structure, the name of the group, and the name of each object in the group, then returns a DOM object in the element-only format.
To generate the sample output shown in Listing 3, you would need to call the convertToElementTree function as follows:
Dim charactersDoc as DOMDocument
Towards a Schema Standard?
I should note that the beta version of Windows SQL Server 7.5 actually generates such a format by default (although you may need to change the containing elements), at least when called from a Web address. Currently, only Microsoft and its partners support the XML Reduced Data Schema format, and in all likelihood it will end up being principally an internal data format, especially when the W3C XML Schema becomes a formal recommendation.
Microsoft originally designed the XML Reduced Data Schema as a proposal for identifying and handling data in the XML space, and it was an instrumental document for creating the W3C XML Schema. However, because the latter schema is an open standard, it has been tweaked and changed to better reflect the needs and requirements of other database vendors such as Oracle or IBM. The Schema standard proposal itself is divided up into two "Working Draft" documents: The W3C XML Schema Structures document (www.w3.org/TR/xmlschema-1/) and the W3C XML Schema Data-type document (www.w3.org/TR/xmlschema-2/).
The first document specifies the Schema architecture, and is structurally similar to the Microsoft XML Reduced Data Schema discussed in this articlethe primary differences have to do with the addition of archetypes to allow for the creation of custom datatypes.
The second document, on the other hand, focuses on the datatypes themselves, including ways of creating and inheriting datatype specifications. While many of these are similar to the original Microsoft datatypes (such as datatypes for integers, real numbers, dates, and similar "primitive" types), the names or scoped ranges may have changed since the Microsoft XML Reduced Data Schema was first proposed.
The W3C Schema specifications are currently considered Working Drafts (and should be entering Last Call shortly), although they will likely become Proposed Recommendations to the XML canon by early in 2000. For those unfamiliar with the terminology involved, a Working Draft is a document that is currently in development, much like a legislative bill in the United States Congress. Last Call indicates that the participating members of the W3C can include minor amendments or clarifications to the document, but cannot substantively change the document. If it passes when the W3C members vote on the draft, it is considered a Proposed Recommendation and is immutable at that point; otherwise, it goes back to being a Working Draft.
Finally, the Proposed Recommendation becomes a formal Recommendation after a final vote, pending evidence that would prove that the standard is unworkable. A Recommendation is the closest thing to a law in the W3C. The document is considered stable and can be coded against, and any changes that need to be made to it would instead be considered part of the next version.
Recently, the XPath and XSL-Transform documents became formal Recommendations; the XHTML and XML DOM 2 specifications are now Proposed Recommendations; and MathML, XLink, and SVG are all in Last Call. Once the Schema specifications become Recommendations, then all of the critical pieces for XML development will be stable, which should have a profound impact on companies wishing to deploy XML solutions.
As a final note, the W3C also includes the Note designation for a document. This is a document that may present one member's suggestions for implementing a certain technology, or may simply be a useful document for clarifying the needs and requirements for an XML technology. The Microsoft XML Reduced Data Schema proposal is just such a Note, and can be retrieved from the W3 site as well (www.w3.org/TR/1998/NOTE-XML-data-0105/). Check www.w3.org/TR for the status of all such Notes, Working Drafts, and Recommendations.
The XML Reduced Data Schema notation will probably be with Microsoft for a while, even after the W3C XML Schema documents become formal Recommendations. They form the underpinnings of much of the current data handling strategy for inter-Microsoft applications, and will probably have a big part to play in the upcoming Biztalk Server. With XSL or DOM code you can easily transform the data into other formats, and can even (with more effort than I can show in this article) modify the internal schema data representations into a variety of different schema configurations. This ability to transform both the data and the underlying schema of that data will figure large in your XML work in the future.
Click here to download the code and the sample Access database.