Convert Schemas to Documents

You can build a new XML object based upon its schema and go the other way, taking an object and building a schema from it.


The way that XML is being used of late is beginning to send tremors through much of the programming world. Part of this has to do with a logical conclusion that follows from the structured nature of the language. First, you can represent almost any structured entity as XML. Second, a programmatic class is a structured entity. Hence, you can represent a programmatic class as XML.

Of course, that simple bit of logic glosses over a vast number of implementation details, but in the main it holds. If you know the structure of an object (through its "schema"), then there is no reason that you couldn't create an "instance" of that object from the schema. In that sense, the schema takes on the same role that a class constructor does in procedural languages: it defines the principal elements of the structure, possibly accepting parameters that more fully flesh out given elements in the XML structure.

This process will become easier with the introduction of archetypes in the full XML Schema, but the Schema specification is still very much in flux, precisely because it serves such a critical role. The Microsoft XML Technology Preview parser, released in late January, currently supports the older schema specification (XML-Data Reduced, covered in my previous 10-Minute Solution, "Generating XML from ADO Recordsets"), though the company has specifically pledged to provide an updated implementation shortly after the specification itself reaches W3C Recommendation status.

As a consequence, the examples that I'll show work with the older schema format. However, I'm also using the W3C-compliant XSLT parser (the aforementioned XML Technology Preview, available at http://msdn.Microsoft.com/xml) in order to take advantage of features such as parameterization.

Objects, Collections, and Properties
Before jumping headfirst into XML code, I'd like to define a few bits of terminology. Specifically, I want to make the distinction between an object, a collection, a property, and a default property.

An object in XML terms could best be described as any entity that has an associated unique identifier. Put another way, an object is something that can be uniquely identified by name. For example, an employee schema could describe any number of different kinds of employees, but each employee would have one element, or more likely one attribute (typically an ID), that marks that employee as unique.

A collection is a set of zero or more structurally similar objects. "Structurally similar" is a little harder to define, although in general it means that the objects are derived from the same XML schema. Thus, while the internal elements for a marketing employee and a programming employee may differ, both would be produced from an employee schema.

A property is an element or attribute within an object that describes the object in some fashion. (You could make a valid argument about whether an ID is a property or not; I personally don't believe it is, but the reasons for that opinion are beyond the scope of this article.) In most cases, a property appears at most once within a given object.

A default property is an element or attribute that can occur either zero or one times within an object. It is a type of property, and generally represents a default case for the object.

Note that these categories are not mutually exclusive. In the simplest case, a collection contains zero or more objects, each of which is in turn made up of zero or more properties. This is usually characteristic of database output, in which recordsets contain zero or more records, each of which is made up of one or more (typically unique) fields.

On the other hand, an object could simultaneously be a collection, with both unique properties and sub-objects contained in the scope of the object itself. For example, a company division might contain division properties (the name of the division, its manager, current hiring status, and so forth) and employee objects, all as immediate children of the division node in an XML tree. Typically, this situation is equivalent to giving the object an <employees> collection as a property that holds the <employee> nodes, and it is cleaner to model it that way than to have the division act as the container. In other words, you make the implicit employees collection explicit. This also keeps the code more generic, because you can iterate through the collection's contained objects using a wildcard in an XPath expression.
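To see why an explicit collection keeps code generic, here is a small Python sketch using the standard library's xml.etree.ElementTree rather than XSL. The division document and the list_member_ids helper are hypothetical. ElementTree supports only a subset of XPath, but the employees/* wildcard step is enough to walk the collection without naming its members.

```python
import xml.etree.ElementTree as ET

# Hypothetical division document with the employees collection made explicit:
# the division's own properties sit beside a single <employees> container.
DIVISION_XML = """
<division>
  <name>Research</name>
  <manager>Pat Jones</manager>
  <hiring>true</hiring>
  <employees>
    <employee id="e1"/>
    <employee id="e2"/>
  </employees>
</division>
"""


def list_member_ids(division: ET.Element) -> list:
    """Return the id of every object held in the division's employees collection."""
    # The wildcard step matches any child of <employees>, so this function never
    # needs to know what the collected objects are called.
    return [member.get("id") for member in division.findall("employees/*")]


division = ET.fromstring(DIVISION_XML.strip())
print(list_member_ids(division))  # ['e1', 'e2']
```

If the division held its employees directly, the same loop would have to filter out the division's own properties by name, which is exactly the coupling the explicit collection avoids.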

These definitions are important because they help you to determine which objects you should create from the schema. Suppose you have an object that includes a collection of other sub-objects: how many sub-objects should be created? If you create a company division, for example, you almost certainly don't want that division to create a blank employee; rather, you'll want employees to be added separately through a different mechanism. Thus, in general, when creating an object that contains a collection of other objects, the instantiating program shouldn't create the sub-objects at the same time.

Similarly, consider the problem of default properties. In some cases, an element or attribute can be considered implicit: if the XML document doesn't contain the property, then the element or attribute is assumed to have an already-defined default value. For example, you might set up a default for an employee so that his or her vested attribute is assumed to be "false". These defaults exist for convenience's sake (to cut down the size of files or reduce the amount of typing, among other things), but their values should be included explicitly in any generated instance, because this information can drive subsequent code.

The code to do this in XSL is surprisingly compact, and is given in Listing 1, MakeInstanceFromSchema.xsl. A schema can contain the definitions of any number of different object "classes", so an XSL parameter called objectName lets you choose the object that you want to create.

By default, through the fairly complex expression:

<xsl:param name="objectName" 

the parameter retrieves the name of the last ElementType node in the schema, which is also the definition for the root node of the resulting object. You use this name, in turn, to select the ElementType definition that the instance starts from. If this process seems somewhat circular, it is, for the default case. However, if you replace the selection with a string expression (such as 'employee' for a list of employees within an <employees> node), then the resulting tree starts from that point in the schema instead.
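The default can be pictured with a short Python sketch using xml.etree.ElementTree. It approximates the parameter's behavior rather than reproducing the stylesheet's XPath expression; resolve_object_name and the trimmed schema are hypothetical. Passing a plain Python string plays the role of replacing the selection with a string expression.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The XDR schema vocabulary lives in this namespace.
NS = {"x": "urn:schemas-microsoft-com:xml-data"}

# Hypothetical, trimmed-down schema: the collection root <employees> is
# defined last, after the <employee> type it refers to.
SCHEMA_XML = """
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="employee"/>
  <ElementType name="employees">
    <element type="employee" minOccurs="0" maxOccurs="*"/>
  </ElementType>
</Schema>
"""


def resolve_object_name(schema: ET.Element, object_name: Optional[str] = None) -> str:
    """Mimic the objectName default: with no explicit name, use the last
    ElementType in the schema, which defines the document's root element."""
    if object_name is not None:
        return object_name
    element_types = schema.findall("x:ElementType", NS)
    return element_types[-1].get("name")


schema = ET.fromstring(SCHEMA_XML.strip())
print(resolve_object_name(schema))              # employees
print(resolve_object_name(schema, "employee"))  # employee
```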

Consider the employees schema given in Listing 2, EmployeesSchema.xml. The last ElementType defined is the one for the root node <employees>. If the parameter is not changed, then this node is converted into an object (in this case, the empty collection <employees/>, because the employees collection does not have any innate properties).

Immediately before that, the <employee> node is defined, which consists of a number of properties, including one default property <canHire>. If you change the objectName parameter's definition to (note the string within a string):

<xsl:param name="objectName" select="'employee'"/>

then the stylesheet, when applied to the schema, will produce an instance of the "employee" object instead:

<employee id="">

Notice that what is produced here is, with the exception of the canHire element, a completely blank record. The schema itself doesn't (and shouldn't) contain any parametric information for loading the newly created objects (although see my discussion of parameterization later in this article).

Most of the real action in producing this output comes from the ElementType template in the XSL document (see Listing 3). The expression <xsl:element name="{@name}"> creates a new element with the name given by the ElementType's name attribute. Once the element is created, attributes are added by iterating through each <attribute> child of the <ElementType>: the stylesheet stores the <attribute>'s type attribute in a variable, then creates an attribute with that name on the newly created element. Finally, if the attribute has a predefined default value (contained in the dt:default attribute), the new attribute's value is set to it.
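The attribute-handling half of that template can be approximated in Python with xml.etree.ElementTree. This is a hedged sketch, not Listing 3: make_shell and the schema fragment are hypothetical, child elements are left out, and reading the default from a dt:default attribute on the <attribute> element simply follows the description above. The namespace URIs are the standard Microsoft XDR ones.

```python
import xml.etree.ElementTree as ET

XDR = "urn:schemas-microsoft-com:xml-data"
DT = "urn:schemas-microsoft-com:datatypes"
NS = {"x": XDR}

# Hypothetical schema fragment: employee carries an id and a vested flag,
# and vested has a predefined default value.
SCHEMA_XML = """
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="id"/>
  <AttributeType name="vested"/>
  <ElementType name="employee">
    <attribute type="id"/>
    <attribute type="vested" dt:default="false"/>
  </ElementType>
</Schema>
"""


def make_shell(schema: ET.Element, name: str) -> ET.Element:
    """Create the element named by an ElementType and add its attributes,
    roughly as the ElementType template does (child elements omitted)."""
    element_type = schema.find(f"x:ElementType[@name='{name}']", NS)
    instance = ET.Element(element_type.get("name"))
    for attr in element_type.findall("x:attribute", NS):
        # The <attribute> element's type attribute names the new attribute.
        attr_name = attr.get("type")
        # Use the predefined default when there is one; otherwise leave it blank.
        instance.set(attr_name, attr.get(f"{{{DT}}}default", ""))
    return instance


schema = ET.fromstring(SCHEMA_XML.strip())
print(ET.tostring(make_shell(schema, "employee"), encoding="unicode"))
# <employee id="" vested="false" />
```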

Creating child elements is a little more complicated, because you apply two tests. The first test checks whether the <element> tag has a minOccurs attribute (the minimum number of times the element occurs in the parent element) greater than 0. If it does, the element is assumed to be a property and is always included. The following expression retrieves the corresponding element name, then recursively applies the same template to the child's ElementType definition:

<xsl:variable name="elementName" select="@type"/>
<xsl:apply-templates select="//ElementType[@name=$elementName]"/>

On the other hand, if minOccurs is zero, things get a little more problematic. There are two possibilities. If maxOccurs is 1, then the element is a default property: if it weren't listed, you would assume it had a certain value (or a certain set of sub-nodes), so it is included in the generated instance. If maxOccurs is unlimited (a value of "*"), then you have an object instead, which is not created along with its parent. Previously, I said that an object has a unique ID (which could certainly be tested for instead). More accurately, though, an object simply has one characteristic that defines it as unique, and there's no guarantee that the schema declares an explicit ID, although technically it should. Thus, the test for an object comes down to checking whether an element has a minOccurs of "0" and a maxOccurs of "*".
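Putting the two tests together, here is a Python sketch of a recursive instance builder. It is an illustration under stated assumptions, not the XSL in Listing 1: classify, make_instance, and the schema are hypothetical, an omitted minOccurs or maxOccurs is treated as 1, and default values are not filled in, so the default property appears as an empty element.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:schemas-microsoft-com:xml-data"}

# Hypothetical schema loosely following the description of Listing 2:
# canHire is a default property and employee is a collected object.
SCHEMA_XML = """
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="firstName"/>
  <ElementType name="lastName"/>
  <ElementType name="canHire"/>
  <ElementType name="employee">
    <element type="firstName" minOccurs="1" maxOccurs="1"/>
    <element type="lastName" minOccurs="1" maxOccurs="1"/>
    <element type="canHire" minOccurs="0" maxOccurs="1"/>
  </ElementType>
  <ElementType name="employees">
    <element type="employee" minOccurs="0" maxOccurs="*"/>
  </ElementType>
</Schema>
"""


def classify(min_occurs: str, max_occurs: str) -> str:
    """Apply the two tests: property, default property, or object."""
    if int(min_occurs) > 0:
        return "property"
    if max_occurs == "1":
        return "default property"
    if max_occurs == "*":
        return "object"
    return "unclassified"  # anything else falls outside the two tests


def make_instance(schema: ET.Element, name: str) -> ET.Element:
    """Build a blank instance of the named ElementType, recursing into children."""
    element_type = schema.find(f"x:ElementType[@name='{name}']", NS)
    instance = ET.Element(name)
    for child in element_type.findall("x:element", NS):
        # Assumption: an omitted bound is treated as 1.
        kind = classify(child.get("minOccurs", "1"), child.get("maxOccurs", "1"))
        if kind in ("property", "default property"):
            # Recurse into the child's own ElementType definition.
            instance.append(make_instance(schema, child.get("type")))
        # Objects are skipped: collection members are added later by other means.
    return instance


schema = ET.fromstring(SCHEMA_XML.strip())
print(ET.tostring(make_instance(schema, "employee"), encoding="unicode"))
# <employee><firstName /><lastName /><canHire /></employee>
print(ET.tostring(make_instance(schema, "employees"), encoding="unicode"))
# <employees />
```

Building from employees yields an empty collection, matching the default case described earlier, while building from employee produces the property skeleton without creating any sub-objects.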

Take Advantage of Parameterization
Unfortunately, instantiating objects from a schema looks very much like creating objects in Visual Basic: there are no explicit constructors for loading the newly created instance with data. The instanceData parameter serves this role. This parameter should hold a single element with one attribute for each property that you want to fill.

For example, suppose that you wanted to create a new record based upon an ADO database record that returns each property to be filled as an attribute. The element itself might look something like:

<record id="empl12533" firstName="Kurt" lastName="Cagle" 
title="Author" salary="65536"/>

This record is deliberately incomplete, to illustrate that you don't need all the parameters to instantiate the object. You could use the following VBScript code to insert this element into the instanceData parameter, where recNode contains the data record shown above, makeInstanceFromSchema contains the XSL stylesheet (loaded as a DOM document), and employeesSchema contains the schema XML document:

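' Load the data record into the stylesheet's instanceData parameter
Set paramNode = makeInstanceFromSchema.selectSingleNode("//xsl:param[@name='instanceData']")
paramNode.appendChild recNode
' Point objectName at the employee definition (note the string within a string)
Set objectNameNode = makeInstanceFromSchema.selectSingleNode("//xsl:param[@name='objectName']")
objectNameNode.setAttribute "select", "'employee'"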

The employee object now contains the XML document:

<employee id="empl12533">

This requires knowing the names of the fields (and you could probably argue that if you knew this information, you could recreate the object from a template in the first place), but the advantage of going with this approach becomes much more evident when dealing with larger (and deeper) schemas.
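Outside XSLT, the effect of the instanceData parameter can be approximated as merging a record into a blank instance. In this hypothetical Python sketch, apply_instance_data fills an instance attribute when the record has an attribute of the same name, and otherwise fills the text of a same-named child element; that matching rule, and the shape of the blank instance, are assumptions rather than what Listing 1 does.

```python
import xml.etree.ElementTree as ET


def apply_instance_data(instance: ET.Element, record: ET.Element) -> ET.Element:
    """Copy each attribute of the record onto the matching part of the instance.

    Hypothetical rule: a record attribute fills an instance attribute of the
    same name if one exists, otherwise the text of a same-named child element.
    Record attributes with no match are ignored.
    """
    for name, value in record.attrib.items():
        if name in instance.attrib:
            instance.set(name, value)
        else:
            child = instance.find(name)
            if child is not None:
                child.text = value
    return instance


# A blank employee instance of the kind the stylesheet might produce (assumed shape).
blank = ET.fromstring(
    '<employee id=""><firstName/><lastName/><title/><salary/><canHire/></employee>'
)
record = ET.fromstring(
    '<record id="empl12533" firstName="Kurt" lastName="Cagle" '
    'title="Author" salary="65536"/>'
)
filled = apply_instance_data(blank, record)
print(ET.tostring(filled, encoding="unicode"))
# <employee id="empl12533"><firstName>Kurt</firstName><lastName>Cagle</lastName><title>Author</title><salary>65536</salary><canHire /></employee>
```

Instance fields that the record does not mention keep their existing values, which is why an incomplete record is acceptable.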

Convert XML Documents to Schemas
A problem related to creating an object from a schema is building a schema from an object. This task may seem a little odd at first glance, but the principal reason for wanting to do it is to generate schemas without a lot of work. In general, you know the XML document that you want the schema to represent, but building the full schema by hand can be a pain. Note that the result of this operation likely won't be a finished product, because many of the relationships a schema expresses can't be deduced from a single document, but it's useful for creating a skeleton.

The code in Listing 4, BuildSchema.xsl, generates the Schema element. Because a generated schema element can raise errors if it is written out as a single literal element, the stylesheet builds it from a set of xsl:attribute declarations: a name attribute with a default value of Undefined.xml, plus the declarations for the datatype and schema namespaces.

The code then searches for the root node. Here's where things get interesting. Because a schema defines each element before anything refers to it, its definitions run from the leaves up to the root, the reverse of the order in which the tree is walked, so you need to generate the schema structure in reverse. To do this, for each element, the attributes of that node are processed first, then the child elements of the current element, and finally the node itself. This form of recursion (a post-order traversal) is especially useful for creating end-forward structures such as schemas.

One effect of this is to place all of the AttributeType tags at the beginning of the document, in the order in which they're encountered. Subordinate <attribute> tags can therefore always refer to AttributeType definitions that appear earlier in the document, and <element> tags can likewise always refer to previously defined <ElementType> tags.
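The attributes-children-self visiting order can be sketched in a few lines of Python with xml.etree.ElementTree. This is a hypothetical illustration rather than Listing 4: build_schema omits the namespace declarations and any content or datatype information, and because it follows the visiting order literally, each AttributeType appears ahead of the ElementType that uses it rather than necessarily all at the very top. The Undefined.xml name is carried over from the description of the stylesheet.

```python
import xml.etree.ElementTree as ET


def build_schema(doc: ET.Element) -> ET.Element:
    """Build a skeleton XDR-style schema from a sample document.

    Each node is visited in post-order: its attributes first, then its child
    elements, and finally the node itself, so every AttributeType and
    ElementType is defined before anything that refers to it.
    """
    schema = ET.Element("Schema", {"name": "Undefined.xml"})
    defined = set()  # (kind, name) pairs already written to the schema

    def visit(node: ET.Element) -> None:
        # 1. Attributes of this node.
        for attr_name in node.attrib:
            if ("AttributeType", attr_name) not in defined:
                defined.add(("AttributeType", attr_name))
                ET.SubElement(schema, "AttributeType", {"name": attr_name})
        # 2. Child elements, recursively.
        for child in node:
            visit(child)
        # 3. The node itself, referring only to definitions written above.
        if ("ElementType", node.tag) in defined:
            return
        defined.add(("ElementType", node.tag))
        element_type = ET.SubElement(schema, "ElementType", {"name": node.tag})
        for attr_name in node.attrib:
            ET.SubElement(element_type, "attribute", {"type": attr_name})
        for child in node:
            ET.SubElement(element_type, "element", {"type": child.tag})

    visit(doc)
    return schema


sample = ET.fromstring(
    '<employees><employee id="e1"><firstName>Kurt</firstName></employee></employees>'
)
schema = build_schema(sample)
print([(d.tag, d.get("name")) for d in schema])  # the root's ElementType comes last
```

Because the root is visited last, its ElementType is always the final definition, which is what the objectName default in MakeInstanceFromSchema.xsl relies on.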

While the BuildSchema.xsl code is pretty straightforward (especially when compared to the MakeInstanceFromSchema XSL document), I do want to highlight the <description> element for a moment. This tag is part of the XML schema specification, and it has the added advantage of being able to contain XML markup that isn't part of the schema vocabulary itself. In this case, I created <title> and <body> tags, which can be used to provide helpful information to applications (or to tables output to HTML).

The source XML file to be transformed into a schema should strive for simplicity. For collections, include only one instance of each object being collected; for example, to generate the schema from a set of <employee> objects within an <employees> node, include only one employee. The conversion routine isn't smart enough to recognize multiple instances and act accordingly, although adding that ability would not be complicated.
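One hedged way to add that ability, sketched in Python with ElementTree rather than XSL: count same-named siblings, declare any tag that repeats with minOccurs="0" and maxOccurs="*" (the object test used earlier in this article), and merge the attribute names seen across all instances. Both helper names are hypothetical, and the rule is an assumption, not something BuildSchema.xsl does.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def child_declarations(node: ET.Element) -> list:
    """Describe a node's children as XDR-style <element> declarations,
    collapsing repeated siblings into a single collection declaration."""
    counts = Counter(child.tag for child in node)  # keeps first-seen order
    declarations = []
    for tag, count in counts.items():
        attrs = {"type": tag}
        if count > 1:
            # Several same-named siblings: declare them as an object collection.
            attrs.update(minOccurs="0", maxOccurs="*")
        declarations.append(ET.Element("element", attrs))
    return declarations


def merged_attribute_names(root: ET.Element, tag: str) -> list:
    """Union of attribute names over every instance of tag, in first-seen order."""
    names = {}
    for node in root.iter(tag):
        for name in node.attrib:
            names.setdefault(name, None)
    return list(names)


employees = ET.fromstring(
    '<employees><employee id="e1"/><employee id="e2" title="Author"/><note/></employees>'
)
for declaration in child_declarations(employees):
    print(ET.tostring(declaration, encoding="unicode"))
# <element type="employee" minOccurs="0" maxOccurs="*" />
# <element type="note" />
print(merged_attribute_names(employees, "employee"))  # ['id', 'title']
```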

Finally, remember that the schema produced here is Microsoft's XML-Data Reduced schema, not the final W3C XML Schema. I hope to create more robust converters for the W3C format once that specification is finalized.

One of the real strengths of XML is that it is essentially self-describing in the dialect in which it is written. This may seem obvious, but consider that with procedural languages, creating a class from an object is well-nigh impossible without very specialized (and complex) tools. When the new Schema specification becomes final, it will include other features, such as data type definitions and archetypes, that will push the object-oriented nature of XML even further, making it the substrate of a new and powerful computing language.

Kurt Cagle is the author or co-author of twelve books and several dozen articles on web technologies, XML and web services. He is the president of Cagle Communications (Olympia, WA), which specializes in the production of training materials for XML and Web Services education. He can be reached at kurt@kurtcagle.net.