Featured Discussion: Designing Mixed-element Schema

Recently, “Paul” wrote to the xml.general group asking:

I have the following XML element:

   Hello John, my name    is Paul. I would    like to tell you something.

The element can contain any number of two kinds of child elements in any order, but I cannot seem to determine how to construct the element within the Schema file. I currently have in the schema something like this:

                  

The xs:all element allows for any ordering, but only allows a maximum of one of each child element. I need to have an unbounded number of both child elements in any order.

Diagnosis
There are two problems here. First, the child elements can appear in any order, and second, they can appear an unforeseeable number of times within the parent element. Fortunately, XML schemas aren’t limited to sequences or set lists; you can create mixed content elements just as easily.

When you’re working with regular structured data, such as data extracted from a relational database table, it’s relatively easy to see how you can write an XML schema to describe the data, because:

  • The database defines the data type for each column.
  • Any row holds one and only one data value for each column.

For example, here’s a simple customer record with three columns:

CustomerID   CustomerLName   CustomerFName
25           Foo             John

A schema definition for this might look like:
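A minimal sketch of such a schema follows. It assumes a root element named Customer, child element names taken from the column headings above, and an integer type for the ID. None of these choices is confirmed by the original listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: element names and types are assumptions based on the column headings -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <!-- xs:sequence: children must appear exactly in this order -->
      <xs:sequence>
        <xs:element name="CustomerID" type="xs:integer"/>
        <xs:element name="CustomerLName" type="xs:string"/>
        <xs:element name="CustomerFName" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```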

                                                

The xs:sequence element defines a sequence of sub-elements that must appear in the defined order. The preceding schema would validate the following XML file:

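   <Customer>
      <CustomerID>10</CustomerID>
      <CustomerLName>Doe</CustomerLName>
      <CustomerFName>John</CustomerFName>
   </Customer>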

Working It Through
Sometimes, you need to ensure that a set of elements appears, but you don’t care about the order. For example, does it really matter if the CustomerID element always appears first in a document? If you’re simply scanning through an XML file, perhaps reading all elements into a Customer class, you would key off the element name, creating a new Customer instance each time you encounter a Customer element, setting its ID whenever you encounter a CustomerID tag, and so on. As long as the tags contain the correct data, it makes little difference whether you set the Customer object’s ID property before setting its LastName property; any sequence of ID, LastName, and FirstName tags works equally well.

In that case, you can use the xs:all element, which lets a list of sub-elements appear in a document in any order.
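As a sketch, the order-independent version of a customer schema might look like the following. The element names and types are assumptions based on the column headings used earlier, not a copy of the original listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: element names and types are assumptions -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <!-- xs:all: each child appears exactly once here, in any order -->
      <xs:all>
        <xs:element name="CustomerID" type="xs:integer"/>
        <xs:element name="CustomerLName" type="xs:string"/>
        <xs:element name="CustomerFName" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```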

                 

Now, you can validate either the customer XML document shown earlier or a customer XML document with a different sub-element sequence, for example:

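   <Customer>
      <CustomerLName>Doe</CustomerLName>
      <CustomerFName>John</CustomerFName>
      <CustomerID>10</CustomerID>
   </Customer>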

However, when the data in XML documents is less structured, as in Paul’s question, the schema structure is less clear. One answer received from Anthony Jones states that you can solve the problem of the unknown number of child elements by adding a maxOccurs attribute with a value of “unbounded.” Unbounded means an element may appear any number of times.
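To illustrate the attribute on its own, here is a small sketch in which a child element may repeat any number of times inside a sequence. The names notes and note are hypothetical and do not come from the discussion thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "notes" and "note" are hypothetical names -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="notes">
    <xs:complexType>
      <xs:sequence>
        <!-- zero or more note elements are allowed -->
        <xs:element name="note" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```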

But that still doesn’t solve the problem; as Paul says, the xs:all element allows for a set of unsequenced elements, but only one of each element may occur in the set. “John” provides a more complete schema that also uses the maxOccurs="unbounded" attribute.

                   

But that’s not a complete solution, because the xs:all element doesn’t allow maxOccurs to be unbounded; it restricts the value of both the minOccurs and maxOccurs attributes of its child elements to either 0 or 1.
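To make the restriction concrete, the following sketch shows the kind of declaration that an XML Schema 1.0 validator rejects. The element names para, bold, and italic are hypothetical stand-ins, since the names from the original thread are not shown here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only, with hypothetical names. NOT valid under XML Schema 1.0. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="para">
    <xs:complexType mixed="true">
      <xs:all>
        <!-- Rejected: children of xs:all may only use maxOccurs 0 or 1 -->
        <xs:element name="bold" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="italic" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```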

A Working Solution
Rather than xs:all, use xs:choice, with the minOccurs attribute set to 0 and the maxOccurs attribute set to unbounded. Here’s the fixed schema.
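A minimal sketch of that approach follows. The element names para, bold, and italic are hypothetical, and mixed="true" is an assumption based on Paul's sample, in which text appears between the child elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: para, bold, and italic are hypothetical names -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="para">
    <!-- mixed="true" permits character data between child elements -->
    <xs:complexType mixed="true">
      <!-- the choice repeats zero or more times; each repetition picks one child -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="bold" type="xs:string"/>
        <xs:element name="italic" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under those assumptions, a document such as this one should validate, with the children repeated and interleaved freely:

```xml
<para>Hello <bold>John</bold>, my name <italic>is Paul</italic>. I would <bold>like</bold> to tell you <italic>something</italic>.</para>
```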

                                        

The preceding schema lets both kinds of sub-elements occur zero or more times, in any order. The solution is not intuitive because the xs:choice element implies that the validator will choose between alternatives, but in this case the choice itself repeats, so every child element in the document matches one of the alternatives and the document validates. There were several calls to alter the xs:all element in XML Schema 1.1 by adding support for the maxOccurs="unbounded" attribute value. XML Schema 1.1 does relax that restriction, but validators that support only XML Schema 1.0 still need the xs:choice approach.

Schemas such as this one are critical when you need to validate loosely structured data, where it’s impossible to know the order or number of elements in advance.

Go to the xml.general group now to participate or ask your own question.
