

Featured Discussion: Designing Mixed-element Schema

A reader asks for—and receives—an XML schema that can support an unbounded number of <bold> and <italics> elements in any order. It's more complex than it seems.





Recently, "Paul" wrote to the xml.general group asking:
I have the following XML element:

<text> <bold>Hello</bold> John, my <italics>name</italics> is <bold>Paul</bold>. I would <italics>like</italics> to tell you something. </text>

The <text> element can contain any number of <bold> and <italics> elements in any order—but I cannot seem to determine how to construct the <text> element within the Schema file. I currently have in the schema something like this:

<xs:complexType mixed="true" name="textType"> <xs:all> <xs:element name="bold" type="xs:string" minOccurs="0"/> <xs:element name="italics" type="xs:string" minOccurs="0"/> </xs:all> </xs:complexType>

The <xs:all> element allows for any ordering, but only allows to have a max of 1 <bold> and 1 <italics> element. I need to have an unbounded number of <bold> and <italics> elements in any order.

There are two problems here. First, the <bold> and <italics> elements appear in any order, and second, they appear some unforeseeable number of times within the <text> element. Fortunately, XML schemas aren't limited to sequences or set lists; you can create mixed content elements just as easily.

When you're working with regular structured data, such as data extracted from a relational database table, it's relatively easy to see how to write an XML schema that describes the data, because:

  • The database defines the data type for each column.
  • Any row holds one and only one data value for each column.
For example, here's a simple customer record with three columns:

CustomerID   CustomerLName   CustomerFName
25           Foo             John

A schema definition for this might look like:

<xs:element name="customer"> <xs:complexType> <xs:sequence> <xs:element name="CustomerID" type="xs:string" minOccurs="0" /> <xs:element name="CustomerLName" type="xs:string" minOccurs="0" /> <xs:element name="CustomerFName" type="xs:string" minOccurs="0" /> </xs:sequence> </xs:complexType> </xs:element>

The <xs:sequence> element defines a sequence of sub-elements that must appear in the defined order. The preceding schema would validate the following XML file:

<?xml version="1.0"?> <customer> <CustomerID>10</CustomerID> <CustomerLName>Doe</CustomerLName> <CustomerFName>John</CustomerFName> </customer>

Working It Through
Sometimes, you need to ensure that a set of elements appears, but you don't care about the order. For example, does it really matter if the CustomerID element always appears first in a document? If you're simply scanning through an XML file, perhaps reading all <customer> elements into a Customer class, you would key off the element name, creating a new Customer instance each time you encounter a <customer> element, setting its ID whenever you encounter a <CustomerID> tag, etc. As long as the tags contain the correct data, it makes little difference whether you set the Customer object's ID property before setting its LastName property; any sequence of ID, LastName, and FirstName tags works equally well.

In that case, you can use the <xs:all> element, which lets a list of sub-elements appear in a document in any order.

<xs:complexType name="customer"> <xs:all> <xs:element name="CustomerID" type="xs:integer" minOccurs="1" maxOccurs="1" /> <xs:element name="CustomerLName" type="xs:string" minOccurs="1" maxOccurs="1" /> <xs:element name="CustomerFName" type="xs:string" minOccurs="1" maxOccurs="1" /> </xs:all> </xs:complexType>

Now, you can validate either the customer XML document shown earlier or a customer XML document with a different sub-element sequence, for example:

<?xml version="1.0"?> <customer> <CustomerLName>Doe</CustomerLName> <CustomerFName>John</CustomerFName> <CustomerID>10</CustomerID> </customer>
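The order-insensitive reading described above can be sketched in a few lines of Python. This is only an illustration: the Customer class, its field names, and the read_customer helper are hypothetical and not part of the original discussion. The sketch keys off each child element's name, so both customer documents shown in this article produce the same object.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    # Hypothetical class; the field names are illustrative only.
    id: Optional[str] = None
    last_name: Optional[str] = None
    first_name: Optional[str] = None


# Maps each child element name to the Customer attribute it fills.
FIELD_BY_TAG = {
    "CustomerID": "id",
    "CustomerLName": "last_name",
    "CustomerFName": "first_name",
}


def read_customer(xml_text):
    """Build a Customer by keying off child element names, ignoring order."""
    root = ET.fromstring(xml_text)
    customer = Customer()
    for child in root:
        field = FIELD_BY_TAG.get(child.tag)
        if field is not None:
            setattr(customer, field, child.text)
    return customer


in_order = (
    "<customer><CustomerID>10</CustomerID>"
    "<CustomerLName>Doe</CustomerLName>"
    "<CustomerFName>John</CustomerFName></customer>"
)
reordered = (
    "<customer><CustomerLName>Doe</CustomerLName>"
    "<CustomerFName>John</CustomerFName>"
    "<CustomerID>10</CustomerID></customer>"
)

# Both orderings yield an equal Customer object.
print(read_customer(in_order) == read_customer(reordered))
```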

However, when the data in XML documents is less structured, as in Paul's question, the schema structure is less clear. One answer received from Anthony Jones states that you can solve the problem of the unknown number of child elements by adding a maxOccurs attribute with a value of "unbounded." Unbounded means an element may appear any number of times.

But that still doesn't solve the problem; as Paul says, the <xs:all> element allows for a set of unsequenced elements, but only one of each element may occur in the set. "John" provides a more complete schema that also uses the maxOccurs="unbounded" attribute.

<xs:complexType mixed="true" name="textType"> <xs:all maxOccurs="unbounded"> <xs:element name="bold" type="xs:string" minOccurs="0"/> <xs:element name="italics" type="xs:string" minOccurs="0"/> </xs:all> </xs:complexType>

But that's not a complete solution either. In XML Schema 1.0, the <xs:all> element doesn't allow maxOccurs to be unbounded: the group itself can occur at most once, and each element inside it is restricted to minOccurs and maxOccurs values of 0 or 1.
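To see why Paul's document cannot fit an <xs:all> group, here is a rough Python sketch of the rule for a group whose children all have minOccurs="0". It is not a schema validator: the fits_all_group helper is a hypothetical name, and it looks only at the names of the direct child elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Names taken from Paul's schema; each child is declared with minOccurs="0".
ALLOWED = frozenset({"bold", "italics"})


def fits_all_group(xml_text, allowed=ALLOWED):
    """Any order is fine, but each allowed name may appear at most once."""
    root = ET.fromstring(xml_text)
    counts = Counter(child.tag for child in root)
    return all(name in allowed and count <= 1 for name, count in counts.items())


one_each = "<text>Hi <italics>there</italics>, <bold>John</bold>.</text>"
paul = (
    "<text><bold>Hello</bold> John, my <italics>name</italics> "
    "is <bold>Paul</bold>.</text>"
)

print(fits_all_group(one_each))  # True: one of each, order is irrelevant
print(fits_all_group(paul))      # False: <bold> appears twice
```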

A Working Solution
Rather than xs:all, use xs:choice, with the minOccurs attribute set to 0 and the maxOccurs attribute set to unbounded. Here's the fixed schema.

<?xml version="1.0"?> <xs:schema id="text" targetNamespace="http://www.devx.com/TextMixed.xsd" xmlns="http://www.devx.com/TextMixed.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" attributeFormDefault="qualified" elementFormDefault="qualified"> <xs:element name="text"> <xs:complexType mixed="true"> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element name="bold" nillable="true" type="xs:string" /> <xs:element name="italics" nillable="true" type="xs:string" /> </xs:choice> </xs:complexType> </xs:element> </xs:schema>

The preceding schema lets the <bold> and <italics> sub-elements occur zero or more times, in any order. The solution is not intuitive, because the <xs:choice> element suggests that the validator picks just one of the alternatives. Here, though, the choice itself repeats: each repetition matches one <bold> or one <italics> element, so any mix of the two validates. There have been several calls for alterations to the <xs:all> element in XML Schema 1.1, adding support for the maxOccurs="unbounded" attribute value, but at the time of writing that version isn't available yet.
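One way to picture the repeated choice is as the regular expression (bold|italics)* applied to the sequence of child element names. The Python sketch below illustrates that idea only. It is not a schema validator, the fits_repeated_choice helper is a hypothetical name, and for brevity it ignores the target namespace that the schema declares.

```python
import re
import xml.etree.ElementTree as ET

# A choice with minOccurs="0" and maxOccurs="unbounded" over bold and
# italics behaves like (bold|italics)* over the child element names.
# Each name is followed by a space so that names cannot run together.
CONTENT_MODEL = re.compile(r"(?:bold |italics )*")


def fits_repeated_choice(xml_text):
    """Check only the names of the direct children, in document order."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(names) is not None


paul = (
    "<text><bold>Hello</bold> John, my <italics>name</italics> is "
    "<bold>Paul</bold>. I would <italics>like</italics> to tell you "
    "something.</text>"
)

print(fits_repeated_choice(paul))                                     # True
print(fits_repeated_choice("<text>Just plain text.</text>"))          # True
print(fits_repeated_choice("<text><underline>no</underline></text>")) # False
```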

Schemas such as this one are critical when you need to validate unstructured data, where it's impossible to know the order or number of elements in advance.

Go to the xml.general group now to participate or ask your own question.
