Taking XML Validation to the Next Level: Explore CAM's Expressive Power : Page 6

The generic-sounding Content Assembly Mechanism, or CAM, is an exciting step beyond XML Schema, but it's new and not well documented. This article series represents CAM: The Missing Manual. This last installment is a deep exploration of CAM's ability to express exactly what you need for data-centric documents.



Combinations of Compositors

Here's a slightly more realistic example that shows an XML Schema file graphically, a CAM template file, and a sample XML instance to validate against either. Figure 5 shows a schema that includes all three types of compositors. Listing 1 shows the schema from which the figure was generated in Liquid XML Studio (the schema also exists in the file Compositors/compositors.xsd in the downloadable code).

 
Figure 5. Combinations of Compositors: This sample schema shows ordered, unordered, and choice compositors.

Using CAMed to generate a CAM conversion of this schema yields the CAM template (Compositors/compositors_from_xsd.cam) shown in Listing 2. Note that it generates specific XPath expressions for the setChoice predicates, as discussed earlier.

When you convert a schema to a CAM template, the first thing you must do is check the result for correctness. XSD-to-CAM conversion is a reasonable, but not a perfect, process; recall that the zip code rules needed adjusting in the earlier purchase order example. Examine the CAM template manually (either the raw file or in the editor) rather than simply testing whether it validates a given XML file. The latter will at best give you a false sense of security. Here's why. If you test the CAM template with the following valid sample XML file (Compositors/compositors.xml in the downloadable code), CAMed will report that it fails:

<?xml version="1.0" encoding="utf-8"?>
<myRoot>
  <first_name>string</first_name>
  <last_name>string</last_name>
  <classification>
    <waterfowl>string</waterfowl>
  </classification>
  <waterfowl-category>
    <dabbling>string</dabbling>
    <dabbling>string</dabbling>
  </waterfowl-category>
  <Guidebook>
    <Title>string</Title>
    <Author>string</Author>
    <ISBN>string</ISBN>
  </Guidebook>
  <Guidebook>
    <Title>string</Title>
    <Author>string</Author>
    <ISBN>string</ISBN>
  </Guidebook>
  <Guidebook>
    <Title>string</Title>
    <Author>string</Author>
    <ISBN>string</ISBN>
  </Guidebook>
</myRoot>

Specifically, CAMed reports that the <dabbling> element is not repeatable. Yet Figure 5 clearly shows that children of the <waterfowl-category> element have cardinality set to zero or more on the XML Schema side. To fix this you must add a makeRepeatable() predicate that matches the setChoice predicate for children of the <waterfowl-category> element.

That's an error of commission, one that CAMed reports. Another error—an error of omission in this case—is on the other bound of the cardinality range: for <waterfowl-category> to accept zero child nodes it must have a rule with a makeOptional() predicate. Validating the preceding sample does not flag this omission because the sample does not violate it—but you could easily construct another sample that would manifest the error.

Another unreported error from this XML sample is that according to the XML Schema the children of <myRoot> must be in the order listed, but the CAM template does not enforce this. Fix this error by adding a rule with an orderChildren() predicate. The downloadable code includes a CAM template (Compositors/compositors.cam) with these corrections.
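The omissions described above only surface when a test instance actually violates them. As a rough sketch (not part of the CAM toolset), the following Python uses only the standard library to derive two such instances: one with an empty <waterfowl-category> and one with the first two children of <myRoot> swapped. The function names and the abbreviated SAMPLE string are assumptions made for illustration; in practice you would start from Compositors/compositors.xml.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the sample instance (assumption: the real file has more Guidebooks).
SAMPLE = """<myRoot>
  <first_name>string</first_name>
  <last_name>string</last_name>
  <classification><waterfowl>string</waterfowl></classification>
  <waterfowl-category>
    <dabbling>string</dabbling>
    <dabbling>string</dabbling>
  </waterfowl-category>
  <Guidebook><Title>string</Title><Author>string</Author><ISBN>string</ISBN></Guidebook>
</myRoot>"""


def empty_category_variant(xml_text):
    """Return a copy whose <waterfowl-category> has zero children (exercises the lower bound)."""
    root = ET.fromstring(xml_text)
    category = root.find("waterfowl-category")
    for child in list(category):
        category.remove(child)
    category.text = None  # drop leftover indentation so the element is truly empty
    return ET.tostring(root, encoding="unicode")


def reordered_variant(xml_text):
    """Return a copy with the first two children of the root swapped (exercises ordering)."""
    root = ET.fromstring(xml_text)
    second = list(root)[1]
    root.remove(second)
    root.insert(0, second)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(empty_category_variant(SAMPLE))
    print(reordered_variant(SAMPLE))
```

According to the schema as described, the first variant should pass and the second should fail; validating each against the uncorrected template and then against Compositors/compositors.cam would show whether the added rules have the intended effect.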

Element Content

In XML, the concept of element content is simple: an element may contain other elements, or text, or both (mixed content). However, it's hard work to define a CAM template (or an XML Schema for that matter) that requires a specific kind of content.
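To make the content categories concrete, here is a minimal Python sketch that classifies an element by what it contains. It uses the standard xml.etree.ElementTree module; the function name classify_content is hypothetical, and the sketch assumes that whitespace-only text (such as indentation between child elements) does not count as content.

```python
import xml.etree.ElementTree as ET


def classify_content(elem):
    """Classify an element as 'empty', 'text', 'elements', or 'mixed'."""
    has_children = len(elem) > 0
    # Text may sit before the first child (elem.text) or after any child (child.tail).
    pieces = [elem.text] + [child.tail for child in elem]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "elements"
    if has_text:
        return "text"
    return "empty"


if __name__ == "__main__":
    for doc in (
        "<person><firstname>John</firstname></person>",
        "<city>Phoenix</city>",
        "<letter>Dear <name>John Smith</name>.</letter>",
        '<product prodid="1345"/>',
    ):
        print(classify_content(ET.fromstring(doc)), doc)
```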

CAM outshines XML Schema in this task—with one exception: CAM's support for mixed content is limited. Mixed content is, of course, omnipresent: look at almost any web page. But in XML-processing applications mixed content is much more of a rarity. From the perspective of interoperability, XML data is defined to follow a rigid grammar from a sending application for ease of parsing by a receiving application.

With mixed content downplayed, then, CAM inherently supports elements that are text-only or element-only without requiring any markup whatsoever. That is, to create an element containing elements, simply include the child elements in the structure section of the CAM template. To create an element containing text, do exactly the same thing: include the text element. XML Schema, by contrast, is straightforward once you get used to it, but certainly not something you could call intuitive for someone who has never seen it before! See the first two sections in Table 8 (some of the schema examples in the table come from www.w3schools.com/schema/).

The table also includes special cases of text-only nodes: those with no content or those with optional content. Unlike the general case requiring no special markup, you actually have to emit some markup for these, though quite a bit less with CAM than with XML Schema. For the case of no content, the element must be able to contain nothing (add a rule with a makeNillable predicate) and the element must not contain anything else (add a rule forcing the length of its content to be zero with setLength(0)). For the case of optional content, include the same makeNillable rule and omit the setLength rule.

Author's Note: The CAM specification refers to allowNulls rather than makeNillable. My impression is that the specification will change.
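One way to read the interplay of the two rules is as a pair of checks on a text-only element. The Python sketch below is only a model of that reading, not CAM's implementation: the function name and its parameters are hypothetical, and it assumes that an element without a makeNillable-style rule must contain some text.

```python
import xml.etree.ElementTree as ET


def check_text_element(elem, nillable=False, length=None):
    """Model the two rules: nillable permits emptiness, length pins the text size."""
    text = elem.text or ""
    if text == "" and not nillable:
        return False  # empty content is rejected unless a makeNillable-style rule applies
    if length is not None and len(text) != length:
        return False  # a setLength-style rule demands an exact content length
    return True


if __name__ == "__main__":
    empty = ET.fromstring("<MiddleName/>")
    filled = ET.fromstring("<MiddleName>Quincy</MiddleName>")
    # "No content": nillable and length 0, so only the empty element passes.
    print(check_text_element(empty, nillable=True, length=0))
    print(check_text_element(filled, nillable=True, length=0))
    # "Optional content": nillable only, so both pass.
    print(check_text_element(empty, nillable=True))
    print(check_text_element(filled, nillable=True))
```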

Table 8. Element Content Categories: A sample XML instance is provided for each content type along with how it would be represented in XML Schema and in CAM. The key portions for a given type are highlighted in blue.
Type Model Item Notes
Elements only



 

 

Sample

<person>
<firstname>John</firstname>
<lastname>Smith</lastname>
</person>

<person> element may not contain text, only child elements.
XML Schema

<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname"
type="xs:string"/>
<xs:element name="lastname"
type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>

XML Schema default—any element containing child elements may not contain mixed content unless the <xs:complexType> sets the mixed attribute true (see mixed content below).
CAM

<person>
<firstname>%first name%</firstname>
<lastname>%last name%</lastname>
</person>

CAM default—any element containing child elements may not contain mixed content.
Text and attributes only

 

 

Sample

<shoesize country="Sweden">9
</shoesize>

<shoesize> may not contain child elements, only text and attributes.
XML Schema

<xs:element name="shoesize">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:integer">
<xs:attribute name="country"
type="xs:string" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>

Requires a <simpleContent> wrapper containing either an extension or a restriction.
CAM

<shoesize country="Sweden">
%int%</shoesize>

CAM default—any element not containing child elements may contain text.
Text only

 

 

Sample

<city>Phoenix</city>

<city> may contain neither child elements nor attributes, only text.
XML Schema

<xs:element name="city"
type="xs:string"/>

Without a requirement for attributes, this is straightforward.
CAM

<city>%city-name%</city>

CAM default—any element not containing child elements may contain text.
Mixed content

 

 

Sample

<letter>
Dear Mr.<name>John Smith</name>.
Your order <orderid>1032</orderid>
will be shipped on <shipdate>
2001-07-13</shipdate>.
</letter>

<letter> may contain child elements or text or both.
XML Schema

<xs:element name="letter">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element name="name"
type="xs:string"/>
<xs:element name="orderid"
type="xs:positiveInteger"/>
<xs:element name="shipdate"
type="xs:date"/>
</xs:sequence>
</xs:complexType>
</xs:element>

Requires mixed attribute set to true.
CAM

datatype(any)

Limited support.
No content

 

 

Sample

<product prodid="1345" />

<product> may not contain any content (text or other elements), only attributes.
XML Schema

<xs:element name="product">
<xs:complexType>
<xs:complexContent>
<xs:restriction
base="xs:anyType">
<xs:attribute name="prodid"
type="xs:positiveInteger"/>
<!-- no children here -->
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:element>

Define a type that allows only child elements but does not actually define any.
CAM

<product prodid="%int%"/>


<as:constraint action="makeNillable(//product,xsd)" />
<as:constraint action="setLength(//product, 0)" />

Allow the element to be empty (makeNillable) as well as require it to be so (setLength).
Optional content

 

 

Sample

<Author>
<FirstName>Mark</FirstName>
<MiddleName xsi:nil="true"/>
<LastName>Twain</LastName>
</Author>

<MiddleName> may contain text or, as in this example, may explicitly be defined to have no content.
XML Schema

<xs:element name="Author">
<xs:complexType>
<xs:sequence>
<xs:element name="FirstName"
type="xs:string"/>
<xs:element name="MiddleName"
type="xs:string"
nillable="true"/>
<xs:element name="LastName"
type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>

 
CAM

<Author>
<FirstName>%first name%</FirstName>
<MiddleName/>
<LastName>%last name%</LastName>
</Author>


<as:constraint action="makeNillable(//MiddleName,xsd)" />

Allow the element to be empty with makeNillable.
Fixed content

 

 

Sample

<Client country="UK">
<name>J Smythe</name>
<street>1 Shropshire Place
</street>
<locale>Waterloo</locale>
</Client>

The country attribute must contain a fixed value in all cases.
XML Schema

<xs:element name="Client">
<xs:complexType>
<xs:sequence>
<xs:element
name="name"
type="xs:string"/>
<xs:element
name="street"
type="xs:string"/>
<xs:element
name="locale"
type="xs:string"/>
</xs:sequence>
<xs:attribute
name="country"
type="xs:NMTOKEN"
fixed="UK"/>
</xs:complexType>
</xs:element>

The fixed attribute may be applied to attributes or elements.
CAM

<Client country="UK">
<name>%full name%</name>
<street>%street%</street>
<locale>%locale%</locale>
</Client>

Simply specify the attribute without the surrounding percent signs to change it from a placeholder to a fixed value.

Two other items deserve mentioning on the topic of content. A CDATA section embedded within an XML instance document is transparent to the CAM processor just as it is with an XML Schema processor. Whether you write <waterfowl>wigeon</waterfowl> (where the content is the string "wigeon") or <waterfowl><![CDATA[x < 5]]></waterfowl> (where the content is the string "x < 5"), as long as the rules specify that the node may contain a string, then either element is valid.
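The same transparency is easy to observe with any conforming XML parser. As a small illustration using Python's standard xml.etree.ElementTree (not the CAM processor itself), a CDATA section and its escaped equivalent both arrive as ordinary string content:

```python
import xml.etree.ElementTree as ET

plain = ET.fromstring("<waterfowl>wigeon</waterfowl>")
cdata = ET.fromstring("<waterfowl><![CDATA[x < 5]]></waterfowl>")
escaped = ET.fromstring("<waterfowl>x &lt; 5</waterfowl>")

print(plain.text)                  # wigeon
print(cdata.text)                  # x < 5
print(cdata.text == escaped.text)  # True: the CDATA wrapper leaves no trace
```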

A second construct common to almost any XML file is the comment (e.g., <!-- any text here -->). Comments are simply ignored when an XML file is opened in the CAM editor. The only view of the XML is the active node tree, so there is no way to even see comments.
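Many XML parsers behave in a similar way by default. The sketch below uses Python's standard xml.etree.ElementTree purely as an analogy (it says nothing about how the CAM editor is implemented): the default parser drops comments, so only element nodes appear in the resulting tree.

```python
import xml.etree.ElementTree as ET

doc = (
    "<waterfowl-category>"
    "<!-- any text here -->"
    "<dabbling>teal</dabbling>"
    "</waterfowl-category>"
)
root = ET.fromstring(doc)

# The comment is discarded during parsing; only the element child remains.
child_tags = [child.tag for child in root]
print(child_tags)  # ['dabbling']
```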

Next Steps

That concludes the whirlwind tour of the CAM technology with practical applications. The goal was to make it comprehensive enough to ensure a good grounding in the toolset. It is not complete, though; other interesting features in the editor that you might want to examine include the:

  • Documentation generator: This lets you emit three different forms of documentation for a template.
  • Test case generator: Using this you can create a collection of sample XML instances that conform to the template based on several user settings (see Export → Export Examples).
  • Hinting mechanisms: These let you create more realistic examples.
  • CAM-to-XSD conversion: This is the reverse process of the XSD-to-CAM examples you've seen in this article.
Other CAM links you may need include the CAM blog, the CAM document directory, and a set of PowerPoint slides entitled XSD and jCAM tutorial.

After you are comfortable with designing CAM templates the next obvious step is to integrate CAM into your applications for validation. You can programmatically perform the same validations you've been doing manually in the CAM editor using either a command-line interface or an API. (The API is currently available only for Java.) You can download the Java libraries and tools from jcam.org.uk. The links bar on the home page includes some brief tutorials to help you get started with both the command-line and API interfaces.



Michael Sorens is a freelance software engineer, spreading the seeds of good design wherever possible, including through his open-source web site, teaching (University of Phoenix plus community colleges), and writing (contributed to two books plus various articles). With BS and MS degrees in computer science and engineering from Case Western Reserve University, he has worked at Fortune 500 firms and at startups, using C#, SQL, XML, XSL, Java, Perl, C, Lisp, PostScript, and others. His favorite project: designing and implementing the world's smallest word processor, where the medium was silicon, the printer "head" was a laser, and the Declaration of Independence could literally fit on the head of a pin. You can discuss this or any other article by Michael Sorens here.