Taking XML Validation to the Next Level: Explore CAM’s Expressive Power

Now that you’ve seen a more detailed comparison of CAM to XML Schema in Part 2, this section builds upon that foundation to explore several of those strengths in more detail.

Part 1 of this article showed a table that you’ll need for this section, so it’s reproduced here for your convenience as Table 1, which summarizes the key strengths of CAM compared to XML Schema and DTDs. The line items in the table are covered in detail in the sections below the table.

Validation Examples

The following examples draw on Table 1 to illustrate the points more fully.

Current-Node Fixed Validation (see Table 1, item 2)

As you have seen, applying business rules to restrict the domain of an element to a very specific range of values is straightforward—and something that XML Schema can do as well. Here are some examples:

         

Current-Node Conditional Validation (see Table 1, item 3)

You’ve also seen how to make acceptable values contingent upon some aspect of the value itself (such as deciding that a value is either a five- or nine-digit zip code) with constraints such as these:

      
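The logic behind such a constraint is easy to mimic outside CAM. The following Python sketch only illustrates the rule and is not CAM syntax. It assumes that a nine-digit zip code is written in ZIP+4 form (five digits, a hyphen, four digits), and the function name is_valid_zip is hypothetical.

```python
import re

# Illustrative only (not CAM syntax): the acceptable format depends on the
# value itself. Assumes nine-digit codes use the hyphenated ZIP+4 form.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_valid_zip(value: str) -> bool:
    """Return True for a five-digit zip code or a hyphenated ZIP+4 code."""
    return ZIP_PATTERN.fullmatch(value) is not None


for sample in ["98052", "98052-6399", "9805", "980526399"]:
    print(sample, is_valid_zip(sample))
```

Under this assumption an unhyphenated nine-digit string is rejected. A business rule that accepts that form would need a different pattern.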

Cross-Node Conditional Validation (see Table 1, item 4)

XML Schema’s conditional expressions are available only through regular expressions. CAM extends this with cross-node conditional validation, where you specify an action on one node based on the value of a different node. For example, suppose that items with a part number of 123-678 are customized for customers in the state of Washington. If that part number shows up on a purchase order for a customer from a different state, you would want it flagged as an error. Recall that CAM conditions and actions select nodes by standard XPath. Thus, you essentially already know how to do this: in prior examples the condition always referred to an XPath of “.” (the current node). But if you change this to some other node, with a constraint such as the one shown below, you now have a cross-node validation rule.

   

The term cross-node fits better than cross-element because “node” is the more general XML term. An element is a node and an attribute is a node. The XPath conditional expressions aren’t limited to XML element nodes; you may also (as in the example above) reference attribute nodes.
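To make the cross-node idea concrete, here is a minimal Python sketch that uses the standard xml.etree.ElementTree module rather than a CAM processor. The element names purchaseOrder, shipTo, state, and item are hypothetical stand-ins. Only the @pno attribute and the 123-678 and WA values come from the example. The condition is evaluated on one set of nodes, and the action is checked against a different set.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase-order layout; element names are illustrative only.
PO = """
<purchaseOrder>
  <shipTo><state>OR</state></shipTo>
  <items>
    <item pno="123-678"/>
    <item pno="555-000"/>
  </items>
</purchaseOrder>
"""


def check_washington_only(root: ET.Element) -> bool:
    """Cross-node rule: if any item has pno 123-678, every shipTo/state must be WA."""
    # The condition is evaluated on one set of nodes (item/@pno) ...
    restricted = root.findall(".//item[@pno='123-678']")
    if not restricted:
        return True  # condition not met, so the action does not apply
    # ... and the action is applied to different nodes (the state elements).
    states = root.findall(".//shipTo/state")
    return all((s.text or "").strip() == "WA" for s in states)


root = ET.fromstring(PO)
print(check_washington_only(root))  # False: OR is not WA
```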

Structure Variability (see Table 1, item 6)

Cross-node validation is impressive. But discovering that you can dynamically modify what constitutes a valid structure is dazzling. Such modification provides tremendous flexibility—but requires learning a few intricacies. Here are a few examples.

  • Example 1: Assume a purchase order includes an optional color attribute. Further assume that the purchase order includes a tray number where the item is found in the warehouse. Items with color attributes don’t need a tray number—the color is sufficient for picking the item. However, it is harmless if the tray number is provided. This constraint expresses these business rules:
   
  • Example 2: Assume you have a purchase order that includes the total order weight, and that the purchase order also includes an optional freight carrier element. However, if the order weight exceeds 25 kg, then the purchase order must specify a freight carrier to transport the goods. That is, if the condition shown below is satisfied then the FreightHandler node must be present. This constraint expresses these business rules:
   

If you are following along with the working examples, you may find that constraints referring to other nodes (such as the preceding examples) perform as advertised, or not. The results depend on your specific XML instance. Notice that each of the last three examples used the XPath // designator to start each selector. That selector locates any and all nodes that match the rest of the expression. In the previous section the example’s goal was to restrict part number 123-678 to ship only to customers in Washington State. Here’s the rule, repeated:

   

Previously, I glossed over the precise meaning of that constraint, which is:

  • If any item has a @pno of 123-678, then all state elements within all shipping-address elements must be WA.

Typical purchase orders have only one shipping address, so the net result is the same as specifying this:

  • If any item has a @pno of 123-678, then the single state element within the single shipping-address element must be WA.

The two previous examples in this section that deal with structure variability are not quite as forgiving. The intents of these two constraints are:

  • Make the tray number of an item optional if this item has a color.
  • Require a FreightHandler for an item if the weight of this item exceeds 25 kg.

But as written, what they really indicate is this:

  • Make the tray number of any item optional if any item has a color.
  • Require a FreightHandler for an item if the weight of any item exceeds 25 kg.

Because a purchase order typically lists more than one item, these rules may produce incorrect results. To fix this, you must introduce the appropriate XPath axis to select only nodes related to the current node for the condition predicate; in other words, you must rewrite the two constraints as:

      

Note that both actions still use the “every match” notation //. Remember that the constraint is attached to the node specified in the action attribute. You want to consider every item within an order. Whether to trigger the action for any particular item then depends upon the condition, which needs to refer to that item’s locality.
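The difference between the two readings shows up clearly in a small Python sketch that uses ElementTree, not CAM. The names order, item, tray, and color are hypothetical. The first function evaluates the condition globally, as a // selector in the condition does. The second evaluates it relative to each item, which is what the corrected constraints aim for.

```python
import xml.etree.ElementTree as ET

# Hypothetical order: element and attribute names are illustrative only.
ORDER = """
<order>
  <item color="red"/>
  <item><tray>T7</tray></item>
  <item/>
</order>
"""


def missing_trays_global(order):
    """Mimics a condition written with //: 'if ANY item has a color'."""
    any_color = order.find(".//item[@color]") is not None
    if any_color:
        return []  # tray number waived for every item, which is wrong
    return [i for i in order.iter("item") if i.find("tray") is None]


def missing_trays_local(order):
    """Condition evaluated relative to each item: 'if THIS item has a color'."""
    return [
        i for i in order.iter("item")
        if i.get("color") is None and i.find("tray") is None
    ]


order = ET.fromstring(ORDER)
print(len(missing_trays_global(order)))  # 0: the third item slips through
print(len(missing_trays_local(order)))   # 1: the third item is flagged
```

With the global condition, a single colored item waives the tray number for every item in the order, so an item with neither a color nor a tray goes unreported.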

The Intuitive Nature of CAM Design

This section shows another example of structural variability, but puts it into practice with a complete template (see the files Birds/birdlist.cam and Birds/birdlist.xml in the downloadable code).

Consider a list of birds where, for simplicity, you are cataloguing just three types of avians: raptors, waterfowl, and passerines (songbirds). Each bird has a classification indicating its type. If a given bird is a waterfowl, then there must also be a waterfowl category to identify the bird as (for example) a diving duck or a dabbling duck. Contrariwise, if the bird is not a waterfowl then a waterfowl category must not be present. This shows a couple of valid bird elements in a bird list:

                        wigeon                       yes                                 northern harrier               

To convert your XML into a CAM template, strip it down to remove duplication, include all choices and optional values, and substitute generic placeholders (surrounded with percent signs) for any values:

                        %string%         %string%         %string%                       %string%         %string%                  

The quickest way to move this into a CAM template is to use your favorite text editor: start with an existing template skeleton and simply paste the above piece of XML into the structure section. Here’s a basic skeleton:

                      To be Completed       0.1 generator v1.23       2008-12-31T12:04:26                                                                                                

After you have your structure in place, you can add the formal business rules. A list of birds should obviously allow more than one bird:

   

Each bird should be classified into one of the choices shown above for the classification element. The rule does not need to know what the choices are; those are defined in the structure.

   

A bird should specify one of the choices shown above for the waterfowl-category element.

   

Finally, to set up the more complicated condition given above, start by stating that the waterfowl-category element must universally not exist, so no condition attribute appears in this constraint:

   

Then you add the waterfowl-category element back, but only when the current bird is a waterfowl. The precedence rules dictate that unconditional rules are applied first, then conditional rules, so this does what you need:

   

That completes the business rules. Simply take the five rules stated here and embed them in the rules section of the skeleton. As you can see, mapping your requirements to the formal business rules is a straightforward, almost automatic, process.
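The five rules can be approximated in Python to check the intended behavior against sample data. This sketch uses ElementTree and is not a CAM template. The birdlist root and name child are assumptions. The names bird, classification, and waterfowl-category, the raptor/waterfowl/passerine choices, and the diving/dabbling choices come from the text. The waterfowl-category check mirrors the two-step approach: the category is forbidden by default, then required when the bird is a waterfowl.

```python
import xml.etree.ElementTree as ET

CLASSES = {"raptor", "waterfowl", "passerine"}
WATERFOWL_CATEGORIES = {"diving", "dabbling"}

# Sample data; the <birdlist> root and <name> child are assumptions.
BIRDS = """
<birdlist>
  <bird>
    <name>wigeon</name>
    <classification><waterfowl/></classification>
    <waterfowl-category><dabbling/></waterfowl-category>
  </bird>
  <bird>
    <name>northern harrier</name>
    <classification><raptor/></classification>
  </bird>
</birdlist>
"""


def exactly_one_of(parent, allowed):
    """setChoice-like check: parent exists and holds exactly one allowed child."""
    if parent is None:
        return False
    children = list(parent)
    return len(children) == 1 and children[0].tag in allowed


def bird_errors(bird):
    """Return rule violations for one bird (an empty list means valid)."""
    classification = bird.find("classification")
    if not exactly_one_of(classification, CLASSES):
        return ["classification must hold exactly one of raptor, waterfowl, passerine"]
    is_waterfowl = classification[0].tag == "waterfowl"
    category = bird.find("waterfowl-category")
    # Unconditional rule: waterfowl-category must not exist ...
    if not is_waterfowl and category is not None:
        return ["waterfowl-category is only allowed for waterfowl"]
    # ... conditional rule: a waterfowl needs exactly one category choice.
    if is_waterfowl and not exactly_one_of(category, WATERFOWL_CATEGORIES):
        return ["waterfowl needs exactly one waterfowl-category child"]
    return []


root = ET.fromstring(BIRDS)
for bird in root.iter("bird"):
    print(bird.findtext("name"), bird_errors(bird) or "valid")
```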

Grouping Rules: The CAM Context (see Table 1, item 5)

Up to now, all rules have been in the default context; you can see this by examining the Rules view on any of the examples. A context is simply a mechanism for grouping multiple rules that need to satisfy a common condition. (The term condition is used here in the same sense as in the examples you’ve seen thus far, e.g. “if the item weight exceeds 25 kg” or “if the color is green”, etc.)

 
Figure 1. Adding Conditional Context: Context lets you apply conditionality to multiple rules, starting with the node upon which the context is based.

This next example contrives an international dialing scenario (chosen because readers are likely to be familiar with the concept) where several data elements are related and must move together. This page on International Dialing Codes provides a handy chart listing the country code, international dialing prefix, and national dialing prefix for each country. This group of three variables per country is precisely what a context can assist with. The main window in Figure 1 shows a CAM template for a customer list that includes a country element, followed by any number of customer elements. Each customer has an address and phone details. The rules shown in the figure are all in the default context; those are the starting point (see the file ContextAndParams/addressAndPhone_base.cam in the downloadable code). In this walk-through, you add a context for each country of interest and then add rules to the appropriate phone elements in that context.

Start by right-clicking the country element. Select Add New Context and the Add Context Wizard appears (see Figure 1, frame 2). Modify the settings in the wizard to match the changes shown: change the condition from mere existence to matching the string “US” and change the category from default to context. (Clicking the condition opens yet another dialog, which is not shown.) Close the Add Context Wizard and select the phone element that holds the country code. Open its context menu and select Add New Rule. The Add New Constraint Wizard opens (see Figure 1, frame 3). Modify the wizard settings to match the changes shown, as follows:

  • Change the action to restrictValues and add the value for the United States, +1.
  • Change the conditional from “No” to “Yes” to expose more fields for setting the condition.
  • Select the context that you just created, and leave the condition at “none.” (You can compound the condition established by the context if you need extra complexity.)

When you close the Add New Constraint Wizard, your rule should appear attached to the country-code element (see the file ContextAndParams/addressAndPhone_base_incremental.cam in the downloadable code). Repeat the same step (see Figure 1, frame 3) to add rules to the international-prefix and national-prefix elements. Now you have a context for “US” phone numbers with rules attached to the three relevant elements for that context. Repeat the whole process for the “UK”: return to the country element, add a context for the UK, and then add rules with UK-specific values to the three phone elements.
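Conceptually, each context is a named group of rules that a condition on the country element selects. The Python sketch below models that grouping with a dictionary. The element names (country, customer, phone, countryCode, intlPrefix, nationalPrefix) are hypothetical. The dialing values (US: +1, 011, 1; UK: +44, 00, 0) are the commonly published ones and are worth checking against the dialing-codes chart before you rely on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical context table: one entry per country "context", grouping the
# three related values that must move together.
DIALING_CONTEXTS = {
    "US": {"countryCode": "+1", "intlPrefix": "011", "nationalPrefix": "1"},
    "UK": {"countryCode": "+44", "intlPrefix": "00", "nationalPrefix": "0"},
}

CUSTOMERS = """
<customers>
  <country>UK</country>
  <customer>
    <phone><countryCode>+44</countryCode><intlPrefix>00</intlPrefix><nationalPrefix>0</nationalPrefix></phone>
  </customer>
  <customer>
    <phone><countryCode>+1</countryCode><intlPrefix>00</intlPrefix><nationalPrefix>0</nationalPrefix></phone>
  </customer>
</customers>
"""


def phone_errors(root):
    """Apply every rule in the context selected by the country element."""
    rules = DIALING_CONTEXTS.get(root.findtext("country", "").strip())
    if rules is None:
        return []  # no context matches, so only default rules would apply
    errors = []
    for i, phone in enumerate(root.iter("phone"), start=1):
        for tag, expected in rules.items():
            actual = phone.findtext(tag)
            if actual != expected:
                errors.append(f"customer {i}: {tag} is {actual!r}, expected {expected!r}")
    return errors


print(phone_errors(ET.fromstring(CUSTOMERS)))
```

Adding a third country then means adding one dictionary entry, much as it means adding one more context in the editor.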

 
Figure 2. Multiple Rules in Multiple Contexts: This figure continues the exercise from Figure 1, creating a separate context for the US and the UK, and applying a rule to each of the three country-specific patterned elements for each context. The ItemRules view shows the two rules for the selected node, one in each context, while the Rules view at the bottom shows all the rules in both contexts.

 
Figure 4. Attaching a Rule with setChoice: (1) Initially no rules are present on the classification element. Select Add New Rule from the context menu and change the Rule XPath settings to those shown (2), which updates the XPath to //classification/*. When you close the wizard there’s no rule attached to the node (3) because the rule XPath specified the node’s children. Frames (4) and (5) confirm that the rule is attached to the child nodes.

Before defining a rule (see Figure 4, frame 1), note that the classification element is selected in the Structure view and that the ItemRules view shows no rules defined for it. Opening the context menu on the classification node and selecting Add New Rule brings up the rule wizard (frame 2). Near the top, the Item field should confirm that you are on the “classification” item. Just below that, the XPath field (generated automatically) shows what the group of “Rule XPath” checkboxes designates. By default, only Parent and All are checked, so the XPath should initially show //bird/classification. Deselect Parent and select Children; the XPath changes as shown in the figure, to //classification/*. Finally, select setChoice as the action, and then close the wizard.

Frame 3 in Figure 4 shows the Structure and ItemRules views immediately after closing the wizard. The classification node is still selected but, surprisingly, there are still no rules defined for it, even though you just created one. So what happened to the rule? It went to the children (all of them) because that’s what the XPath selection defined. Frames 4 and 5 simply prove the point: the rule does indeed appear for each of those nodes, even though there’s only one rule. You can confirm this by deleting the rule from the ItemRules view for any of the child nodes; doing so removes it from all the child nodes.

The rule for this set of nodes uses the XPath //classification/* selector to indicate all children of classification nodes. You can express the same content but be more explicit about the child nodes using a more specific XPath expression such as:

   //classification/*[
       (name() = 'raptor'   ) or
       (name() = 'waterfowl') or
       (name() = 'passerine')
   ]

The above expression selects only those children with matching names. The preceding example works only when your XML does not use namespaces; it will fail when the XML includes namespace qualifiers on nodes (e.g. foo:raptor). The following variation attempts to match nodes that contain the base node names when namespaces are in use:

   //classification/*[
       contains(name(),'raptor') or
       contains(name(),'waterfowl') or
       contains(name(),'passerine')
   ]

This might seem more robust, but the contains() function opens the door to other elements as well: “velociraptor” matches as readily as “raptor”. Clearly, ends-with() would fare no better. When using namespaces, either match the entire qualified names or use an XPath expression that includes the substring-after function to strip off the namespace prefix in the comparison:

   //classification/*[
       (substring-after(name(),':') = 'raptor'   ) or
       (substring-after(name(),':') = 'waterfowl') or
       (substring-after(name(),':') = 'passerine')
   ]
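You can compare these predicates by emulating the XPath 1.0 string functions on qualified names, which are the values name() would return. This Python sketch is only an illustration, and the sample names are hypothetical. It exposes one consequence of XPath semantics: substring-after() returns an empty string when the separator is absent, so the substring-after expression matches only prefixed names. XPath 1.0 also provides local-name(), which returns the name without its prefix. It is included for comparison, assuming the CAM processor’s XPath engine supports it.

```python
# Emulate XPath 1.0 string functions on qualified names, i.e. the
# values that name() would return for each candidate element.

def contains(s: str, sub: str) -> bool:
    return sub in s


def substring_after(s: str, sep: str) -> str:
    # XPath 1.0 returns "" when sep does not occur in s.
    i = s.find(sep)
    return "" if i < 0 else s[i + len(sep):]


def local_name(qname: str) -> str:
    # Like local-name(): drop the prefix, if any.
    return qname.split(":", 1)[-1]


WANTED = ("raptor", "waterfowl", "passerine")


def checks(qname: str):
    """Which of the four predicate styles select this name?"""
    return (
        qname in WANTED,                          # name() = '...'
        any(contains(qname, w) for w in WANTED),  # contains(name(), '...')
        substring_after(qname, ":") in WANTED,    # substring-after(name(), ':') = '...'
        local_name(qname) in WANTED,              # local-name() = '...'
    )


for name in ["raptor", "foo:raptor", "foo:velociraptor", "foo:diving"]:
    print(name, checks(name))
```

The columns correspond, in order, to name() equality, contains(), substring-after(), and local-name(). Only contains() accepts “foo:velociraptor”, and only substring-after() rejects the unprefixed “raptor”.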

The straightforward //classification/* notation is often adequate but there are two reasons to be aware of variations. First, CAM templates generated from schema files use either the contains or equality variations (this seems to be in a state of flux at the time of writing). This discussion should help you understand the generated templates. Second, and more importantly, the simple notation works only when you have proper hierarchy in your structure. The above example structure looks in part like this:

                                                                                         

Both the classification and waterfowl-category elements have separate setChoice predicates on all their children. So the classification list of raptor, waterfowl, and passerine corresponds to all (*). But consider a flatter structure that does not isolate the children hierarchically:

                                         

In the preceding example, the XPath expression //bird/* would match diving and dabbling in addition to raptor, waterfowl, and passerine. In this case, you would have to use one of the more specific XPath expressions (you can see examples in the downloadable code in the files Compositors/compositors_flat.cam, .xml, and .xsd). Interestingly, note that this is another way to apply conditions—but without explicitly setting a rule to be conditional. The XPath expression that selects the nodes implements the conditionality.
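A quick ElementTree experiment in Python shows why hierarchy matters. The two bird fragments are hypothetical. In the nested form, a wildcard under classification finds only classification choices. In the flat form, the wildcard also picks up dabbling, and only a name filter isolates the classification choices. That filter is the analogue of the more specific XPath expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: flat layout versus the hierarchical one.
FLAT = "<bird><waterfowl/><dabbling/></bird>"
NESTED = ("<bird><classification><waterfowl/></classification>"
          "<waterfowl-category><dabbling/></waterfowl-category></bird>")

CLASSES = {"raptor", "waterfowl", "passerine"}


def wildcard(root, path):
    """Tags selected by a plain wildcard path, like //bird/*."""
    return [e.tag for e in root.findall(path)]


def named(root, path, allowed):
    """Like //bird/*[name() = 'raptor' or ...]: keep only allowed names."""
    return [e.tag for e in root.findall(path) if e.tag in allowed]


flat, nested = ET.fromstring(FLAT), ET.fromstring(NESTED)
print(wildcard(nested, "classification/*"))  # ['waterfowl']
print(wildcard(flat, "*"))                   # ['waterfowl', 'dabbling']
print(named(flat, "*", CLASSES))             # ['waterfowl']
```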

With this grounding in setChoice, the next section shows the intricacies of layering it with cardinality in practical examples.

Cardinality and the setChoice Predicate

Earlier you learned about the predicates available for specifying cardinality (see Table 3). This section reveals the next level of complexity by layering setChoice on top of cardinality. The downloadable code includes a sample CAM template (Compositors/compositors_setChoice_sandbox.cam) and XML file (Compositors/compositors_setChoice_sandbox.xml) that you can modify to duplicate the examples in this section.

Example 1

Rule Set                                   Cardinality
setChoice(//waterfowl-category/*)          Exactly 1

This rule set permits exactly one of any of the child nodes of waterfowl-category because when there are no cardinality rules the cardinality defaults to 1. The particular child node does not matter since the rule specifies the * XPath selector. In the following examples, only the middle one—containing exactly one child—validates successfully.

XML Sample               Child Elements   Validates?
string string            2                Fail
string                   1                Pass
(no child elements)      0                Fail

Example 2

Rule Set                                   Cardinality
setChoice(//waterfowl-category/*)          0 or 1
makeOptional(//waterfowl-category/*)

This rule set adds an explicit cardinality rule to the setChoice rule. Because it references the same set of nodes, it states that you want 0 or 1 child nodes to appear. That means a total of 0 or 1 nodes, not 0 or 1 of each choice. The first sample shown below fails to validate because the total child node count is 2. As long as there is only 1 of either possible choice, or none, the tree will validate.

XML Sample               Child Elements   Validates?
string string            2                Fail
string                   1                Pass
string                   1                Pass
(no child elements)      0                Pass

Example 3

Rule Set                                   Cardinality
setChoice(//waterfowl-category/*)          1 or more
makeRepeatable(//waterfowl-category/*)

This cardinality rule permits any non-zero combination of child nodes. The setChoice rule allows them in any combination. Thus the XML may contain all dabbling nodes, just one diving node, or some of both, as shown below. Only the final row containing none violates the cardinality constraints.

XML Sample               Child Elements   Validates?
string string string     3                Pass
string string            2                Pass
string string            2                Pass
string                   1                Pass
(no child elements)      0                Fail

Example 4

Rule Set                                   Cardinality
setChoice(//waterfowl-category/*)          0 or more
makeRepeatable(//waterfowl-category/*)
makeOptional(//waterfowl-category/*)

By combining makeOptional (0 or 1) with makeRepeatable (1 or more), the result of 0 or more allows any combination of the set of choices, including none.

XML Sample               Child Elements   Validates?
string string string     3                Pass
string string            2                Pass
string                   1                Pass
(no child elements)      0                Pass

Example 5

Rule Set                                   Cardinality
setRequired(//waterfowl-category/*,2)      Between 2 and 5
setLimit(//waterfowl-category/*,5)
setChoice(//waterfowl-category/*)

This rule set shows the remaining available cardinality predicates from Table 3: setRequired specifies a lower bound while setLimit specifies an upper bound. Any occurrences outside those bounds will fail validation, as shown.

XML Sample                                   Child Elements   Validates?
string string string string string string    6                Fail
string string string string                  4                Pass
string string                                2                Pass
string                                       1                Fail
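The five rule sets can be summarized as a fold over cardinality predicates into a (minimum, maximum) pair. The Python sketch below is a simplified model under that assumption, not CAM’s implementation. It ignores setChoice, which governs which children may appear rather than how many, and it encodes each predicate as a tuple. It reproduces the Validates? column of the five tables.

```python
from math import inf


def bounds(rules):
    """Fold cardinality predicates into (min, max) for the choice group.

    No rules means exactly 1; makeOptional lowers the minimum to 0;
    makeRepeatable lifts the maximum; setRequired/setLimit set explicit bounds.
    """
    lo, hi = 1, 1
    for name, *args in rules:
        if name == "makeOptional":
            lo = 0
        elif name == "makeRepeatable":
            hi = inf
        elif name == "setRequired":
            lo = args[0]
        elif name == "setLimit":
            hi = args[0]
    return lo, hi


def validates(rules, child_count):
    lo, hi = bounds(rules)
    return lo <= child_count <= hi


EXAMPLES = {
    "Example 1": [],
    "Example 2": [("makeOptional",)],
    "Example 3": [("makeRepeatable",)],
    "Example 4": [("makeRepeatable",), ("makeOptional",)],
    "Example 5": [("setRequired", 2), ("setLimit", 5)],
}

for label, rules in EXAMPLES.items():
    print(label, bounds(rules), [validates(rules, n) for n in range(7)])
```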

Combinations of Compositors

Here’s a slightly more realistic example that shows an XML Schema file graphically, a CAM template file, and a sample XML instance to validate against either. Figure 5 shows a schema that includes all three types of compositors. Listing 1 shows the schema from which the figure was generated in Liquid XML Studio (the schema also exists in the file Compositors/compositors.xsd in the downloadable code).

   
 
Figure 5. Combinations of Compositors: This sample schema shows ordered, unordered, and choice compositors.

Using CAMed to generate a CAM conversion of this schema yields the CAM template (Compositors/compositors_from_xsd.cam) shown in Listing 2. Note that it generates specific XPath expressions for the setChoice predicates, as discussed earlier.

When you convert a schema to a CAM template, the first thing you must do is check correctness. XSD-to-CAM conversion is a reasonable, but not a perfect, process; recall that the zip code rules needed adjusting in the earlier purchase order example. Examine the CAM template manually (either the raw file or in the editor) rather than simply testing whether it validates a given XML file. The latter gives you, at best, a false sense of security. Here’s why. If you test the CAM template with the following valid sample XML file (Compositors/compositors.xml in the downloadable code), CAMed reports that it fails:

           string     string            string                 string       string                 string       string       string                 string       string       string                 string       string       string        

Specifically, CAMed reports that the element is not repeatable. Yet Figure 5 clearly shows that children of the element have cardinality set to zero or more on the XML Schema side. To fix this you must add a makeRepeatable() predicate that matches the setChoice predicate for children of the element.

That’s an error of commission, one that CAMed reports. Another error—an error of omission in this case—is on the other bound of the cardinality range: for to accept zero child nodes it must have a rule with a makeOptional() predicate. Validating the preceding sample does not flag this omission because the sample does not violate it—but you could easily construct another sample that would manifest the error.

Another unreported error from this XML sample is that according to the XML Schema the children of must be in the order listed, but the CAM template does not enforce this. Fix this error by adding a rule with an orderChildren() predicate. The downloadable code includes a CAM template (Compositors/compositors.cam) with these corrections.
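An orderChildren() rule checks that children appear in a declared sequence, and a small ordering check can illustrate that idea. In this Python sketch, which uses ElementTree, the tag names a, b, and c are hypothetical stand-ins for the schema’s sequence. Children must appear in the declared order. Skipped (optional) and repeated elements are tolerated, and an unknown or out-of-order tag fails.

```python
import xml.etree.ElementTree as ET


def in_declared_order(parent, declared):
    """Check that parent's children follow the declared sequence.

    Skipped (optional) and repeated tags are tolerated; a tag that is not
    declared, or that appears after a later one, fails the check.
    """
    position = 0
    for child in parent:
        try:
            # Search only from the current position onward.
            position = declared.index(child.tag, position)
        except ValueError:
            return False
    return True


SEQUENCE = ["a", "b", "c"]  # hypothetical stand-ins for the schema's sequence
print(in_declared_order(ET.fromstring("<p><a/><b/><b/><c/></p>"), SEQUENCE))  # True
print(in_declared_order(ET.fromstring("<p><b/><a/></p>"), SEQUENCE))          # False
```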

Element Content

In XML, the concept of element content is simple: an element may contain other elements, or text, or both (mixed content). However, it’s hard work to define a CAM template (or an XML Schema for that matter) that requires a specific kind of content.

CAM outshines XML Schema in this task—with one exception: CAM’s support for mixed content is limited. Mixed content is, of course, omnipresent: look at almost any web page. But in XML-processing applications mixed content is much more of a rarity. From the perspective of interoperability, XML data is defined to follow a rigid grammar from a sending application for ease of parsing by a receiving application.

With mixed content downplayed, then, CAM inherently supports elements that are text-only or element-only without requiring any markup whatsoever. That is, to create an element containing elements, simply include the child elements in the structure section of the CAM template. To create an element containing text, do exactly the same thing: include the text element. XML Schema is straightforward once you get used to it, but certainly not something you could call intuitive for someone who has never seen it before! See the first two sections in Table 8 (some of the schema examples in the table come from www.w3schools.com/schema/).

The table also includes special cases of text-only nodes: those with no content or those with optional content. Unlike the general case requiring no special markup, you actually have to emit some markup for these, though quite a bit less with CAM than with XML Schema. For the case of no content, the element must be able to contain nothing (add a rule with a makeNillable predicate) and the element must not contain anything else (add a rule forcing the length of its content to be zero with setLength(0)). For the case of optional content, include the same makeNillable rule and omit the setLength rule.
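The content categories in Table 8 come down to two questions. Does the element have child elements, and does it have non-whitespace text directly inside it? This Python sketch uses ElementTree to classify an element on that basis. The sample elements in the checks echo Table 8, and the "empty" result corresponds to the no-content case that makeNillable plus setLength(0) enforces in CAM. Attributes are not considered.

```python
import xml.etree.ElementTree as ET


def has_text(elem):
    """Non-whitespace character data directly inside elem (its text or children's tails)."""
    chunks = [elem.text or ""] + [child.tail or "" for child in elem]
    return any(c.strip() for c in chunks)


def content_kind(elem):
    """Classify content; attributes are ignored, as in Table 8's text-only rows."""
    children = len(elem) > 0
    text = has_text(elem)
    if children and text:
        return "mixed"
    if children:
        return "elements only"
    if text:
        return "text only"
    return "empty"


print(content_kind(ET.fromstring("<city>Phoenix</city>")))
```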

Author’s Note: The CAM specification refers to allowNulls rather than makeNillable. My impression is that the specification will change.
Table 8. Element Content Categories: A sample XML instance is provided for each content type along with how it would be represented in XML Schema and in CAM. The key portions for a given type are highlighted in blue.
Type Model Item Notes
Elements only

 

 

Sample

John
Smith
The element may not contain text, only child elements.
XML Schema



type="xs:string"/>
type="xs:string"/>


XML Schema default: any element containing child elements may not contain mixed content unless its complexType sets the mixed attribute to true (see Mixed content below).
CAM

%first name%
%last name%
CAM default—any element containing child elements may not contain mixed content.
Text and attributes only

 

 

Sample
9
The element may not contain child elements, only text and attributes.
XML Schema


<xs:simpleContent>

type="xs:string" />



Requires a simpleContent wrapper containing either an extension or a restriction.
CAM

%int%
CAM default—any element not containing child elements may contain text.
Text only

 

 

Sample
Phoenix
The element may contain neither child elements nor attributes, only text.
XML Schema
  type="xs:string"/>
Without a requirement for attributes, this is straightforward.
CAM
%city-name%
CAM default—any element not containing child elements may contain text.
Mixed content

 

 

Sample

Dear Mr.John Smith.
Your order 1032
will be shipped on
2001-07-13
.
The element may contain child elements or text or both.
XML Schema

mixed="true"
>

type="xs:string"/>
type="xs:positiveInteger"/>
type="xs:date"/>


Requires mixed attribute set to true.
CAM
datatype(any) 
Limited support.
No content

 

 

Sample
The element may not contain any content (text or other elements), only attributes.
XML Schema



base="xs:integer">
type="xs:positiveInteger"/>




Define a type that allows only child elements but does not actually define any.
CAM

  action="makeNillable(
//product,xsd)
" />
action="setLength(
//product, 0)
" />
Allow the element to be empty (makeNillable) as well as require it to be so (setLength).
Optional content

 

 

Sample

Mark

Twain
The element may contain text or, as in this example, may explicitly be defined to have no content.
XML Schema



type="xs:string"/>
type="xs:string"
nillable="true"/>
type="xs:string"/>


 
CAM

%first name%

%last name%

  action="makeNillable(
//MiddleName,xsd)
"/>
Allow the element to be empty with makeNillable.
Fixed content

 

 

Sample

J Smythe
1 Shropshire Place

Waterloo
The attribute must contain a fixed value in all cases.
XML Schema



name="name"
type="xsd:string"/>
name="street"
type="xsd:string"/>
name="locale"
type="xsd:string"/>

name="country"
type="xsd:NMTOKEN"
fixed="UK"/>

The fixed attribute may be applied to attributes or elements.
CAM
UK">
%full name%
%street%
%locale%
Simply specify the attribute without the surrounding percent signs to change it from a placeholder to a fixed value.

Two other items deserve mention on the topic of content. A CDATA section embedded within an XML instance document is transparent to the CAM processor, just as it is to an XML Schema processor. Whether an element contains plain text such as wigeon (where the content is the string “wigeon”) or a CDATA section such as <![CDATA[x < 5]]> (where the content is the string “x < 5”), either element is valid as long as the rules specify that the node may contain a string. A second construct common in almost any XML file is the comment (e.g. <!-- a comment -->). Comments are simply ignored when an XML file is opened in the CAM editor. The only view of the XML is the active node tree, so there is no way to even see comments.
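The same transparency is easy to observe with a general-purpose parser. This sketch uses Python’s standard xml.etree.ElementTree, not a CAM processor. A CDATA section parses to ordinary text, and the default parser drops comments entirely. The sample elements are hypothetical.

```python
import xml.etree.ElementTree as ET

plain = ET.fromstring("<name>wigeon</name>")
cdata = ET.fromstring("<expr><![CDATA[x < 5]]></expr>")
commented = ET.fromstring("<bird><!-- a comment --><name>wigeon</name></bird>")

print(plain.text)                  # wigeon
print(cdata.text)                  # x < 5
print([c.tag for c in commented])  # ['name']: the comment was dropped
```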

Next Steps

That concludes the whirlwind tour of the CAM technology with practical applications. The goal was to make it comprehensive enough to ensure a good grounding in the toolset. It is not complete, though; other interesting features in the editor that you might want to examine include the:

  • Documentation generator: This lets you emit three different forms of documentation for a template.
  • Test case generator: Using this you can create a collection of sample XML instances that conform to the template based on several user settings (see Export > Export Examples).
  • Hinting mechanisms: These let you create more realistic examples.
  • CAM-to-XSD conversion: This is the reverse process of the XSD-to-CAM examples you’ve seen in this article.

Other CAM links you may need include the CAM blog, the CAM document directory, and a set of PowerPoint slides entitled XSD and jCAM tutorial.

After you are comfortable with designing CAM templates the next obvious step is to integrate CAM into your applications for validation. You can programmatically perform the same validations you’ve been doing manually in the CAM editor using either a command-line interface or an API. (The API is currently available only for Java.) You can download the Java libraries and tools from jcam.org.uk. The links bar on the home page includes some brief tutorials to help you get started with both the command-line and API interfaces.
