

Taking XML Validation to the Next Level: Introducing CAM : Page 4

The generic-sounding Content Assembly Mechanism, or CAM, is actually an exciting step forward from XML Schema, but it's new, and not well documented. This article series represents CAM: The Missing Manual.


Creating Business Rules

In the Structure view select the <zip> element under the <shipTo> element. The rules attached to this element appear in the ItemRules view. In this case, there is only a single rule, using the setNumberMask predicate. Open the context menu for this rule by right-clicking on the rule in the category column, and then selecting Edit Rule. The Edit Constraint Rule dialog box opens (see Figure 7).

Figure 7. Editing a Constraint Rule: To fix the setNumberMask predicate attached to the //shipTo/zip elements, select the element in the Structure view, open its context menu, and select Edit Rule to open the Edit Constraint Rule dialog. Click the Number Mask field for help in specifying the mask.

Click on the number mask field, which opens another dialog to edit the mask. For now, just modify the field from ######.## to #####; that is, replace the original mask with just five octothorpes. Close both dialogs. In the main editor window you'll see the updated rule. Re-execute the validation. The //shipTo/zip error should be gone, leaving only an error on //billTo/zip. This is clearly the same error, so you can fix it the same way. But because the //billTo/zip value should always behave identically to the //shipTo/zip value, it would be much cleaner to have one common rule for both rather than separate rules. The Common Rules section in Part II of this article discusses how to do this in more detail.

After updating the rule you also need to update the placeholder (item 1 in Figure 7). If you compare that to Figure 6, you can see that the value changed from %54321.00% to %54321%, which is more representative of a zip code. In this particular example, where the element's placeholder and the associated rule are closely related, it is reasonable to suppose that they should automatically track each other in some fashion. But in many cases the relationship is not nearly as straightforward. Elements and rules have a many-to-many relationship: You could have multiple rules applied to a single element or a single rule applied to multiple elements.
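The many-to-many relationship can be pictured as a small lookup structure. The following Python sketch is purely illustrative: the rule ids and the `someOtherRule()` entry are hypothetical placeholders, and nothing here reflects how the CAM editor stores rules internally.

```python
from collections import defaultdict

# Hypothetical rule records: (rule id, action, XPaths the rule applies to).
RULES = [
    # One rule applied to two elements.
    ("r1", "setNumberMask(#####)", ["//shipTo/zip", "//billTo/zip"]),
    # A second rule on one of those elements (placeholder name, not a real CAM action).
    ("r2", "someOtherRule()", ["//shipTo/zip"]),
]


def rules_by_element(rules):
    """Invert the rule list into a mapping of element XPath -> rule ids."""
    index = defaultdict(list)
    for rule_id, _action, xpaths in rules:
        for xpath in xpaths:
            index[xpath].append(rule_id)
    return dict(index)


if __name__ == "__main__":
    for xpath, rule_ids in rules_by_element(RULES).items():
        print(xpath, "->", rule_ids)
```

Because an element can carry several rules and a rule can cover several elements, no single rule can be assumed to dictate what an element's placeholder should look like.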

To update the element's placeholder as in Figure 7, open the context menu on the //shipTo/zip field in the Structure view and select Edit Text. In the dialog change %54321.00% to %54321%.

The placeholder serves a dual role. For the CAM processor, its only function is to signal whether an element's content is fixed or not, which it does through the presence of the percent signs surrounding the placeholder. (Notice that when you re-ran the validation, the //shipTo/zip field validated before you updated the element's placeholder, confirming that the CAM processor ignores the value between the percent signs.)

The value between the percent signs is for human consumption, and should accurately and concisely convey what the element contains. Often the context has already done most of the work for you: the element name is "zip", which is immediately recognized in the US as a string containing 5, 9, or 10 digits. By setting the placeholder to %54321% you are telling consumers of the template that you want only five-digit zip codes.
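The processor's side of that dual role can be sketched in a few lines of Python. This is only an illustration of the behavior described above, assuming that text wrapped in percent signs marks content that is not fixed. The function name `is_placeholder` is hypothetical and is not part of CAM.

```python
def is_placeholder(content: str) -> bool:
    """Return True if the text is wrapped in percent signs, e.g. %54321%.

    Only the surrounding percent signs matter; the text between them
    is never inspected.
    """
    return len(content) >= 2 and content.startswith("%") and content.endswith("%")


if __name__ == "__main__":
    for sample in ("%54321.00%", "%54321%", "54321"):
        kind = "placeholder" if is_placeholder(sample) else "fixed content"
        print(f"{sample!r}: {kind}")
```

Both %54321.00% and %54321% are treated identically here, which mirrors the observation that validation succeeded before the placeholder text was updated.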

Stress-Testing the Validation

Now you have updated the placeholder and the rule. But are these two changes sufficient to properly validate a five-digit zip code? To check this you need to feed different test cases to the CAM processor. The simplest way is to open the XML view containing the data that you are validating, change the //shipTo/zip value, and re-validate. You edit nodes in the XML view just as in the Structure view: open the context menu and select Edit Text. Determine the smallest set of values that yields good coverage of all possible values (that is, determine appropriate equivalence classes of data) and feed each one to the validator. Table 3 provides one such list. There are two result columns because, as you may have surmised, what you have done so far does not properly validate values in the zip field. Two results in the setNumberMask column (marked in red in the original table), those for the values 1 and -12345, are incorrect. Both values passed validation when they should have failed.

Table 3. Zip Code Test Cases: This table shows results of several equivalence classes of values using the numeric mask ##### compared to using the string mask 00000. Incorrect results (red in the original table) are flagged here with an asterisk; all other results are correct.
//shipTo/zip     setNumberMask(#####)   setStringMask(00000)
90952            Pass                   Pass
90952.1          Fail                   Fail
123456           Fail                   Fail
90952-1234       Fail                   Fail
1                Pass*                  Fail
(blank entry)    Fail                   Fail
90952a           Fail                   Fail
-12345           Pass*                  Fail
(123)            Fail                   Fail

The two incorrect results (for the values 1 and -12345) passed for the same reason: The mask is numeric, and both values are valid numbers. So you need to back up a step. Even though a zip code contains only digits, it is really a string masquerading as a number. While numerically 00001 and 1 are the same, in the domain of zip codes 00001 represents a valid zip code, while 1 does not. Therefore, instead of setting a numeric mask, use a textual mask. Open the Edit Constraint Rule dialog for //shipTo/zip and change the action from setNumberMask to setStringMask. Click on the String Mask field to open the mask editor. Type five zeroes or press the "Digit [0-9]" button five times, then exit both dialogs. If you now re-validate each test case in Table 3, you'll find that they all produce correct results, as shown in column three.
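To see the difference between the two masks outside the CAM editor, the Python sketch below reproduces the outcomes recorded in Table 3. It is an approximation built from those observed results, not the CAM engine's algorithm: the two regular expressions are assumptions chosen to stand in for setNumberMask(#####) and setStringMask(00000).

```python
import re

# Assumed stand-ins for the two masks, chosen to match the Table 3 results.
NUMBER_MASK = re.compile(r"-?\d{1,5}")  # any integer of up to five digits
STRING_MASK = re.compile(r"\d{5}")      # exactly five digit characters

# One value per equivalence class, as listed in Table 3 ("" is the blank entry).
TEST_CASES = [
    "90952", "90952.1", "123456", "90952-1234",
    "1", "", "90952a", "-12345", "(123)",
]


def passes(pattern, value):
    """Return True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


def run_cases():
    """Map each test value to (number-mask result, string-mask result)."""
    return {
        value: (passes(NUMBER_MASK, value), passes(STRING_MASK, value))
        for value in TEST_CASES
    }


if __name__ == "__main__":
    for value, (number_ok, string_ok) in run_cases().items():
        label = value if value else "(blank entry)"
        print(f"{label:<14} {'Pass' if number_ok else 'Fail':<6} "
              f"{'Pass' if string_ok else 'Fail'}")
```

The numeric stand-in accepts 1 and -12345 because both are valid numbers of five digits or fewer, while the string stand-in insists on exactly five digit characters and rejects them.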

Changing the rule from checking for numbers to checking for strings let the processor fail the negative value, and changing the mask character from "#" (indicating a digit where leading zeroes may be absent) to "0" (indicating a digit where leading zeroes are required) allowed the processor to fail the 1 value. The value would pass if you changed it to 00001. The list of valid mask characters is documented in the formal CAM specification under section 3.4.3: CAM Content Mask Syntax. Table 4 is an adaptation from that section, with the text revised for clarity.

Table 4. Mask Characters: When a rule action requires a mask, these characters have special meaning.
Character   Description

String Masks
X           Any character; mandatory
A           Mandatory alphanumeric character or space
a           Optional alphanumeric character or space
?           Any single character
*           Zero or more characters
U           A character to be converted to upper case
^           Uppercase; optional
L           A character to be converted to lower case
_           Lowercase; optional
0           A digit; trailing and leading zeros displayed; leading minus sign permitted
#           A digit; trailing and leading zeros suppressed; leading minus sign permitted
' '         Single quotes escape a character block to denote mandatory character/s

Number Masks
0           A digit; trailing and leading zeros displayed; leading minus sign permitted
#           A digit; trailing and leading zeros suppressed; leading minus sign permitted
.           Literal decimal point
J           As the first character of a mask, invokes alternate Java formatting methods to handle mask processing (the literal J is ignored when passed to Java)

Date Masks
DD          Day number in a month
DDD         Day number in a year
DDDD        Relative day number(?) in a month
MM          Month number in a year
MMM...      Month name, e.g. January (field is padded or truncated to the number of M's, 3-10 permitted)
YY          Two-digit year
YYYY        Four-digit year
W           Day number in a week
WWW...      Day name (field is padded or truncated to the number of W's, 3-10 permitted)
/           Literal virgule; a date separator
-           Literal hyphen; alternate date separator
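One way to get a feel for string masks is to translate a few of the characters in Table 4 into a regular expression. The Python sketch below does this for a small subset only (X, ?, *, A, a, 0, and quoted literals). The translation is a loose reading of the table's descriptions, the function name `string_mask_to_regex` is hypothetical, and the result should not be taken as the CAM engine's actual matching behavior.

```python
import re

# Assumed regex equivalents for a subset of the string-mask characters.
_MASK_TO_REGEX = {
    "X": ".",              # any character; mandatory
    "?": ".",              # any single character
    "*": ".*",             # zero or more characters
    "A": "[A-Za-z0-9 ]",   # mandatory alphanumeric character or space
    "a": "[A-Za-z0-9 ]?",  # optional alphanumeric character or space
    "0": r"\d",            # a digit (treated here as mandatory)
}


def string_mask_to_regex(mask: str) -> re.Pattern:
    """Translate a string mask into a compiled regex (partial sketch)."""
    parts = []
    i = 0
    while i < len(mask):
        ch = mask[i]
        if ch == "'":
            # Single quotes enclose a block of mandatory literal characters.
            end = mask.find("'", i + 1)
            if end == -1:
                raise ValueError("unterminated quoted literal in mask")
            parts.append(re.escape(mask[i + 1:end]))
            i = end + 1
        elif ch in _MASK_TO_REGEX:
            parts.append(_MASK_TO_REGEX[ch])
            i += 1
        else:
            raise ValueError(f"mask character {ch!r} is not covered by this sketch")
    return re.compile("".join(parts), re.DOTALL)


if __name__ == "__main__":
    five_digit = string_mask_to_regex("00000")
    zip_plus_four = string_mask_to_regex("00000'-'0000")
    print(bool(five_digit.fullmatch("00001")))
    print(bool(five_digit.fullmatch("1")))
    print(bool(zip_plus_four.fullmatch("90952-1234")))
```

The second mask, 00000'-'0000, is an illustrative guess at how a nine-digit zip code with a literal hyphen might be expressed using the quoting character; verify it against the CAM editor before relying on it.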

If you are looking for a tool set with full, clear documentation, and one that has had virtually all the bugs ironed out, you must regrettably look elsewhere. But if you do not mind a few rough edges on a gem of great value, I believe you will find CAM to be a great tool for your arsenal. Finally, given the zeal of the developers, it is quite possible that the behavior of the latest version of the CAM editor and the CAM engine may vary from what I describe here, using version 1.6.2.

This concludes Part I, but you have seen only a glimpse of how intuitive and easy it is to design with CAM. In the next part of this article you'll see much more of CAM's expressive power. Additionally, you'll see much more in-depth discussion of practical techniques for developing templates and rules including: leveraging common structure and common rules; conditionalizing validation based on either internal or external factors; detailed comparison to XSD regarding datatypes, compositors, and cardinality; and finally, some pitfalls to avoid.

Michael Sorens is a freelance software engineer, spreading the seeds of good design wherever possible, including through his open-source web site, teaching (University of Phoenix plus community colleges), and writing (contributed to two books plus various articles). With BS and MS degrees in computer science and engineering from Case Western Reserve University, he has worked at Fortune 500 firms and at startups, using C#, SQL, XML, XSL, Java, Perl, C, Lisp, PostScript, and others. His favorite project: designing and implementing the world's smallest word processor, where the medium was silicon, the printer "head" was a laser, and the Declaration of Independence could literally fit on the head of a pin. You can discuss this or any other article by Michael Sorens here.