

Taking XML Validation to the Next Level: XSD Schema vs. CAM : Page 3

The generic-sounding Content Assembly Mechanism, or CAM, is an exciting step beyond XML Schema, but it's new and not well documented. This is the second in a series of articles representing "CAM: The Missing Manual."


Adding Additional Rules

To add more rules, you need to identify a condition for each one. For convenience, here's a copy of the rule list from the previous page:

   (a) HH:MI:SS.SSSZ
   (b) HH:MI:SS.SSS
   (c) HH:MI:SS.SSZ
   (d) HH:MI:SS.SS
   (e) HH:MI:SS.SZ
   (f) HH:MI:SS.S
   (g) HH:MI:SSZ

Figure 6. Adding a Rule to the timeElement: Open the context menu (1) on the timeElement, and select Add New Rule. You'll see the Add New Constraint wizard (2). Set the action to setDateMask, and then click the Date Mask field to open the date mask editor (3). On the main wizard, change "Conditional?" from "No" to "Yes" to expose the Condition field. Click on the Condition field to open the condition editor (4).

The two rules generated by default (items a and b) use the string-length function to check for the presence or absence of the final "Z", just as with dateElement. But because some formats in the list are the same length, you can't rely solely on string length; you must introduce more complicated clauses and additional functions. For example, items (b) and (c) are both the same length, so you need to change the single rule for (b) from this:

   string-length(.) < 13

to this:

   string-length(.) < 13 and not(ends-with(., 'Z'))

Then you can add this rule for (c):

   string-length(.) < 13 and ends-with(., 'Z')
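To see why length alone cannot tell the formats apart, here is a minimal Python sketch (not CAM code) that groups the seven placeholder strings by length and mirrors the two XPath conditions above. The function names `by_length`, `rule_b`, and `rule_c` are hypothetical helpers invented for this illustration.

```python
# Placeholder strings for the seven time formats listed above.
FORMATS = {
    "a": "HH:MI:SS.SSSZ",
    "b": "HH:MI:SS.SSS",
    "c": "HH:MI:SS.SSZ",
    "d": "HH:MI:SS.SS",
    "e": "HH:MI:SS.SZ",
    "f": "HH:MI:SS.S",
    "g": "HH:MI:SSZ",
}


def by_length(formats):
    """Group format labels by the length of their placeholder string."""
    groups = {}
    for label, mask in formats.items():
        groups.setdefault(len(mask), []).append(label)
    return groups


def rule_b(value):
    """Python rendering of: string-length(.) < 13 and not(ends-with(., 'Z'))"""
    return len(value) < 13 and not value.endswith("Z")


def rule_c(value):
    """Python rendering of: string-length(.) < 13 and ends-with(., 'Z')"""
    return len(value) < 13 and value.endswith("Z")


if __name__ == "__main__":
    # Formats that share a length appear together in one group.
    for length, labels in sorted(by_length(FORMATS).items(), reverse=True):
        print(length, labels)
```

Note that in this sketch the condition for (c) is also true for values shaped like (e) and (g), so implementing the full list would presumably need further clauses to separate those cases.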

To define this new rule, access the Add New Constraint wizard by opening the context menu on the timeElement in the Structure view (not the XML view) and selecting Add New Rule. The wizard lets you define both the mask and the Boolean conditions—see Figure 6. In the top part of the wizard you specify the target nodes using XPath and the action to apply to those nodes. In the bottom portion, change "Conditional?" from "No" to "Yes." When you do that, the wizard exposes additional fields. Clicking on the condition field opens another wizard to specify the condition, also an XPath expression.

Author's Note: For convenience, the wizard includes a list of XPath functions in a drop-down selector but, at the time of this writing, the list is incomplete; however, you can simply type the condition manually.

Here's some perspective on placeholders:

  • Placeholders may be misleading. A more positive spin on this statement is that a generated placeholder is just a starting point for describing the content of an element. Like generated rules, placeholders are emphatically not normative. For example, the intElement uses the placeholder 12345, implying a five-digit whole number. But an intElement may contain one, five, or nine digits. And it may be negative. The 12345 is intended to suggest that the value is simply a whole number, positive or negative, with any number of digits. Similarly, the floatElement, with a placeholder of 54321.00, seems to indicate a number with five integer digits on the left and two fractional digits on the right. So if an element has three decimal places, should that fail validation as a floatElement? Remember that the rule determines what is acceptable for an element; the placeholder is but a human-readable mnemonic. Therefore, you can think of the 54321.00 placeholder as an element containing a whole number component and a fractional component, with no specific claims about the magnitude of either. Depending on your preferences, you may consider this perspective misleading. A more general approach would be to use a placeholder such as the one doubleElement uses, "type=double" (or perhaps just "double").
  • Elements may have multiple rules, but only one placeholder. Refer again to the timeElement in Figure 5. It appears only once in the Structure view, but if you click on that node you will see two corresponding rules in the ItemRules view (rows 18 and 19 of Table 1). Curiously, the generated placeholder (HH:MI:SS.SZ) does not match either rule. With just those two rules you might specify a placeholder using regular expression notation, e.g. HH:MI:SS.SS(Z|S), to correspond to the two rules. If, however, you intend to implement the seven rules for timeElement shown earlier (or even more), a regular expression covering all of them would be unwieldy. In that case, you might opt for the more generic approach of just saying "time-value" or something similar.
  • An element's rules must cover its universe of discourse. Referring to the two rules for dateElement (rows 14 and 15 in Table 1), you might wonder why they indicate less than 11 and greater than 10 instead of equal to 10 and equal to 11. What happens to a value that is 12 characters? With the current rules, this value would trigger the date mask in row 15 of the table and the value would fail validation—as it should. If instead you were looking only for exactly 10 or 11 characters, a 12-character value would not trigger either of those rules, so the input document would not fail validation.
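The last point can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how CAM evaluates rules internally: the helper `first_matching_rule`, the mask labels, and the sample value are all hypothetical.

```python
def first_matching_rule(value, rules):
    """Return the mask label of the first rule whose condition holds, or None."""
    for condition, mask_label in rules:
        if condition(value):
            return mask_label
    return None


# Range-style conditions, as in the generated dateElement rules:
# every possible length triggers exactly one rule.
RANGE_RULES = [
    (lambda v: len(v) < 11, "short-date-mask"),  # hypothetical label
    (lambda v: len(v) > 10, "long-date-mask"),   # hypothetical label
]

# Equality-style conditions: lengths other than 10 or 11 trigger nothing.
EXACT_RULES = [
    (lambda v: len(v) == 10, "short-date-mask"),
    (lambda v: len(v) == 11, "long-date-mask"),
]

if __name__ == "__main__":
    bad_value = "2009-01-15ZZ"  # 12 characters, made up for this example
    # A mask is selected, so the value gets checked against it (and rejected).
    print(first_matching_rule(bad_value, RANGE_RULES))
    # No mask is selected, so nothing checks the value at all.
    print(first_matching_rule(bad_value, EXACT_RULES))
```

With the range-style conditions the 12-character value is routed to a mask that it cannot satisfy; with the equality-style conditions it slips past both rules unexamined.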

Numeric Mask Subtleties

To further examine numeric masks applied with the setNumberMask predicate, take a look at the two provided sample files (DataTypeSamples/NumberMaskSamples.cam and DataTypeSamples/NumberMaskSamples.xml). The template is quite small, consisting of just two elements under the root. Here's the structure:

   <as:Structure taxonomy="XML" ID="NumberMaskSamples" reference="">

And here are the rules:

   <as:constraint action="setChoice(
      'octothorpElement')])" />
   <as:constraint action="setNumberMask(
      //NumberMaskSamples/octothorpElement,###.##)" />
   <as:constraint action="setChoice(
      'zeroElement')])" />
   <as:constraint action="setNumberMask(
      //NumberMaskSamples/zeroElement,000.00)" />

The preceding rules state that an <octothorpElement> gets a mask with octothorps, while a <zeroElement> gets a mask with zeros.

The NumberMaskSamples.xml file is nothing more than 17 separate test cases for the <octothorpElement> and the same test cases repeated for the <zeroElement>. Here's a portion of the XML instance:

     . . .
     . . .

These illustrate some interesting points about numeric masks. Table 2 shows the validation results for each test case value using both types of masks. You may run the validation yourself with the files provided. The XML file contains each number in the Value column inserted into both an <octothorpElement> and a <zeroElement>. The main difference between octothorp and zero as mask elements is that the former allows zero suppression while the latter does not. Using octothorps, the values in rows 1-3 are valid, while only row 3 is valid with the zero mask because the integer portion has three digits, matching the mask.

Row 4 is curious because although the integer portion of the mask contains only three octothorps, values with four integer digits are still considered valid. That is, the octothorp mask makes no restriction on the number of digits to the left of the decimal point. For the fractional portion, however, it is a bit less forgiving. There are two octothorps in the mask. Values with fewer than two decimal digits validate (row 16) but values with more than two do not (row 14).
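As a rough way to keep these observations straight, the following Python sketch approximates the ###.## behavior described here with a regular expression. It is an assumption built only from the test results in Table 2, not CAM's implementation, and it covers only values that contain a decimal point, since the sample file does not show what happens without one. The name `octothorp_like` is hypothetical.

```python
import re

# Approximation of the observed ###.## behavior (NOT CAM's own logic):
#   - optional leading spaces, optional minus sign, optional spaces after it
#   - any number of integer digits, including none
#   - a decimal point followed by at most two fractional digits
_OCTOTHORP_PATTERN = re.compile(r"\s*-?\s*\d*\.\d{0,2}")


def octothorp_like(value):
    """Return True if value fits the approximated ###.## mask behavior."""
    return _OCTOTHORP_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["1.42", "1234.42", "-1.42", "- 2.23", "0.234", ".2", "."]:
        print(repr(sample), octothorp_like(sample))
```

Only the value with three fractional digits is rejected by this pattern, which matches the Pass/Fail column for the octothorp mask.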

The specification does not mention that negative numbers are permitted with either mask character, but both mask characters do allow a leading minus sign. As you've seen, the zero mask is quite particular about the digit count to the left of the decimal point. But does the minus sign count as a digit? This seems to be a spot where CAM cannot quite make up its mind, because the values in rows 6 and 7 are both valid.
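The inconsistency is easier to see when the two possible readings of the 000.00 mask are written out explicitly. The Python sketch below encodes each reading as a regular expression; both patterns are hypothetical interpretations made up for this comparison, and neither is taken from the CAM specification or its source.

```python
import re

# Reading A: the minus sign is extra; exactly three integer digits are required.
MINUS_IS_EXTRA = re.compile(r"-?\d{3}\.\d{2}")

# Reading B: the minus sign occupies one of the three integer positions.
MINUS_IS_DIGIT = re.compile(r"(\d{3}|-\d{2})\.\d{2}")


def verdicts(value):
    """Return (valid under reading A, valid under reading B)."""
    return (
        MINUS_IS_EXTRA.fullmatch(value) is not None,
        MINUS_IS_DIGIT.fullmatch(value) is not None,
    )


if __name__ == "__main__":
    for sample in ["123.42", "-12.42", "-123.42"]:
        print(sample, verdicts(sample))
```

Each reading accepts only one of the two negative values, so neither one alone accounts for the validator accepting both.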

Table 2. Numeric Mask Differences: Running the same set of tests with a mask using zero suppression (###.##) and a mask requiring all digits (000.00) shows the differences in their behavior. The shaded cells indicate apparent inconsistency in the mask behavior, as described in the comments.

   #    Value      Mask ###.##   Mask 000.00   Comments
   --   --------   -----------   -----------   --------
   1    1.42       Pass          Fail          Leading spaces are OK with "#" but not with "0".
   2    12.42      Pass          Fail          "#" allows fewer than the number of places in the mask to the left of the decimal by definition (due to zero suppression).
   3    123.42     Pass          Pass          Digit count matches mask.
   4    1234.42    Pass          Fail          "#" allows more than the number of places in the mask to the left of the decimal.
   5    -1.42      Pass          Fail          Minus sign allowed.
   6    -12.42     Pass          Pass          The minus sign should either be counted as a digit…
   7    -123.42    Pass          Pass          … or not counted as a digit, but not both!
   8    -1234.42   Pass          Fail
   9    - 2.23     Pass          Fail          Leading spaces are OK with "#".
   10   - 3.23     Pass          Fail
   11   - 4.23     Pass          Fail
   12   -003.23    Pass          Pass          Leading zeroes are optional with "#" but required with "0".
   13   0.23       Pass          Fail
   14   0.234      Fail          Fail          "#" does not allow more than two digits to the right.
   15   .23        Pass          Fail
   16   .2         Pass          Fail          "#" does allow fewer than two digits to the right by definition (due to zero suppression).
   17   .          Pass          Fail
