Taking XML Validation to the Next Level: XSD Schema vs. CAM : Page 2

The generic-sounding Content Assembly Mechanism, or CAM, is an exciting step beyond XML Schema, but it's new and not well documented. This is the second in a series of articles representing "CAM: The Missing Manual."


Mapping XSD to CAM

You've already seen how an XSD decimal datatype maps to CAM, but to migrate existing schemas you also need to understand how the other XSD datatypes map. Figure 4 shows a contrived schema that simply contains representative samples of many common XSD datatypes, grouped into logical chunks for numbers, strings, and date/time elements. For the full list of built-in datatypes and how they relate to one another, see the built-in datatypes section of the official W3C XML Schema specification.

Figure 4. The DataTypeSamples Schema: This schema sandbox represents many of the XML Schema datatypes.

In the CAM editor, create a new CAM template from the DataTypeSamples schema. Both files are in the DataTypeSamples folder of the downloadable code: DataTypeSamples.xsd and DataTypeSamples.cam. To recap, the Structure view shows the placeholders generated for each datatype, while the Rules view shows the validation semantics (see Figure 5).

Figure 5. The DataTypeSamples CAM Template: Converting the assorted datatypes from XML Schema to CAM yields this display in the CAM editor.

Table 1 reorganizes the information from the Structure and Rules views into a more digestible form. The Category and Data Type columns list each datatype from the original schema (see Figure 4) along with its category. The Placeholder column shows how each one maps to the structure portion of the CAM template, and the final two columns show the business context. As an example of how to read this table, consider decimalElement (row 3), which uses the decimal datatype you saw in the earlier zip code examples. The placeholder for a decimal datatype is %54321.00%. The business rule mask (######.##) indicates that the integer portion of the number must contain only digits, with an optional leading minus sign, followed by a decimal point, followed by exactly two digits in the fractional portion. Although zero and the octothorp (#) seem to be the most straightforward mask characters, there are some subtleties in how they restrict your values, as you'll see next.
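To make that reading of the decimal rule concrete, here is a minimal Java sketch that approximates setNumberMask(######.##) with a regular expression. The class and method names are hypothetical, and the regex encodes only the prose description above: an optional minus sign, integer digits, a decimal point, and exactly two fractional digits. It deliberately does not model how many digits each # permits, since that is one of the subtleties discussed later; it is not how the CAM processor implements masks.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of CAM or CAMed: approximates the
// setNumberMask(######.##) rule using the prose reading in the text.
final class DecimalMaskCheck {

    // Optional leading minus sign, one or more integer digits,
    // a decimal point, then exactly two fractional digits.
    // Assumption: the number of integer digits is not capped here.
    private static final Pattern DECIMAL_2DP = Pattern.compile("-?\\d+\\.\\d{2}");

    static boolean matches(String value) {
        // matches() requires the whole string to fit the pattern
        return DECIMAL_2DP.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"54321.00", "-12.50", "54321.5", "12a.00"};
        for (String s : samples) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

Running main prints true for 54321.00 and -12.50, and false for 54321.5 (only one fractional digit) and 12a.00 (a non-digit character).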

Table 1. Datatype Mapping: For each XML Schema datatype listed, CAM generates the placeholder, condition (if any), and business rule shown. Depending on the datatype, CAM generates one or more rules, and a rule may use a mask or the datatype predicate, as discussed in the text.
#   Category  Data Type                Placeholder           Condition              Business Rule
1             booleanElement           %false%                                      restrictValues('true'|'false')
2   number    byteElement              %type = byte%                                datatype(byte)
3   number    decimalElement           %54321.00%                                   setNumberMask(######.##)
4   number    doubleElement            %type = double%                              datatype(double)
5   number    floatElement             %54321.00%                                   setNumberMask(######.####)
6   number    intElement               %12345%                                      setNumberMask(######)
7   number    longElement              %type = long%                                datatype(long)
8   number    negativeIntegerElement   %type =                                      datatype(negativeInteger)
9   number    shortElement             %type = short%                               datatype(short)
10  number    unsignedIntElement       %type = unsignedInt%                         datatype(unsignedInt)
11  string    stringElement            %string%
12  string    tokenElement             %Token%                                      datatype(token)
13  string    normalizedStringElement  %string%                                     datatype(normalizedString)
14  datetime  dateElement              %YYYY-MM-DDZ%         string-length(.) < 11  setDateMask(YYYY-MM-DD)
15  datetime  dateElement              %YYYY-MM-DDZ%         string-length(.) > 10  setDateMask(YYYY-MM-DDZ)
16  datetime  dateTimeElement          %YYYY-MM-DD           string-length(.) < 26  setDateMask(YYYY-MM-DD'T'HH:MI:SSZ)
17  datetime  dateTimeElement          %YYYY-MM-DD           string-length(.) > 25  setDateMask(YYYY-MM-DD'T'HH:MI:SS.SZ)
18  datetime  timeElement              %HH:MI:SS.SZ%         string-length(.) < 13  setDateMask(HH:MI:SS.SSS)
19  datetime  timeElement              %HH:MI:SS.SZ%         string-length(.) > 12  setDateMask(HH:MI:SS.SSSZ)
20  datetime  durationElement          %P1%                                         restrictValues('P1'|'Y2'|'M3'|'DT1'|'H1'|'0M'|'0S')
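For the rows that do not use a mask, a hypothetical Java sketch of the datatype() and restrictValues() predicates can help show what each kind of rule checks. It assumes a "parse the value the way Java would" reading of datatype(), in line with CAMed being a Java application, but the class name, method names, and the mapping from type names to Java parsers are illustrative assumptions, not CAMed's API. Only a few of the numeric types from Table 1 are covered.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical sketch: how a Java-based validator *might* evaluate the
// datatype() and restrictValues() predicates from Table 1.
// Names are illustrative, not part of any CAM API.
final class CamPredicates {

    // datatype(x): true if Java can parse the value as the named type.
    static boolean datatype(String type, String value) {
        try {
            switch (type) {
                case "byte":
                    Byte.parseByte(value);             // -128 .. 127
                    return true;
                case "short":
                    Short.parseShort(value);
                    return true;
                case "long":
                    Long.parseLong(value);
                    return true;
                case "double":
                    Double.parseDouble(value);
                    return true;
                case "unsignedInt":
                    Integer.parseUnsignedInt(value);   // 0 .. 4294967295
                    return true;
                case "negativeInteger":
                    return new BigInteger(value).signum() < 0;
                default:
                    // Not a NumberFormatException, so it propagates to the caller
                    throw new IllegalArgumentException("unsupported type: " + type);
            }
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // restrictValues('a'|'b'|...): the value must equal one of the listed constants.
    static boolean restrictValues(String value, String... allowed) {
        return Arrays.asList(allowed).contains(value);
    }

    public static void main(String[] args) {
        System.out.println(datatype("byte", "127"));                  // true
        System.out.println(datatype("byte", "128"));                  // false: out of range
        System.out.println(datatype("negativeInteger", "-5"));        // true
        System.out.println(restrictValues("true", "true", "false"));  // true
        System.out.println(restrictValues("TRUE", "true", "false"));  // false: exact match only
    }
}
```

Because Java's lexical rules are not XSD's, results can differ at the edges: XSD writes positive infinity as INF, which Double.parseDouble rejects (it expects Infinity). That kind of gap is another reason to treat generated datatype() rules as a starting point.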

For now, there are several important points to glean from Table 1:

  • Generated rules are illustrative, not normative. That is, there are usually several different rules you may define to achieve approximately the same thing. The CAM processor in this implementation has made certain choices, but you should consider them guidelines rather than gospel. For example, some datatypes have associated rules with masks (e.g., floatElement) while others have associated rules with the datatype predicate (e.g., doubleElement). (The datatype predicate literally indicates that the value must match that datatype as Java understands it, because CAMed is a Java application.) This CAM processor chose to create its rules in that fashion, but you could do the reverse if you wish: use something like setNumberMask(######.############) for doubleElement or datatype(float) for floatElement.
Author's Note: My recommendation would be to use datatype() for both floats and doubles unless you want to restrict the fractional portion of the number to a specific number of digits (the next section discusses this point further).

  • A single rule may specify multiple possible element values. Use the restrictValues predicate, as shown for the booleanElement (row 1) or the durationElement (row 20), to specify one or more values. The booleanElement, for example, must contain either true or false to pass validation. Note, though, that all possible values are constants and must be specified in advance.
  • A single element may have multiple associated rules. This allows you to specify multiple formats rather than just multiple values. The dateElement, dateTimeElement, and timeElement all show examples of this. Rules for the same element may even conflict with each other, provided their conditions are mutually exclusive. For the dateElement, the first rule (row 14) does not allow a final Z while the second rule (row 15) requires it. That works fine because of the condition attached to each. Note that any date in the format YYYY-MM-DD has exactly 10 characters; adding the final Z makes it 11 characters. The conditions derive directly from these observations: when the dateElement has fewer than 11 characters, the date must be in the YYYY-MM-DD format; when it exceeds 10 characters, it must be in the YYYY-MM-DDZ format.
  • Reprise: Generated rules are illustrative, not normative. This important point bears repeating with respect to the timeElement. This element has only two rules associated with it, but you should consider that just a starting point. For example, here's an expanded list of possible time formats that would allow your template to accept a more flexible range of time values:
       (a) HH:MI:SS.SSSZ
       (b) HH:MI:SS.SSS
       (c) HH:MI:SS.SSZ
       (d) HH:MI:SS.SS
       (e) HH:MI:SS.SZ
       (f) HH:MI:SS.S
       (g) HH:MI:SSZ
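The conditional rules for dateElement (rows 14 and 15) and the expanded timeElement formats above can be sketched in Java as follows. This is a hypothetical illustration, not CAM's rule engine: the CAM masks are hand-translated into java.time patterns (YYYY to uuuu, DD to dd, MI to mm, SS to ss, fractional S digits to S, and the trailing Z treated as a literal character), and both 24-hour HH and strict resolution are assumptions, chosen so that impossible values such as February 30 or hour 25 fail.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical sketch of conditional and multi-format rules (not CAM's implementation).
final class ConditionalRules {

    // Strict resolution rejects out-of-range values such as 2009-02-30.
    private static DateTimeFormatter strict(String pattern) {
        return DateTimeFormatter.ofPattern(pattern).withResolverStyle(ResolverStyle.STRICT);
    }

    private static final DateTimeFormatter DATE = strict("uuuu-MM-dd");       // YYYY-MM-DD
    private static final DateTimeFormatter DATE_Z = strict("uuuu-MM-dd'Z'");  // YYYY-MM-DDZ

    // Rows 14-15: the string-length conditions are mutually exclusive,
    // so exactly one mask applies to any given value.
    static boolean validDate(String value) {
        DateTimeFormatter mask = value.length() < 11 ? DATE : DATE_Z;
        try {
            LocalDate.parse(value, mask);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Expanded time formats (a) through (g), tried in order; the first match wins.
    private static final DateTimeFormatter[] TIME_MASKS = {
        strict("HH:mm:ss.SSS'Z'"), strict("HH:mm:ss.SSS"),   // (a), (b)
        strict("HH:mm:ss.SS'Z'"),  strict("HH:mm:ss.SS"),    // (c), (d)
        strict("HH:mm:ss.S'Z'"),   strict("HH:mm:ss.S"),     // (e), (f)
        strict("HH:mm:ss'Z'")                                // (g)
    };

    static boolean validTime(String value) {
        for (DateTimeFormatter mask : TIME_MASKS) {
            try {
                LocalTime.parse(value, mask);
                return true;
            } catch (DateTimeParseException e) {
                // not this format; try the next one
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(validDate("2009-02-28"));    // true  (row 14 mask)
        System.out.println(validDate("2009-02-28Z"));   // true  (row 15 mask)
        System.out.println(validDate("2009-02-30"));    // false (no such date)
        System.out.println(validTime("10:15:30.5"));    // true  (format f)
        System.out.println(validTime("10:15:30"));      // false (not in the list)
    }
}
```

As written, validTime accepts exactly the seven formats (a) through (g), so a bare 10:15:30 is rejected; add patterns to TIME_MASKS if your data needs them. The date check mirrors the mutually exclusive length conditions in Table 1, whereas the time check simply tries each format in turn.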
