Taking XML Validation to the Next Level: Introducing CAM

The generic-sounding Content Assembly Mechanism, or CAM, is actually an exciting step forward from XML Schema, but it's new, and not well documented. This article series represents CAM: The Missing Manual.


Validating an XML document entails confirming that the document is well-formed and that it conforms to a set of rules specified with a Document Type Definition (DTD), an XML Schema, or (as introduced in this article) a CAM template. The DTD was the earliest specification. DTDs provided useful but limited capabilities, letting you validate XML document structure but very little in the way of semantics. Next came XML Schema, which offered more flexibility and capability, improved support for structure, and good (but not great) support for semantics. Schematron, RelaxNG, and others have attempted to improve the semantic support, but none have caught on in a big way. Now a new (really new) technology called Content Assembly Mechanism (CAM) is being developed under the aegis of OASIS, a well-respected standards body.

CAM is more than just another schema language, though. It was designed to better meet the needs of business exchange requirements and interoperability. CAM provides a powerful mechanism for validating XML both structurally and semantically, in a concise, easy-to-use, easy-to-maintain format. It provides a context mechanism—a way to dynamically adjust what should be considered a valid XML instance based upon other parts of the XML itself or external parameters.
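One way to picture the context mechanism is a validator whose requirements change with the data and with a caller-supplied parameter. The following Python sketch is only an illustration of that idea, not CAM syntax or the CAM API; the element names weight_kg and freight_handler and the weight_limit_kg parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def validate_order(order: ET.Element, weight_limit_kg: float = 25.0) -> List[str]:
    """Return a list of problems; an empty list means the order is acceptable.

    Hypothetical rule: orders heavier than weight_limit_kg must also carry
    a <freight_handler> element. The limit is an external parameter.
    """
    weight_text = order.findtext("weight_kg")
    if weight_text is None:
        return ["<weight_kg> is required"]
    try:
        weight = float(weight_text)
    except ValueError:
        return [f"<weight_kg> is not a number: {weight_text!r}"]

    errors = []
    needs_freight = weight > weight_limit_kg          # depends on the data itself
    has_freight = order.find("freight_handler") is not None
    if needs_freight and not has_freight:
        errors.append(f"orders over {weight_limit_kg} kg need a <freight_handler>")
    return errors


light = ET.fromstring("<order><weight_kg>3</weight_kg></order>")
heavy = ET.fromstring("<order><weight_kg>30</weight_kg></order>")
heavy_ok = ET.fromstring(
    "<order><weight_kg>30</weight_kg><freight_handler>Acme</freight_handler></order>"
)
```

The same heavy order is rejected under the default limit but accepted when the caller raises weight_limit_kg to 50, which is the kind of externally driven adjustment the paragraph describes.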

CAM is an exciting technology with much promise, but it is a nascent technology, which can be both good and bad. Things move fast with CAM development, thus you may notice frequent "at the time of writing" disclaimers in this article. However, the chances are good that the development team will act upon some of the problems discussed here and fix them before you ever have a chance to encounter them!

So, at the time of writing this article, CAM's documentation is sketchy: There is a formal specification, a white paper, a PowerPoint presentation, and a few web pages offering a brief introduction to the editor and to the API. There is no definitive guide or tutorial; this article functions as "CAM: The Missing Manual," expanding upon the CAM documentation, covering both the how and the why of applying the specification and its idiosyncrasies to real-world usage.

Author's Note: While working on examples for this article I had to combat a variety of implementation bugs. But the development team is extremely responsive in fixing issues: Very early on I delivered a list of two dozen bugs and had a new release within 24 hours!

What You Need

  • Basic familiarity with XPath. CAM uses XPath extensively for defining business rules. See the W3Schools' XPath Tutorial for a great refresher.
  • Basic familiarity with XML Schema. While ostensibly this article is about a successor to XML Schema, it relies extensively on contrasts with XML Schema as the most effective way to communicate new approaches. See the W3Schools' XML Schema Tutorial for a great refresher.

Dictating Valid XML

An XML document is a hierarchical composition of elements, a "generic framework for storing any amount of text or any data whose structure can be represented as a tree". An XML document needs only to be well-formed, meaning it must have but a single root and its elements and attributes must conform to the simple XML syntax rules. However, XML has little utility until you map it into a specific problem domain, such as mathematics, book-writing, or financial transactions. Such mapping removes documents from the abstract realm of XML and places them into a specific XML dialect for your particular problem. Any document in your dialect must, by definition, be valid according to your dialect semantics; otherwise it is rejected as invalid and cannot be processed.
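The difference between well-formed and valid can be sketched in a few lines of Python using the standard library parser. The dialect rule in is_valid_address is a hypothetical stand-in for a real schema or template; it is here only to show that a document can parse cleanly and still be rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (single root, balanced tags)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def is_valid_address(xml_text: str) -> bool:
    """Hypothetical dialect rule: the root is <address> and it has
    an <address_street> child. Well-formedness is checked first."""
    if not is_well_formed(xml_text):
        return False
    root = ET.fromstring(xml_text)
    return root.tag == "address" and root.find("address_street") is not None


good = "<address><address_street>221B Baker Street</address_street></address>"
wrong_dialect = "<note><to>Watson</to></note>"                          # well-formed, wrong dialect
broken = "<address><address_street>221B Baker Street</address>"        # unbalanced tags
two_roots = "<address/><address/>"                                      # more than one root
```

Here wrong_dialect passes is_well_formed but fails is_valid_address, while broken and two_roots fail at the parsing stage.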

Consider this portion of a customer address:

<address>
  <address_street>221B Baker Street</address_street>
  . . .
</address>

To validate this XML fragment in XML Schema you would typically have a structure such as:

<xs:element name="address">
  . . .
  <xs:element name="address_street" type="xs:string"/>
  . . .
</xs:element>

These constraints indicate that an <address_street> element must exist, be contained within an <address> element, and must contain a string. For an address, a simple string value may be appropriate, but for other fields you would generally use something more specific, either a specialized string (a derived, restricted string), a date, an integer, or other defined type.

XML Schema is a grammar-based system, in that you define a grammar for both semantics and structure to which an XML instance must conform. Schematron, on the other hand, is a rule-based system where you specify both semantics and structure using rules (see An Introduction to Schematron). That is, not only do you use a rule that specifies that <address_street> is a string, but you also use a rule to specify that <address_street> must appear within an <address> element. Both XML Schema and Schematron fundamentally intertwine semantics and structure. In programming terms, the coupling is high, which is not desirable.

Author's Note: See Comparing Schematron and CAM validation and XML Schema Language Comparison for more in-depth information.

In contrast, CAM is a hybrid system that separates structure from semantics (low coupling) and specifies semantics with rules. For example, the address example in CAM might look like this:



<as:Structure>
  <address>
    <address_street>%street number and name%</address_street>
    . . .
  </address>
</as:Structure>
<as:Rules>
  <as:constraint action="datatype(//address_street,string)" />
</as:Rules>

The <as:Structure> section of the CAM template defines the hierarchical structure of the XML document in a fashion that virtually duplicates an example XML instance, substituting placeholders (demarcated with percent signs) for actual data. So the preceding CAM template indicates that an XML instance would replace the %street number and name% placeholder with an actual street address.
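To make the "template looks like an instance" idea concrete, here is a rough Python sketch that compares an instance against a template tree and treats any %...% text as a wildcard. It is a simplification and not how the CAM processor is implemented: it ignores attributes, namespaces, and cardinality, it omits the <as:Structure> wrapper, and the city element is a hypothetical addition.

```python
import re
import xml.etree.ElementTree as ET

# Text wrapped in percent signs is a placeholder; what lies between is ignored.
PLACEHOLDER = re.compile(r"%.*%")


def matches_structure(template: ET.Element, instance: ET.Element) -> bool:
    """Return True if instance has the same element names, in the same
    order, as template. Leaf text must match unless it is a placeholder."""
    if template.tag != instance.tag:
        return False
    t_children, i_children = list(template), list(instance)
    if len(t_children) != len(i_children):
        return False
    if not t_children:
        t_text = (template.text or "").strip()
        i_text = (instance.text or "").strip()
        if not PLACEHOLDER.fullmatch(t_text) and t_text != i_text:
            return False
    return all(matches_structure(t, i) for t, i in zip(t_children, i_children))


template = ET.fromstring(
    "<address>"
    "<address_street>%street number and name%</address_street>"
    "<city>%city-name%</city>"
    "</address>"
)
ok = ET.fromstring(
    "<address><address_street>221B Baker Street</address_street>"
    "<city>London</city></address>"
)
reordered = ET.fromstring(
    "<address><city>London</city>"
    "<address_street>221B Baker Street</address_street></address>"
)
missing = ET.fromstring(
    "<address><address_street>221B Baker Street</address_street></address>"
)
```

Only ok matches the template; reordered fails on element order and missing fails on the child count. Note that nothing here checks what the street text contains, which is exactly the part CAM leaves to the rules section.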

Author's Note: The only part of the placeholder that has semantic content is the percent signs themselves. Everything between them is completely ignored by the CAM processor; it is for you and consumers of your XML dialect. The Structure view in Figure 3, for example, uses just a generic description (%string%) for many placeholders. You might take a different approach, though, and be more specific, using, for instance, %city-name% for a city element, %2-letter state abbreviation% for a state element, etc.

 
Figure 1. WYSIWYG Example: Microsoft Word users much prefer to see the rendering of the document in the left pane rather than the right, but both represent the same thing and both may be edited to alter the document.
The <as:Structure> section does embody some semantics (those that define which elements contain which other elements and in what order); however, unlike Schematron, you do not need to laboriously write rules to define the structure. CAM specifies structure in a true WYSIWYG manner, while for Schematron you have to write the "code." This is analogous to using Microsoft Word in its natural, WYSIWYG form vs. writing the RTF text to generate a Word document: writing RTF is tedious, difficult, and error prone (see Figure 1).

XML Schema is also not WYSIWYG, although some excellent tools such as XmlSpy or Liquid XML Studio help put a WYSIWYG front-end on it. Consider this XML Schema example defining a cost to be in the range 1-999 with 2 decimal places permitted:

<xs:element name="cost">
  <xs:simpleType>
    <xs:restriction base="xs:decimal">
      <xs:fractionDigits value="2" />
      <xs:totalDigits value="5" />
      <xs:minInclusive value="1" />
      <xs:maxInclusive value="999" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>

The equivalent CAM syntax shown below separates the rules from the structure, with the rules referring back to the appropriate structure elements. The rules map obviously and intuitively to the English description:

<as:constraint action="setNumberMask(//Part/cost,###.##)" />
<as:constraint action="setNumberRange(//Part/cost,1-999)" />

The <as:Rules> section of the CAM template defines all the semantics other than those implicitly embodied in the <as:Structure> section, including datatypes, restrictions, cardinality, conditions, and more.
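The two cost constraints above can be mimicked in Python to show how rules stay separate from structure and point back at it with a path. This is a sketch under stated assumptions, not the CAM processor's behavior: it assumes each # in the mask is a digit slot that may be left unused, that the fraction is optional, and that the range is inclusive. The order root element is hypothetical, and the path is written as .//Part/cost because ElementTree supports only a relative subset of XPath.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import List


def mask_to_regex(mask: str) -> "re.Pattern[str]":
    """Translate a mask made only of '#' and '.' (e.g. '###.##') into a regex.
    Assumed meaning: up to N digits per side, fraction optional."""
    int_part, _, frac_part = mask.partition(".")
    pattern = rf"\d{{1,{len(int_part)}}}"
    if frac_part:
        pattern += rf"(\.\d{{1,{len(frac_part)}}})?"
    return re.compile(pattern)


def check_cost(root: ET.Element, path: str = ".//Part/cost", mask: str = "###.##",
               low: Decimal = Decimal("1"), high: Decimal = Decimal("999")) -> List[str]:
    """Apply the mask rule and then the range rule to every node the path selects."""
    errors = []
    regex = mask_to_regex(mask)
    for node in root.findall(path):
        text = (node.text or "").strip()
        if not regex.fullmatch(text):
            errors.append(f"{text!r} does not fit mask {mask}")
            continue
        if not (low <= Decimal(text) <= high):
            errors.append(f"{text!r} is outside {low}-{high}")
    return errors


order = ET.fromstring(
    "<order>"
    "<Part><cost>12.50</cost></Part>"    # fits mask and range
    "<Part><cost>1000</cost></Part>"     # too many integer digits
    "<Part><cost>0.5</cost></Part>"      # fits mask, below range
    "<Part><cost>9.999</cost></Part>"    # too many fraction digits
    "</order>"
)
errors = check_cost(order)
```

Because the rules carry their own path, the same two checks could be pointed at a different element without touching the structure definition.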

Benefits of CAM

Table 1 summarizes the key strengths of CAM compared to XML Schema and DTDs. Each line item in the table is covered in detail later in this article or in Part II.

Table 1. Vital Validation Features: The technology(ies) that have the best support for each feature are highlighted in green. CAM clearly has, by far, the strongest repertoire of the three technologies.
# | Item | DTD | XML Schema | CAM | Example / Notes
1 | Separates structure and business rules | no (limited business rules) | no | yes |
2 | Current-node fixed validation | no | yes | yes | <quantity> holds an integer between 0 and 100.
3 | Current-node conditional validation | no | limited, using pattern facets [See XML Schema Spec Part 2, section 4.3.4] | yes | <zip> must be either 5 or 10 digits.
4 | Cross-node conditional validation | no | limited, using identity-constraint definitions [See XML Schema Spec Part 1, section 3.11] | yes | <taxable> must be no if <state> is AK, FL, NV, SD, TX, WA, WY, NH, or TN; otherwise it must be yes.
5 | Context mechanism | no | yes | yes | Interpret validity differently depending on whether condition A or condition B is satisfied.
6 | Structure variability | no | no | yes | For orders exceeding 25kg, customers must also select a freight handler to transport the goods.
7 | Parameterized invocation | no | no | yes | Orders from Canada must meet criteria x, y, and z, while orders from New Zealand must meet criteria a, b, and c.
  | Datatypes | 10 | 44+ | 44+ |
8 | Namespace aware | no | yes | yes |
9 | Define own datatypes | no | yes, using derived types | yes, using constraints | <bookNumber> must be an eight-character string.
10 | Written in same syntax as documents | no | yes (XML) | yes (XML) |
11 | Code reuse | limited | yes, using named types | yes, using XPath selector for rules and include files for structure | <shipTo> and <billTo> addresses contain all the same children and some validation rules.
12 | Tools/editors | many | many | 1 | "Any color as long as it's black"
13 | Graphical designer | many | many | none | With XML Schema, designers mask the complexities of the structure.
14 | WYSIWYG | with external framework | with external framework | inherent | Statement of business rules and implementation of them are almost identical; truly a textual WYSIWYG. On top of that, the editor also provides three different auto-generated documentation modes.
15 | Adoption | mature | mature | nascent | Mature can be better for stability, support, and overhead; nascent can be better for starting new projects cleanly with new technology.
16 | APIs | Java, .NET, Ruby, Perl, … | Java, .NET, Ruby, Perl, … | Java |
17 | Open standard | yes | yes | yes |
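Row 4 of Table 1 (cross-node conditional validation) is the kind of rule that is awkward in a grammar but short as a predicate over two nodes. The Python sketch below is an illustration of the rule as stated in the table, not CAM syntax; the order wrapper element and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# States for which the table's example requires <taxable> to be "no".
NON_TAXABLE_STATES = {"AK", "FL", "NV", "SD", "TX", "WA", "WY", "NH", "TN"}


def check_taxable(order: ET.Element) -> List[str]:
    """The allowed value of <taxable> depends on the value of <state>."""
    state = (order.findtext("state") or "").strip()
    taxable = (order.findtext("taxable") or "").strip()
    expected = "no" if state in NON_TAXABLE_STATES else "yes"
    if taxable != expected:
        return [f"<taxable> must be {expected!r} when <state> is {state!r}, got {taxable!r}"]
    return []


tx_ok = ET.fromstring("<order><state>TX</state><taxable>no</taxable></order>")
tx_bad = ET.fromstring("<order><state>TX</state><taxable>yes</taxable></order>")
ca_ok = ET.fromstring("<order><state>CA</state><taxable>yes</taxable></order>")
ca_bad = ET.fromstring("<order><state>CA</state><taxable>no</taxable></order>")
```

Each order has the same structure; only the relationship between two sibling values decides whether it is accepted.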

Author's Note: This article is based on a comparison to XML Schema 1.0; version 1.1 is in the works and it will use some of the same types of XPath expressiveness that CAM already has.


