

Taking XML Validation to the Next Level: XSD Schema vs. CAM : Page 4

The generic-sounding Content Assembly Mechanism, or CAM, is an exciting step beyond XML Schema, but it's new and not well documented. This is the second in a series of articles representing "CAM: The Missing Manual."





Common Rules

The purchase order example discussed earlier included both a billing address and a shipping address. Each has precisely the same set of child elements, and each of those children has the same validation rules. The default template generation duplicates both the elements and the rules. Good code design, however, dictates that duplication be removed to avoid future maintenance issues. This section discusses removing duplicate rules, and the following section discusses removing duplicate structure.

Earlier under Business Rules you changed the billing zip code rule from setNumberMask(######.##) to setStringMask(00000) to properly validate a 5-digit zip code. Now you can add a broader rule to encompass the shipping zip code as well.

First, open the context menu for the //shipTo/zip node in the Structure view and select Add New Rule (analogous to Figure 6). In the Rule XPath set of check boxes, uncheck the "Parent" box that is checked by default. This changes the XPath just above it from //shipTo/zip to //zip, which will match any zip node in the document. Set the action to setStringMask('00000'). For real-world complexity, add a condition that lets this rule apply to five-digit zip codes in preparation for another new rule that will handle nine-digit zip codes. The top frame of Figure 7 shows both rules.
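To make the pairing of a condition with a mask concrete, here is a minimal Python sketch of the same idea outside CAM. It assumes that the mask 00000 means "exactly five digits" and that the nine-digit form is written 00000-0000; that second format, the length-based condition, and the function name zip_is_valid are illustrative assumptions, not CAM syntax.

```python
import re

# Assumed reading of setStringMask('00000'): exactly five ASCII digits.
FIVE_DIGIT = re.compile(r"[0-9]{5}")
# Hypothetical nine-digit (ZIP+4) form; the article does not spell out this format.
NINE_DIGIT = re.compile(r"[0-9]{5}-[0-9]{4}")


def zip_is_valid(value: str) -> bool:
    """Choose a rule with a length condition, then apply that rule's mask."""
    if len(value) == 5:       # condition guarding the five-digit rule
        return FIVE_DIGIT.fullmatch(value) is not None
    if len(value) == 10:      # condition guarding the nine-digit rule
        return NINE_DIGIT.fullmatch(value) is not None
    return False              # neither condition holds, so no rule accepts the value
```

The condition decides which rule is in play, and the mask then does the actual checking, which is why the two rules can coexist on the same node without conflict.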

Author's Note: Actually, Figure 7 shows duplicates of both rules: one operating on the //shipTo/zip node and another on the //zip node. Although you won't need duplicate rules when this example is complete, it's worth showing them here to make a point: the rules pertaining to //zip apply to shipping zip codes as well as billing zip codes.

Figure 7. Varying Rule Scope: The top frame shows rules specific to <zip> elements within <shipTo> elements (that do not apply to <billTo> elements) plus rules that apply to any <zip> elements. Note that the latter also appear when you move focus to the <zip> element under <billTo> shown in the lower frame.

When you have //shipTo/zip selected in the Structure view, you'll see its rules in the ItemRules view. Here you see four: those applying to the specific node and those applying to any zip node. Now change focus from //shipTo/zip in the Structure view to //billTo/zip (bottom frame of Figure 7) and observe that only the two rules that apply to all zip nodes appear in the ItemRules view. The XPath selector you define determines the scope of your rule. When you are satisfied with your understanding here go ahead and delete the duplicate rules specific to the //shipTo/zip node, leaving just the two rules that apply to all zip nodes (see the file PurchaseOrder/purchaseOrder_with_generic_zip_rules.cam in the downloadable code).
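You can see the same scoping behavior outside the editor with any XPath-capable tool. The following Python sketch uses the standard library's ElementTree, which supports a subset of XPath; the purchaseOrder root element and the zip values are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; only the shipTo/billTo/zip names come from the article.
PURCHASE_ORDER = """
<purchaseOrder>
  <shipTo country="US"><zip>90952</zip></shipTo>
  <billTo country="US"><zip>95819</zip></billTo>
</purchaseOrder>
"""

root = ET.fromstring(PURCHASE_ORDER)


def zips_in_scope(xpath: str) -> list:
    """Return the text of every zip node selected by the given XPath."""
    return [node.text for node in root.findall(xpath)]


# ElementTree paths are relative to the root, hence the leading dot.
narrow_scope = zips_in_scope(".//shipTo/zip")  # only the shipping zip code
broad_scope = zips_in_scope(".//zip")          # every zip code in the document
```

Dropping the parent step from the path is all it takes to widen the selection from one zip node to both, which mirrors what unchecking the "Parent" box does to the rule.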

One excellent yet subtle feature of the CAM editor deserves mention here: When you select a node in the Structure view you see all rules applicable to that node just as if they were attached specifically to that node. But when you look at the list of all rules in the Rules view (as in Figure 5) you find that each rule appears only once, no matter how many nodes it applies to in the structure.

Now that you have learned how to remove duplicate rules, the next section discusses how to remove duplicate structural pieces or element sub-trees.

Common Elements

The purchase order example includes both a shipping address and a billing address. Each includes typical child elements: name, street, city, state, and zip. More to the point, both include precisely the same set of child elements, meaning these elements are candidates for removing duplication. The editor makes this easy to do in just two steps:

  1. Convert the children of one address node to an include file.
  2. Reference the same include file for the other address node.
Author's Note: The specific steps are illustrated in Figure 8 and described next, but the discussion describes what the editor should do rather than what it does do. These particular actions do not quite operate correctly in the current version of the editor, but after you understand the simple steps, you'll be able to fix the code manually until the editor defect is fixed. Given the responsiveness of the development team, it may already be fixed by the time you read this.

Figure 8. Eliminating Structure Duplication: Identify the nodes that have identical subtrees. Convert one node's children into an included subtree using the context menu (1). For all others, replace the children with a reference to the same included subtree (2). The final result (3) looks just like the original except that the icons have changed color.

Here's an explanation of the steps shown in Figure 8:

  1. Select the shipTo node in the Structure view and open the context menu. Under the Include choice, select "Make Element Children an Include." The editor prompts you for a file name because it stores XML fragments in a separate include file. Next, the editor shows (middle frame) an annotation on the shipTo element indicating it is now an include file. More subtly, the color of the child element icons has changed from blue to magenta.
  2. Now select the billTo node, open its context menu, and under the Include choice, select "Replace Children with an Include," referencing the include file that you created in step 1.
  3. When the change is successful you'll see the child element icons change from blue to magenta.

The changes are essentially transparent within the editor; you work with the child nodes, applying rules, etc., just as if they were "real" children rather than references to an included file. If you look at the XML source of the CAM template (via the View menu or using an external editor) you will find that the child elements are gone, replaced by a single <as:include> element. You can find this intermediate template file in the downloadable code as PurchaseOrder/purchaseOrder_with_includes.cam.

The original CAM template included this code:

<shipTo country="US">
  <name>%string%</name>
  <street>%string%</street>
  <city>%string%</city>
  <state>%string%</state>
  <zip>%54321%</zip>
</shipTo>
<billTo country="US">
  <name>%string%</name>
  <street>%string%</street>
  <city>%string%</city>
  <state>%string%</state>
  <zip>%54321.00%</zip>
</billTo>

The new code—assuming an include file named po_address_include.xml—looks like this:

<shipTo country="US">
  <as:include ignoreRoot="yes">po_address_include.xml</as:include>
</shipTo>
<billTo country="US">
  <as:include ignoreRoot="yes">po_address_include.xml</as:include>
</billTo>

The above <as:include> elements use a file path relative to the location of the CAM template—the po_address_include.xml file must be in the same directory as the CAM template itself. Alternatively, you could use an absolute file path.

The actual po_address_include.xml included file now contains this XML:

<shipTo country="US">
  <name>%string%</name>
  <street>%string%</street>
  <city>%string%</city>
  <state>%string%</state>
  <zip>%54321%</zip>
</shipTo>

Notice that the root element is <shipTo>, because that is the element from which you generated the include file. But remember that you converted the children of <shipTo> to an include file, not the element itself. Therefore, the name of the root element here is immaterial. Indeed, you have already proved that by replacing the children of the <billTo> element with this same include file. This is further affirmed by the presence of the ignoreRoot attribute in the <as:include> elements shown above. My suggestion, then, is to change the root in this include file to something more meaningful, as shown below:

<address>
  <name>%string%</name>
  <street>%string%</street>
  <city>%string%</city>
  <state>%string%</state>
  <zip>%54321%</zip>
</address>

You have used the two actions on the Include menu that affect the child elements of a given element. There are another two actions that affect the selected element itself. If, for example, you had chosen "Make Element an Include" instead of "Make Element Children an Include," the code in the CAM template would have been:


In this case, note that the ignoreRoot attribute is absent; its default value is "no." Because this code does use the root element, you cannot rename it or delete its attributes; you would need to use the original version above if you wished to include it in this fashion.
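To see why the root element's name is immaterial in one mode and significant in the other, it can help to sketch what an include expansion might look like. The Python code below is only an illustration of the idea, not the CAM processor's implementation: the namespace URI is a placeholder, the purchaseOrder wrapper and the expand_includes function are hypothetical, and the include "file" lives in an in-memory dictionary rather than on disk. It assumes that ignoreRoot="yes" splices in the children of the included root, and that the default ("no") splices in the root element itself.

```python
import xml.etree.ElementTree as ET

AS_NS = "urn:example:cam"  # placeholder namespace URI, assumed for this sketch
INCLUDE_TAG = f"{{{AS_NS}}}include"

# In-memory stand-in for include files stored next to the CAM template.
INCLUDE_FILES = {
    "po_address_include.xml": (
        "<address><name>%string%</name><street>%string%</street>"
        "<city>%string%</city><state>%string%</state>"
        "<zip>%54321%</zip></address>"
    ),
}

TEMPLATE = f"""
<purchaseOrder xmlns:as="{AS_NS}">
  <shipTo country="US">
    <as:include ignoreRoot="yes">po_address_include.xml</as:include>
  </shipTo>
  <billTo country="US">
    <as:include ignoreRoot="yes">po_address_include.xml</as:include>
  </billTo>
</purchaseOrder>
"""


def expand_includes(parent: ET.Element) -> None:
    """Replace every include element under parent with the content it names."""
    # Walk the children in reverse so earlier indexes stay valid while splicing.
    for index, child in reversed(list(enumerate(parent))):
        if child.tag != INCLUDE_TAG:
            expand_includes(child)
            continue
        included_root = ET.fromstring(INCLUDE_FILES[child.text.strip()])
        if child.get("ignoreRoot", "no") == "yes":
            replacement = list(included_root)  # children only; root name is discarded
        else:
            replacement = [included_root]      # root element itself is used
        parent.remove(child)
        for offset, element in enumerate(replacement):
            parent.insert(index + offset, element)


template = ET.fromstring(TEMPLATE)
expand_includes(template)
```

With ignoreRoot="yes" the <address> wrapper never reaches the expanded template, so renaming it is harmless. Without the attribute, the wrapper becomes part of the result, so its name and attributes must be exactly what the template needs.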

This use of an include file that requires the root element would seem to have little utility for removing duplicate code, because you would now need a separate file for <shipTo> and for <billTo>. It's true that this feature is not useful in this scenario, but it could be quite useful in other scenarios. For example, if you had a more complex structure that needed two <shipTo> elements, you could leverage this capability.

Author's Note: Section 3.2.4, Imports, of the CAM specification, discusses how to use XPath to reference specific portions of an include file rather than the whole thing. However, this mechanism is not present in this implementation of the CAM processor. It would be quite handy, because you could then place all the XML fragments in a single include file. As it stands, each must be in its own file.

A Limitation With Mixed Content

One last important point to note is that CAM excels in structured XML processing but it has little support for mixed content. For example, in XSD you could define this schema:

<xs:element name="letter">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="orderid" type="xs:positiveInteger"/>
      <xs:element name="shipdate" type="xs:date"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

And that would validate this XML:

<letter>
  Dear <name>fred</name>:
  Your <orderid>232</orderid> has shipped on <shipdate>2009-05-12</shipdate>.
</letter>

CAM does not support validating this type of content.
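If mixed content is unfamiliar, the following Python sketch shows what sets it apart: character data sits between the child elements rather than only inside them. The letter here follows the example above with the date in the ISO form that xs:date expects, and split_mixed_content is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

LETTER = (
    "<letter>Dear <name>fred</name>: Your <orderid>232</orderid> "
    "has shipped on <shipdate>2009-05-12</shipdate>.</letter>"
)


def split_mixed_content(element):
    """Return (text_runs, child_tags) for an element that mixes text and children."""
    text_runs = []
    # Text that appears before the first child element.
    if element.text and element.text.strip():
        text_runs.append(element.text.strip())
    child_tags = []
    for child in element:
        child_tags.append(child.tag)
        # Text that follows a child element but still belongs to the parent.
        if child.tail and child.tail.strip():
            text_runs.append(child.tail.strip())
    return text_runs, child_tags


letter = ET.fromstring(LETTER)
text_runs, child_tags = split_mixed_content(letter)
# text_runs  -> ['Dear', ': Your', 'has shipped on', '.']
# child_tags -> ['name', 'orderid', 'shipdate']
```

A purely structured document would produce an empty list of text runs for the parent element; the interleaved runs here are the part that a structure-oriented template has no natural place to describe.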

Now that you have a better sense of the differences between XSD Schema and CAM, the next part of this article delves more deeply into CAM itself.

Michael Sorens is a freelance software engineer, spreading the seeds of good design wherever possible, including through his open-source web site, teaching (University of Phoenix plus community colleges), and writing (contributed to two books plus various articles). With BS and MS degrees in computer science and engineering from Case Western Reserve University, he has worked at Fortune 500 firms and at startups, using C#, SQL, XML, XSL, Java, Perl, C, Lisp, PostScript, and others. His favorite project: designing and implementing the world's smallest word processor, where the medium was silicon, the printer "head" was a laser, and the Declaration of Independence could literally fit on the head of a pin.