Analyze Schemas with the XML Schema Infoset Model

As the use of schemas grows, so does the need for tools to manipulate those schemas. IBM’s new XML Schema Infoset Model provides a complete model of schemas themselves, including their concrete representations as well as the abstract relationships within a schema or set of schemas. With this library you can easily query the model of a schema for detailed information. You can also use it to update the schema to fix any problems found, and then write the schema back out.

Although there are a number of parsers and tools that use schemas to validate or analyze XML documents, tools that allow querying and advanced manipulation of schema documents themselves are still being built. The XML Schema Infoset Model (AKA the Java packages org.eclipse.xsd.*, or just “the library”) provides a rich API that models schemas: both their concrete representations (for example, in a schema.xsd file) and the abstract concepts in a schema as defined by the specification. As anyone who has read the schema specs knows, they are quite detailed. The XML Schema Infoset Model strives to expose all the Infoset details within any schema. This lets you manage your schema collection efficiently, and it can power higher-level schema tools such as schema-aware parsers and transformers.

For a quick overview of the library showing all the schema objects modeled, please see the XML Schema Infoset Model UML diagrams. The XML Schema Infoset Model also includes the UML diagrams used in building the library interfaces themselves; these diagrams show the relationships between the library objects, which very closely mimic the concepts in the schema specifications.

The example in this article uses two files from the source code. FindTypesMissingFacets.xsd is a simple XML Schema that shows basic schema constructs. FindTypesMissingFacets.java shows how to use the library to query and manipulate schemas. The example is intended to showcase the XML Schema Infoset Model as one of the first libraries that allows simple yet powerful schema manipulation.

Analyzing Your Schemas
The first thing you’ll want to do is check your schema for integer-derived types that fail to specify range restrictions. This ensures, for example, that all order quantities in purchase orders are bounded. Here the schemas must be very specific, so you want to require that every simple type deriving from integer has both a maximum facet (maxInclusive or maxExclusive) and a minimum facet (minInclusive or minExclusive). However, inherited facets count: if your simple type derives from a type that already declares the min or max facet, that is sufficient.

While you can use XSLT or XPath to query a schema’s concrete representation in an .xsd file or inside some other .xml content, it is much more difficult to discover the type derivations and interrelationships that schema components actually have. Since the XML Schema Infoset Model library models both the concrete representation and the abstract concept of the schema, it can easily be used to collect details about its components, even when the schema has deep type hierarchies or is defined in multiple schema files.
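To make that contrast concrete, here is a JDK-only sketch (not part of the library) that reads a schema's concrete representation with the standard DOM API. The two-type schema and the `urn:example` namespace are made up for illustration. The DOM exposes only the literal `base` attribute of each restriction, so a query for `base='xsd:integer'` would miss `smallQuantity`, which derives from integer only through `quantity`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ConcreteBaseLookup {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: smallQuantity derives from integer only indirectly.
    static final String SAMPLE =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " xmlns:tns='urn:example' targetNamespace='urn:example'>"
      + "<xsd:simpleType name='quantity'>"
      + "<xsd:restriction base='xsd:integer'/></xsd:simpleType>"
      + "<xsd:simpleType name='smallQuantity'>"
      + "<xsd:restriction base='tns:quantity'/></xsd:simpleType>"
      + "</xsd:schema>";

    /** Returns the first direct child element in the XSD namespace with this local name, or null. */
    static Element child(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && XSD_NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Maps each top-level simpleType name to the literal base attribute of its restriction. */
    public static Map<String, String> immediateBases(String schemaText) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(schemaText.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> bases = new LinkedHashMap<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !XSD_NS.equals(n.getNamespaceURI())
                    || !"simpleType".equals(n.getLocalName())) {
                continue;
            }
            Element restriction = child((Element) n, "restriction");
            if (restriction != null) {
                // Only the raw QName text is available, e.g. "tns:quantity";
                // nothing here says that quantity itself restricts xsd:integer.
                bases.put(((Element) n).getAttribute("name"), restriction.getAttribute("base"));
            }
        }
        return bases;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(immediateBases(SAMPLE));
    }
}
```

Running `main` prints `{quantity=xsd:integer, smallQuantity=tns:quantity}`. Resolving `tns:quantity` back to its own definition, possibly in another included or imported file, is exactly the bookkeeping the library's abstract model takes care of.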

In this simple schema, some types meet the criteria of having max/min facets, and some do not. The full sample schema, FindTypesMissingFacets.xsd, is included in the zip file. The following excerpt shows how such types are written according to the schema specification:

                        

Loading Schemas into the XML Schema Infoset Model
The library can read and write schema objects from a variety of sources. In the code below, the org.eclipse.emf.ecore.resource.ResourceSet framework easily loads sets of schemas; you can also build schemas directly from, or emit them to, a DOM object that you manage yourself. The library provides a custom XSDResourceSet implementation that can intelligently and automatically load sets of schemas related by includes, imports, and redefines. The abstract relationships between related schemas are also modeled in the library. The following excerpt shows how to load a schema.

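import java.util.Iterator;

import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.xsd.util.XSDResourceFactoryImpl;
import org.eclipse.xsd.util.XSDResourceImpl;

// This excerpt runs inside a method that returns an org.eclipse.xsd.XSDSchema.
// String variable schemaURL is "FindTypesMissingFacets.xsd" or
//   the URL to your schema. Create a resource set and load the
//   main schema file into it.
ResourceSet resourceSet = new ResourceSetImpl();
// When running outside Eclipse, register the XSD resource factory for .xsd files.
resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap()
    .put("xsd", new XSDResourceFactoryImpl());
XSDResourceImpl xsdSchemaResource = (XSDResourceImpl)resourceSet.getResource(
    URI.createDeviceURI(schemaURL), true);
// getResources() lists every resource in the set: the main resource
//   and those that have been included, imported, or redefined.
for (Iterator resources = resourceSet.getResources().iterator();
     resources.hasNext(); /* no-op */)
{
    // Return the first schema object found, which is the main schema
    //   loaded from the provided schemaURL
    Resource resource = (Resource)resources.next();
    if (resource instanceof XSDResourceImpl)
    {
        XSDResourceImpl xsdResource = (XSDResourceImpl)resource;
        // This returns an org.eclipse.xsd.XSDSchema object
        return xsdResource.getSchema();
    }
}
return null; // no schema resource was loaded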
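For comparison only, here is a JDK-only sketch of the first step you would otherwise write by hand: collecting the `schemaLocation` references of a schema's top-level `include`, `import`, and `redefine` elements and resolving them against the schema's own URI. This is not how the library's resource set works internally; the sketch does not recurse into the referenced files, detect cycles, or apply redefine semantics. The sample schema and `file:` URIs are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class SchemaReferences {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical main schema with one include and two imports.
    static final String SAMPLE =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:include schemaLocation='common.xsd'/>"
      + "<xsd:import namespace='urn:other' schemaLocation='../other/other.xsd'/>"
      + "<xsd:import namespace='urn:nolocation'/>"
      + "</xsd:schema>";

    /** Resolves the schemaLocation of each top-level include, import, and redefine. */
    public static List<URI> referencedSchemas(String schemaText, URI baseUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(schemaText.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
        List<URI> result = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !XSD_NS.equals(n.getNamespaceURI())) {
                continue;
            }
            String kind = n.getLocalName();
            if (kind.equals("include") || kind.equals("import") || kind.equals("redefine")) {
                // schemaLocation is optional on xsd:import, so it may be absent.
                String location = ((Element) n).getAttribute("schemaLocation");
                if (!location.isEmpty()) {
                    result.add(baseUri.resolve(location));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(referencedSchemas(SAMPLE, URI.create("file:/schemas/main.xsd")));
    }
}
```

Running `main` prints `[file:/schemas/common.xsd, file:/other/other.xsd]`; the import without a location is skipped. A complete loader would then fetch each URI and repeat the process, which is the work the resource set does for you.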

Convenient Schema Querying
Now that you have an XSDSchema object, query it to find any types that are missing max/min facets. The code below uses some of the available library methods to quickly find all of its simpleTypeDefinitions that derive from the built-in integer type. Since the library provides a complete model of the abstract meaning of a schema, this turns out to be very straightforward. You can query the XSDSchema for its getTypeDefinitions() listing, and then filter for XSDSimpleTypeDefinitions that actually inherit from the base integer type:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.eclipse.xsd.XSDSimpleTypeDefinition;
import org.eclipse.xsd.XSDTypeDefinition;
import org.eclipse.xsd.util.XSDSchemaQueryTools;

// schema is the XSDSchema loaded above.
// A handy convenience method quickly gets all
//   typeDefinitions within the schema
List allTypes = schema.getTypeDefinitions();
ArrayList allIntegerTypes = new ArrayList();
for (Iterator iter = allTypes.iterator();
     iter.hasNext(); /* no-op */)
{
    XSDTypeDefinition typedef = (XSDTypeDefinition)iter.next();
    // Keep only simpleTypes...
    if ((typedef instanceof XSDSimpleTypeDefinition)
        // ... that derive from the built-in integer type.
        // Use a worker method in the very handy sample
        //   program org.eclipse.xsd.util.XSDSchemaQueryTools
        && XSDSchemaQueryTools.isTypeDerivedFrom(typedef,
               schema.getSchemaForSchemaNamespace(), "integer"))
    {
        // The filter found one; save it and continue.
        allIntegerTypes.add(typedef);
    }
}
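The heavy lifting in this filter is the derivation test. The implementation of `XSDSchemaQueryTools.isTypeDerivedFrom` is not reproduced here; as an illustration of the idea only, the sketch below uses a hypothetical, JDK-only type table (plain local names rather than namespace-qualified ones) and follows base-type links upward until it reaches the ancestor or runs out of bases.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a schema's type hierarchy: type name -> base type name. */
public class DerivationCheck {
    private final Map<String, String> baseOf = new HashMap<>();

    public DerivationCheck define(String typeName, String baseName) {
        baseOf.put(typeName, baseName);
        return this;
    }

    /** True if typeName is ancestor itself or reaches it by following base-type links. */
    public boolean isDerivedFrom(String typeName, String ancestor) {
        String current = typeName;
        int steps = 0;
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = baseOf.get(current);
            // A well-formed chain has at most baseOf.size() links; more means a cycle.
            if (++steps > baseOf.size()) {
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // positiveInteger -> nonNegativeInteger -> integer -> decimal is the built-in
        // chain from XML Schema Part 2; quantity and price are hypothetical user types.
        DerivationCheck types = new DerivationCheck()
            .define("integer", "decimal")
            .define("nonNegativeInteger", "integer")
            .define("positiveInteger", "nonNegativeInteger")
            .define("quantity", "positiveInteger")
            .define("price", "decimal");
        System.out.println(types.isDerivedFrom("quantity", "integer"));
        System.out.println(types.isDerivedFrom("price", "integer"));
    }
}
```

This prints `true` and then `false`: `quantity` reaches integer through two built-in types, while `price` restricts decimal directly. Treating a type as derived from itself mirrors the specification's notion of derivation, but whether the library's helper does the same is an assumption.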

The Schema Components Model
Every component defined in the W3C schema specifications is modeled in detail in the library. Once you have a list of all the XSDSimpleTypeDefinitions that derive from integer, you can query it for those missing either their max or min facets and produce a report. Note that the library conveniently groups the effective maxExclusive/maxInclusive and minExclusive/minInclusive facets together for quick searching. It also provides detailed access to each type, including the actual lexical values if needed. Listing 1 shows an example of how to query XSDSimpleTypeDefinition components.
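Listing 1 itself is not reproduced here. As a rough, JDK-only sketch of the rule being checked (not the library's API), the class below records the bounding facets each type declares, collects the effective facets by walking up the base types so that inherited facets count, and reports which bound is missing. The built-in min facets of nonNegativeInteger and positiveInteger follow XML Schema Part 2; which facet each sample type declares is a guess from its name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MissingFacetReport {
    public enum Facet { MIN_INCLUSIVE, MIN_EXCLUSIVE, MAX_INCLUSIVE, MAX_EXCLUSIVE }

    private final Map<String, String> baseOf = new HashMap<>();
    private final Map<String, Set<Facet>> declared = new HashMap<>();

    /** Records a type, its base (null for a root), and the bounding facets it declares itself. */
    public MissingFacetReport define(String name, String base, Facet... facets) {
        baseOf.put(name, base);
        Set<Facet> own = EnumSet.noneOf(Facet.class);
        own.addAll(Arrays.asList(facets));
        declared.put(name, own);
        return this;
    }

    /** Facets declared on the type itself or inherited from any type it restricts. */
    public Set<Facet> effectiveFacets(String name) {
        Set<Facet> result = EnumSet.noneOf(Facet.class);
        Set<String> seen = new HashSet<>(); // guards against circular definitions
        for (String t = name; t != null && declared.containsKey(t) && seen.add(t); t = baseOf.get(t)) {
            result.addAll(declared.get(t));
        }
        return result;
    }

    /** Returns "max" and/or "min" for each bound the type lacks, counting inherited facets. */
    public List<String> missingBounds(String name) {
        Set<Facet> effective = effectiveFacets(name);
        List<String> missing = new ArrayList<>();
        if (!effective.contains(Facet.MAX_INCLUSIVE) && !effective.contains(Facet.MAX_EXCLUSIVE)) {
            missing.add("max");
        }
        if (!effective.contains(Facet.MIN_INCLUSIVE) && !effective.contains(Facet.MIN_EXCLUSIVE)) {
            missing.add("min");
        }
        return missing;
    }

    public static void main(String[] args) {
        MissingFacetReport types = new MissingFacetReport()
            .define("integer", null)
            .define("nonNegativeInteger", "integer", Facet.MIN_INCLUSIVE)
            .define("positiveInteger", "nonNegativeInteger", Facet.MIN_INCLUSIVE)
            .define("integer-noFacets", "integer")
            .define("integer-minFacet", "integer", Facet.MIN_INCLUSIVE)
            .define("positiveInteger-inheritedMinFacet", "positiveInteger");
        for (String name : new String[] {"integer-noFacets", "integer-minFacet",
                                         "positiveInteger-inheritedMinFacet"}) {
            System.out.println(name + " is missing " + types.missingBounds(name));
        }
    }
}
```

The sketch tracks only whether a facet is present, not its value. Its output names the same gaps as the sample report: `integer-noFacets` lacks both bounds, while the other two lack only a max, because `positiveInteger-inheritedMinFacet` inherits its minimum from positiveInteger.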

Your Report: Types Missing max/min Facets

As you can see, with just a little code you can discover fairly detailed information about a schema. Download the sample code and run it against the provided schema file; it produces the following report:

Schema missing max/min facet report on: FindTypesMissingFacets.xsd
Schema named component: http://www.research.ibm.com/XML/NS/xsd#integer-minFacet
  is missing these required facets:
    XSDMaxFacet (either inclusive or exclusive)
Schema named component: http://www.research.ibm.com/XML/NS/xsd#integer-noFacets
  is missing these required facets:
    XSDMaxFacet (either inclusive or exclusive)
    XSDMinFacet (either inclusive or exclusive)
Schema named component: http://www.research.ibm.com/XML/NS/xsd#positiveInteger-inheritedMinFacet
  is missing these required facets:
    XSDMaxFacet (either inclusive or exclusive)

Schemas Made Easier
Although this is a contrived example, it does show how the XML Schema Infoset Model’s detailed representation of a schema makes it easy to find exactly the parts of a schema you need. The library provides setter methods for the properties of schema components, so it is easy to update your sample to automatically fix any found types by adding any missing facets. And since the library models the concrete representation of the schema as well, you can write your updated schema back out to an .xsd file.
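The library's setter methods are the intended route for that fix. Because those calls need the Eclipse XSD jars, the sketch below instead swaps in plain JDK DOM code that makes the same kind of edit to the concrete representation and serializes it back out. The sample schema, the type name `quantity`, and the value `1000` are hypothetical. Unlike the library, it sees only facets declared directly on the type, so it cannot tell when a bound is already inherited.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class AddMissingMaxFacet {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical type with a min bound but no max bound.
    static final String SAMPLE =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:simpleType name='quantity'>"
      + "<xsd:restriction base='xsd:integer'><xsd:minInclusive value='1'/></xsd:restriction>"
      + "</xsd:simpleType></xsd:schema>";

    /** First direct XSD child with this local name (and this name attribute, if non-null). */
    static Element child(Node parent, String localName, String nameAttr) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && XSD_NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())
                    && (nameAttr == null || nameAttr.equals(((Element) n).getAttribute("name")))) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Adds maxInclusive to the named type's restriction unless it declares a max bound itself. */
    public static String addMaxInclusive(String schemaText, String typeName, String value)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(schemaText.getBytes(StandardCharsets.UTF_8)));
        Element simpleType = child(doc.getDocumentElement(), "simpleType", typeName);
        Element restriction = simpleType == null ? null : child(simpleType, "restriction", null);
        if (restriction != null
                && child(restriction, "maxInclusive", null) == null
                && child(restriction, "maxExclusive", null) == null) {
            // Reuse the schema's own prefix (e.g. "xsd") for the new facet element.
            String prefix = restriction.getPrefix();
            Element facet = doc.createElementNS(XSD_NS,
                prefix == null ? "maxInclusive" : prefix + ":maxInclusive");
            facet.setAttribute("value", value);
            // Facets may appear in any order after the restriction's optional
            // simpleType child, so appending at the end keeps the schema valid.
            restriction.appendChild(facet);
        }
        Transformer out = TransformerFactory.newInstance().newTransformer();
        out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        out.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addMaxInclusive(SAMPLE, "quantity", "1000"));
    }
}
```

Calling it a second time on its own output leaves the schema unchanged, since the type then declares a max bound.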
