Login | Register   
RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Book Excerpt: Applied XML Programming for Microsoft .NET : Page 3

The Microsoft .NET Framework allows developers to quickly build robust, secure ASP.NET Web Forms and XML Web service applications, Windows Forms applications, tools, and types. Find out all about its common language runtime and learn how to leverage its power to build, package, and deploy any kind of application or component. Read Chapter 9, ''ADO.NET XML Data Serialization.''


Writing Schema Information

When you serialize a DataSet object, schema information is important for two reasons. First, it adds structured information about the layout of the constituent tables and their relations and constraints. Second, extra table properties are persisted only within the schema. Note, however, that schema information describes the structure of the XML document being created and is not a transcript of the database metadata.

The schema contains information about the constituent columns of each DataTable object. (Column information includes name, type, any expression, and all the contents of the ExtendedProperties collection.)

The schema is always written as an in-line XSD. As mentioned, there is no way for you to write the schema as XDR, as a document type definition (DTD), or even as an added reference to an external file. The following listing shows the schema source for a DataSet object named NorthwindInfo that consists of two tables: Employees and Territories. The Employees table has three columns—employeeid, lastname, and firstname. The Territories table includes employeeid and territoryid columns. (These elements appear in boldface in this listing.)

<xs:schema id="NorthwindInfo" xmlns="" 
  <xs:element name="NorthwindInfo" msdata:IsDataSet="true">
      <xs:choice maxOccurs="unbounded">
        <xs:element name="Employees">
              <xs:element name="employeeid" type="xs:int" />
              <xs:element name="lastname" type="xs:string" />
              <xs:element name="firstname" type="xs:string" />
        <xs:element name="Territories">
              <xs:element name="employeeid" type="xs:int" />
              <xs:element name="territoryid" type="xs:string" />

The <xs:choice> element describes the body of the root node <NorthwindInfo> as an unbounded sequence of <Employees> and <Territories> nodes. These first-level nodes indicate the tables in the DataSet object. The children of each table denote the schema of the DataTable object. (See Chapter 3 for more information about XML schemas.)

The schema can be slightly more complex if relations exist between two or more pairs of tables. The msdata namespace contains ad hoc attributes that are used to annotate the schema with ADO.NETspecific information, mostly about indexes, table relationships, and constraints.

In-Line Schemas and Validation

Chapter 3 hinted at why the XmlValidatingReader class is paradoxically unable to validate the XML code that WriteXml generates for a DataSet object with an in-line schema, as shown here:


In the final XML layout, schema information is placed at the same level as the table nodes, but includes information about the common root (DataSetName, in the preceding code) as well as the tables (Table1 and Table2). Because the validating parser is a forward-only reader, it can match the schema only for nodes placed after the schema block. The idea is that the parser first reads the schema and then checks the compliance of the remainder of the tree with the just-read information, as shown in Figure 9-2.

Figure 9-2  How the .NET Framework validating reader parses a serialized DataSet object with an in-line schema. (Image unavailable)

Due to the structure of the XML document being generated, what comes after the schema does not match the schema! Figure 9-3 shows that the validating parser we built in Chapter 3 around the XmlValidatingReader class does not recognize (I'd say, by design) a serialized DataSet object when an in-line schema is incorporated.

Figure 9-3  The validating parser built in Chapter 3 does not validate an XML DataSet object with an in-line schema. (Image unavailable)

Is there a way to serialize the DataSet object so that its XML representation remains parsable when an in-line schema is included? The workaround is fairly simple.

Serializing to Valid XML

As you can see in Figure 9-2, the rub lies in the fact that the in-line schema is written in the middle of the document it is called to describe. This fact, in addition to the forward-only nature of the parser, irreversibly alters the parser's perception of what the real document schema is. The solution is simple: move the schema out of the DataSet XML serialization output, and group both nodes under a new common root, as shown here:

(Code unavailable)

Here's a code snippet that shows how to implement this solution:

XmlTextWriter writer = new XmlTextWriter(file);
writer.Formatting = Formatting.Indented;

If you don't use an XML writer, the WriteXmlSchema method would write the XML declaration in the middle of the document, thus making the document wholly unparsable. You can also mark this workaround with your own credentials using a custom namespace, as shown here:

writer.WriteStartElement("de", "Wrapper", "dinoe-xml-07356-1801-1");

Figure 9-4 shows the new document displayed in Microsoft Internet Explorer.

Figure 9-4 The DataSet object's XML output after modification. (Image unavailable)

Figure 9-5 shows that this new XML file (validdataset.xml) is successfully validated by the XmlValidatingReader class. The validating parser raises a warning about the new root node; this feature was covered in Chapter 3.

Figure 9-5 The validating parser raises a warning but accepts the updated XML file. (Image unavailable)

A reasonable concern you might have is about the DataSet object's ability to read back such a modified XML stream. No worries! The ReadXml method is still perfectly able to read and process the modified schema, as shown here:

DataSet ds = new DataSet();
ds.ReadXml("ValidDataset.xml", XmlReadMode.ReadSchema);

Although paradoxical, this behavior (whether it's by design or a bug) does not deserve much hype. At first glance, this behavior seems to limit true cross-platform interoperability, but after a more thoughtful look, you can't help but realize that very few XML parsers today support in-line XML schemas. In other words, what appears to be a clamorous and incapacitating bug is actually a rather innocuous behavior that today has a very limited impact on real applications. Real-world cross-platform data exchange, in fact, must be done using distinct files for schema and data.

Customizing the XML Representation

The schema of the DataSet object's XML representation is not set in stone and can be modified to some extent. In particular, each column in each DataTable object can specify how the internal serializer should render its content. By default, each column is rendered as an element, but this feature can be changed to any of the values in the MappingType enumeration. The DataColumn property that specifies the mapping type is ColumnMapping.

Customizing Column Mapping

Each row in a DataTable object originates an XML subtree whose structure depends on the value assigned to the DataColumn object's ColumnMapping property. Table 9-3 lists the allowable column mappings.

Table 9-3 The MappingType Enumeration

AttributeThe column is mapped to an XML attribute on the row node.
ElementThe column is mapped to an XML node element. The default setting.
HiddenThe column is not included in the XML output unless the DiffGram format is used.
SimpleContentThe column is mapped to simple text. (Only for tables containing exactly one column.)

The column data depends on the row node. If ColumnMapping is set to Element, the column value is rendered as a child node, as shown here:

(Code unavailable)

If ColumnMapping is set to Attribute, the column data becomes an attribute on the row node, as shown here:

(Code unavailable)

By setting ColumnMapping to Hidden, you can filter the column out of the XML representation. Unlike the two preceding settings, which are maintained in the DiffGram format, a column marked with Hidden is still serialized in the DiffGram format, but with a special attribute that indicates that it was originally marked hidden for serialization. The reason is that the DiffGram format is meant to provide a stateful and high-fidelity representation of the DataSet object.

Finally, the SimpleContent attribute renders the column content as the text of the row node, as shown here:


For this reason, this attribute is applicable only to tables that have a single column.

Persisting Extended Properties

Many ADO.NET classes, including DataSet, DataTable, and DataColumn, use the ExtendedProperties property to enable users to add custom information. Think of the ExtendedProperties property as a kind of generic cargo variable similar to the Tag property of many ActiveX controls. You populate it with name/value pairs and manage the contents using the typical and familiar programming interface of collections. For example, you can use the DataTable object's ExtendedProperties collection to store the SQL command that should be used to refresh the table itself.

The set of extended properties is lost at serialization time, unless you choose to add schema information. The WriteXml method adds extended properties to the schema using an ad hoc attribute prefixed with the msprop namespace prefix. Consider the following code:


When the tables are serialized, the Command slot is rendered as follows:

<xs:element name="Employees" msprop:Command="...">
<xs:element name="Territories" msprop:Command="...">

ExtendedProperties holds a collection of objects and can accept values of any type, but you might run into trouble if you store values other than strings there. When the object is serialized, any extended property is serialized as a string. In particular, the string is what the object's ToString method returns. This can pose problems when the DataSet object is deserialized.

Not all types can be successfully and seamlessly rebuilt from a string. For example, consider the Color class. If you call ToString on a Color object (say, Blue), you get something like Color [Blue]. However, no constructor on the Color class can rebuild a valid object from such a string. For this reason, pay careful attention to the nonstring types you store in the ExtendedProperties collection.

Rendering Data Relations

A DataSet object can contain one or more relations gathered under the Relations collection property. A DataRelation object represents a parent/ child relationship set between two DataTable objects. The connection takes place on the value of a matching column and is similar to a primary key/foreign key relationship. In ADO.NET, the relation is entirely implemented in memory and can have any cardinality: one-to-one, one-to-many, and even many-to-one.

More often than not, a relation entails table constraints. In ADO.NET, you have two types of constraints: foreign-key constraints and unique constraints. A foreign-key constraint denotes an action that occurs on the columns involved in the relation when a row is either deleted or updated. A unique constraint denotes a restriction on the parent column whereby duplicate values are not allowed. How are relations rendered in XML?

If no schema information is required, relations are simply ignored. When a schema is not explicitly required, the XML representation of the DataSet object is a plain snapshot of the currently stored data; any ancillary information is ignored. There are two ways to accurately represent a DataRelation relation within an XML schema: you can use the <msdata:Relationship> annotation or specify an <xs:keyref> element. The WriteXml procedure uses the latter solution.

The msdata:Relationship Annotation

The msdata:Relationship annotation is a Microsoft XSD extension that ADO.NET and XML programmers can use to explicitly specify a parent/ child relationship between non-nested tables in a schema. This annotation is ideal for expressing the content of a DataRelation object. In turn, the content of an msdata:Relationship annotation is transformed into a DataRelation object when ReadXml processes the XML file.

Let's consider the following relation:

DataRelation rel = new DataRelation("Emp2Terr", 

The following listing shows how to serialize this relation to XML:

(Code unavailable)

This syntax is simple and effective, but it has one little drawback— it is simply targeted to describe a relation. When you serialize a DataSet object to XML, you might want to obtain a hierarchical representation of the data, if a parent/child relationship is present. For example, which of the following XML documents do you find more expressive? The sequential layout shown here is the default:

<Employees employeeid="1" lastname="Davolio" firstname="Nancy" />
<Territories employeeid="1" territoryid="06897" />
<Territories employeeid="1" territoryid="19713" />

The following layout provides a hierarchical view of the data—all the territories' rows are nested below the logical parent row:

<Employees employeeid="1" lastname="Davolio" firstname="Nancy">
  <Territories employeeid="1" territoryid="06897" />
  <Territories employeeid="1" territoryid="19713" />

As an annotation, msdata:Relationship can't express this schema-specific information. Another piece of information is still needed. For this reason, the WriteXml method uses the <xs:keyref> element to describe the relationship along with nested type definitions to create a hierarchy of nodes.

The XSD keyref Element

In XSD, the keyref element allows you to establish links between elements within a document in much the same way a parent/child relationship does. The WriteXml method uses keyref to express a relation within a DataSet object, as shown here:

<xs:keyref name="Emp2Terr" refer="Constraint1">
  <xs:selector xpath=".//Territories" />
  <xs:field xpath="@employeeid" />

The name attribute is set to the name of the DataRelation object. By design, the refer attribute points to the name of a key or unique element defined in the same schema. For a DataRelation object, refer points to an automatically generated unique element that represents the parent table, as shown in the following code. The child table of a DataRelation object, on the other hand, is represented by the contents of the keyref element.

<xs:unique name="Constraint1">
  <xs:selector xpath=".//Employees" />
  <xs:field xpath="employeeid" />

The keyref element's contents consist of two mandatory subelements—selector and field—both of which contain an XPath expression. The selector subelement specifies the node-set across which the values selected by the expression in field must be unique. Put more simply, selector denotes the parent or the child table, and field indicates the parent or the child column. The final XML representation of our sample DataRelation object is shown here:

<xs:unique name="Constraint1">
  <xs:selector xpath=".//Employees" />
  <xs:field xpath="employeeid" />
<xs:keyref name="Emp2Terr" refer="Constraint1">
  <xs:selector xpath=".//Territories" />
  <xs:field xpath="@employeeid" />

This code is functionally equivalent to the msdata:Relationship annotation, but it is completely expressed using the XSD syntax.

Nested Data and Nested Types

The XSD syntax is also important for expressing relations in XML using nested subtrees. Neither msdata:Relationship nor keyref are adequate to express the relation when nested tables are required. Nested relations are expressed using nested types in the XML schema.

In the following code, the Territories type is defined within the Employees type, thus matching the hierarchical relationship between the corresponding tables:

(Code unavailable)

By using keyref and nested types, you have a single syntax—the XML Schema language—to render in XML the contents of any ADO.NET DataRelation object. The Nested property of the DataRelation object specifies whether the relation must be rendered hierarchically—that is, with child rows nested under the parent—or sequentially—that is, with all rows treated as children of the root node.

When reading an XML stream to build a DataSet object, the ReadXml method treats the <msdata:Relationship> annotation and the <xs:keyref> element as perfectly equivalent pieces of syntax. Both are resolved by creating and adding a DataRelation object with the specified characteristics. When ReadXml meets nested types, in the absence of explicit relationship information, it ensures that the resultant DataSet object has tables that reflect the hierarchy of types and creates a DataRelation object between them. This relation is given an auto-generated name and is set on a pair of automatically created columns.

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date