Writing Data as XML
The contents of a DataSet object can be serialized as XML in two ways that I'll call stateless and stateful. Although these expressions are not common throughout the ADO.NET documentation, I believe that they capture the gist of the two XML schemas that can be used to persist a DataSet object's contents. A stateless representation takes a snapshot of the current instance of the data and renders it according to a particular XML schema (defined in Chapter 1 as the ADO.NET normal form). A stateful representation, on the other hand, contains the history of the data in the object and includes information about changes as well as pending errors. Keep in mind that stateless and stateful refer to the data in the DataSet object but not to the DataSet object as a whole.
In this chapter, we'll focus on the stateless representation of the DataSet object, with just a glimpse at the stateful representation, the DiffGram format. In Chapter 10, we'll delve into the DiffGram's structure and goals.
The XML representation of a DataSet object can be written to a file, a stream, an XmlWriter object, or a string using the WriteXml method. It can include, or not include, XSD schema information. The actual behavior of the WriteXml method can be controlled by passing the optional XmlWriteMode parameter. The values in the XmlWriteMode enumeration determine the output's layout. The overloads of the method are shown in the following listing:
public void WriteXml(Stream, XmlWriteMode);
public void WriteXml(string, XmlWriteMode);
public void WriteXml(TextWriter, XmlWriteMode);
public void WriteXml(XmlWriter, XmlWriteMode);
WriteXml provides four additional overloads with the same structure as this code but with no explicit XmlWriteMode argument.
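As a minimal sketch of how the four output targets can be used, consider the following code. The class name, helper names, sample data, and the northwind.xml file name are all hypothetical and serve only as an illustration.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;
using System.Xml;

public static class WriteXmlTargets
{
    // Hypothetical sample data, used only for illustration.
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("NorthwindInfo");
        DataTable employees = ds.Tables.Add("Employees");
        employees.Columns.Add("employeeid", typeof(int));
        employees.Columns.Add("lastname", typeof(string));
        employees.Rows.Add(1, "Davolio");
        return ds;
    }

    // TextWriter overload: a StringWriter keeps the output in memory.
    public static string ToText(DataSet ds, XmlWriteMode mode)
    {
        using (StringWriter sw = new StringWriter())
        {
            ds.WriteXml(sw, mode);
            return sw.ToString();
        }
    }

    // Stream overload: write to a MemoryStream and decode the bytes.
    public static string ViaStream(DataSet ds, XmlWriteMode mode)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            ds.WriteXml(ms, mode);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    // XmlWriter overload: the caller controls formatting through the writer.
    public static string ViaXmlWriter(DataSet ds, XmlWriteMode mode)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;
        using (XmlWriter xw = XmlWriter.Create(sb, settings))
        {
            ds.WriteXml(xw, mode);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        DataSet ds = BuildSample();
        Console.WriteLine(ToText(ds, XmlWriteMode.IgnoreSchema));

        // String overload: the argument is a file name, not XML text.
        string path = Path.Combine(Path.GetTempPath(), "northwind.xml");
        ds.WriteXml(path, XmlWriteMode.IgnoreSchema);
    }
}
```

Whatever the target, the same XmlWriteMode value should produce the same element structure; only the destination and, for the XmlWriter overload, the formatting chosen by the caller differ.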
The stateless representation of the DataSet object takes a snapshot of the current status of the object. In addition to data, the representation includes tables, relations, and constraints definitions. The rows in the tables are written only in their current versions, unless you use the DiffGram format, which would make this a stateful representation. The following schema shows the ADO.NET normal form, that is, the XML stateless representation of a DataSet object:
The root tag is named after the DataSet object. If the DataSet object has no name, the string NewDataSet is used. The name of the DataSet object can be set at any time through the DataSetName property or via the constructor upon instantiation. Each table in the DataSet object is represented as a block of rows. Each row is a subtree rooted in a node with the name of the table. You can control the name of a DataTable object via the TableName property. By default, the first unnamed table that a data adapter adds to a DataSet object is named Table, and a trailing index is appended if a table with that name already exists. (Tables created with the parameterless Tables.Add method are named Table1, Table2, and so on.) The following listing shows the XML data of a DataSet object named NorthwindInfo:
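As a minimal sketch, and not the actual Northwind data, the following code builds a small DataSet object named NorthwindInfo and prints its stateless XML. The Employees table, its two columns, and the row values are assumptions made purely for illustration.

```csharp
using System;
using System.Data;
using System.IO;

public static class NormalFormDemo
{
    public static string BuildXml()
    {
        DataSet ds = new DataSet("NorthwindInfo");         // Becomes the root tag
        DataTable employees = ds.Tables.Add("Employees");  // Names each row node
        employees.Columns.Add("employeeid", typeof(int));
        employees.Columns.Add("lastname", typeof(string));
        employees.Rows.Add(1, "Davolio");
        employees.Rows.Add(2, "Fuller");

        using (StringWriter sw = new StringWriter())
        {
            ds.WriteXml(sw);   // No mode given: data only, no schema
            return sw.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildXml());
        // The output has this shape:
        // <NorthwindInfo>
        //   <Employees>
        //     <employeeid>1</employeeid>
        //     <lastname>Davolio</lastname>
        //   </Employees>
        //   <Employees>
        //     <employeeid>2</employeeid>
        //     <lastname>Fuller</lastname>
        //   </Employees>
        // </NorthwindInfo>
    }
}
```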
Basically, the XML representation of a DataSet object contains rows of data grouped under a root node. Each row is rendered with a subtree in which child nodes represent columns. The contents of each column are stored as the text of the node. The link between a row and the parent table is established through the name of the row node. In the preceding listing, the <Employees></Employees> subtree represents a row in a DataTable object named Employees.
Modes of Writing
Table 9-2 summarizes the writing options available for use with WriteXml through the XmlWriteMode enumeration.
Table 9-2 The XmlWriteMode Enumeration
| Write Mode | Description |
|---|---|
| DiffGram | Writes the contents of the DataSet object as a DiffGram, including original and current values. |
| IgnoreSchema | Writes the contents of the DataSet object as XML data without a schema. |
| WriteSchema | Writes the contents of the DataSet object, including an in-line XSD schema. The schema can't be inserted as XDR, nor can it be added as a reference. |
IgnoreSchema is the default option. The following code demonstrates the typical way to serialize a DataSet object to an XML file:
StreamWriter sw = new StreamWriter(fileName);
dataset.WriteXml(sw);   // Defaults to IgnoreSchema
sw.Close();             // Flush buffered output and release the file
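To see how the three write modes differ, it can help to serialize the same DataSet object once per mode. The following is a minimal sketch; the sample table and the pending edit to the lastname column are hypothetical and exist only so that the DiffGram has an original value to report.

```csharp
using System;
using System.Data;
using System.IO;

public static class WriteModesDemo
{
    // Hypothetical sample: one committed row with one pending change.
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("NorthwindInfo");
        DataTable employees = ds.Tables.Add("Employees");
        employees.Columns.Add("employeeid", typeof(int));
        employees.Columns.Add("lastname", typeof(string));
        employees.Rows.Add(1, "Davolio");
        ds.AcceptChanges();                         // Row is now unchanged
        employees.Rows[0]["lastname"] = "Fuller";   // Original value is retained
        return ds;
    }

    public static string Serialize(DataSet ds, XmlWriteMode mode)
    {
        using (StringWriter sw = new StringWriter())
        {
            ds.WriteXml(sw, mode);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        DataSet ds = BuildSample();
        foreach (XmlWriteMode mode in new XmlWriteMode[] {
            XmlWriteMode.IgnoreSchema, XmlWriteMode.WriteSchema, XmlWriteMode.DiffGram })
        {
            Console.WriteLine("--- " + mode + " ---");
            Console.WriteLine(Serialize(ds, mode));
        }
    }
}
```

With this sample, the IgnoreSchema output carries only the current value (Fuller), the WriteSchema output adds an xs:schema block ahead of the data, and the DiffGram output also records the original value (Davolio) in a diffgr:before section.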
In terms of functionality, calling the GetXml method and then writing its contents to a data store is identical to calling WriteXml with the mode set to IgnoreSchema. Using GetXml can be convenient, but in terms of raw overhead, calling WriteXml on a StringWriter object is slightly more efficient, as shown here:
StringWriter sw = new StringWriter();
dataset.WriteXml(sw);   // Defaults to IgnoreSchema
// Access the string using sw.ToString()
The same considerations apply to GetXmlSchema and WriteXmlSchema.
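The two pairs of methods can be compared side by side with a short sketch like the following one. The class name and the sample data are hypothetical; the code compares only the content of the results and makes no claim about relative performance.

```csharp
using System;
using System.Data;
using System.IO;

public static class GetVersusWriteDemo
{
    // Hypothetical sample data, used only for illustration.
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("NorthwindInfo");
        DataTable employees = ds.Tables.Add("Employees");
        employees.Columns.Add("employeeid", typeof(int));
        employees.Columns.Add("lastname", typeof(string));
        employees.Rows.Add(1, "Davolio");
        return ds;
    }

    // Data only, written through a StringWriter.
    public static string DataViaWriter(DataSet ds)
    {
        using (StringWriter sw = new StringWriter())
        {
            ds.WriteXml(sw, XmlWriteMode.IgnoreSchema);
            return sw.ToString();
        }
    }

    // Schema only, written through a StringWriter.
    public static string SchemaViaWriter(DataSet ds)
    {
        using (StringWriter sw = new StringWriter())
        {
            ds.WriteXmlSchema(sw);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        DataSet ds = BuildSample();
        Console.WriteLine(ds.GetXml());          // Data as a string
        Console.WriteLine(DataViaWriter(ds));    // Same data through WriteXml
        Console.WriteLine(ds.GetXmlSchema());    // Schema as a string
        Console.WriteLine(SchemaViaWriter(ds));  // Same schema through WriteXmlSchema
    }
}
```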
Preserving Schema and Type Information
The stateless XML format is a flat format. Unless you explicitly add schema information, the XML output is weakly typed. There is no information about tables and columns, and the original content of each column is normalized to a string. If you need a higher level of type and schema fidelity, start by adding an in-line XSD schema.
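The loss of type information can be observed with a round trip: serialize a typed column, read the XML back with the ReadXml method, and inspect the column's type. This is a minimal sketch under the assumption of a hypothetical Employees table with an integer employeeid column.

```csharp
using System;
using System.Data;
using System.IO;

public static class TypeFidelityDemo
{
    // Serializes a sample DataSet with the given mode, reads it back into a
    // fresh DataSet, and returns the type of the employeeid column.
    public static Type RoundTripColumnType(XmlWriteMode mode)
    {
        DataSet source = new DataSet("NorthwindInfo");
        DataTable employees = source.Tables.Add("Employees");
        employees.Columns.Add("employeeid", typeof(int));
        employees.Columns.Add("lastname", typeof(string));
        employees.Rows.Add(1, "Davolio");

        string xml;
        using (StringWriter sw = new StringWriter())
        {
            source.WriteXml(sw, mode);
            xml = sw.ToString();
        }

        DataSet copy = new DataSet();
        using (StringReader sr = new StringReader(xml))
        {
            // With no in-line schema, the structure is inferred from the data.
            copy.ReadXml(sr);
        }
        return copy.Tables["Employees"].Columns["employeeid"].DataType;
    }

    public static void Main()
    {
        // Without a schema, the column comes back as System.String.
        Console.WriteLine(RoundTripColumnType(XmlWriteMode.IgnoreSchema));
        // With the in-line schema, the column comes back as System.Int32.
        Console.WriteLine(RoundTripColumnType(XmlWriteMode.WriteSchema));
    }
}
```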
In general, a few factors can influence the final structure of the XML document that WriteXml creates for you. In addition to the overall XML format (DiffGram or a plain hierarchical representation of the current contents), important factors include the presence of schema information, nested relations, and how table columns are mapped to XML elements.
To optimize the resulting XML code, the WriteXml method drops column fields with null values. Dropping the null column fields doesn't affect the usability of the DataSet object: you can successfully rebuild the object from XML, and data-bound controls can easily manage null values. This feature can become a problem, however, if you send the DataSet object's XML output to a non-.NET platform. Other parsers, unaware that null values are omitted for brevity, might fail to process the document because the elements they expect are missing. If you want to represent null values in the XML output, replace the null values (System.DBNull type) with other neutral values (for example, blank spaces).
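One possible way to perform this replacement is sketched below. The ReplaceNulls helper is hypothetical, not part of ADO.NET, and it deliberately touches only string columns because a blank can't be stored in a numeric or date column. It also assumes that the tables contain no deleted rows.

```csharp
using System;
using System.Data;
using System.IO;

public static class NullColumnsDemo
{
    // Hypothetical sample: the notes column of the only row is null.
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("NorthwindInfo");
        DataTable employees = ds.Tables.Add("Employees");
        employees.Columns.Add("lastname", typeof(string));
        employees.Columns.Add("notes", typeof(string));
        employees.Rows.Add("Davolio", DBNull.Value);
        return ds;
    }

    public static string Serialize(DataSet ds)
    {
        using (StringWriter sw = new StringWriter())
        {
            ds.WriteXml(sw, XmlWriteMode.IgnoreSchema);
            return sw.ToString();
        }
    }

    // Hypothetical helper: replaces null values in string columns
    // with a neutral placeholder so that the element is always written.
    public static void ReplaceNulls(DataSet ds, string placeholder)
    {
        foreach (DataTable table in ds.Tables)
        {
            foreach (DataRow row in table.Rows)
            {
                foreach (DataColumn column in table.Columns)
                {
                    if (column.DataType == typeof(string) && row.IsNull(column))
                    {
                        row[column] = placeholder;
                    }
                }
            }
        }
    }

    public static void Main()
    {
        DataSet ds = BuildSample();
        Console.WriteLine(Serialize(ds));   // No notes element in the output
        ReplaceNulls(ds, " ");
        Console.WriteLine(Serialize(ds));   // The notes element is now present
    }
}
```

Keep in mind that the replacement modifies the rows, so it is probably safer to run it on a copy of the DataSet object (for example, one obtained through the Copy method) than on the instance the application keeps working with.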