Chapter 9 ADO.NET XML Data Serialization
XML is the key element responsible for the greatly improved interoperability of the Microsoft ADO.NET object model when compared to Microsoft ActiveX Data Objects (ADO). In ADO, XML was merely a nondefault I/O format used to persist the contents of a disconnected recordset. XML plays a much deeper role in the construction and inner workings of ADO.NET. The areas in which ADO.NET integrates most closely with XML fall into two categories: object serialization and remoting, and a dual programming interface.
In ADO.NET, you have several options for saving objects to, and restoring objects from, XML documents. In effect, this capability belongs to one object only, the DataSet object, but it can be extended to other container objects with minimal coding. Saving objects like DataTable and DataView to XML is essentially a special case of the DataSet object serialization.
As we saw in Chapter 8, ADO.NET and XML classes provide for a unified, intermediate API that is made available to programmers through a dual, synchronized programming interface: the XmlDataDocument class. You can access and update data using either the hierarchical node-based approach of XML or the relational approach of column-based tabular data sets. At any time, you can switch from a DataSet representation of the data to an XML Document Object Model (XML DOM) representation, and vice versa. Data is synchronized, and any change you enter in either model is immediately reflected and visible in the other.
In this chapter, we'll explore the XML features built around the DataSet object and other ADO.NET objects for data serialization and deserialization. You'll learn how to persist and restore data contents, how to deal with schema information, and even how schema information is automatically inferred from the XML source.
Serializing DataSet Objects
Like any other .NET Framework object, a DataSet object is stored in memory in a binary format. Unlike other objects, however, the DataSet object is always remoted and serialized in a special XML format called a DiffGram. (We'll look at the DiffGram format and the related API in more detail in Chapter 10.) When the DataSet object crosses the boundaries of application domains (AppDomains) or the physical borders of the machine, it is automatically rendered as a DiffGram. At its destination, the DataSet object is silently rebuilt as a binary, immediately usable object.
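To get a feel for what a DiffGram looks like, the following minimal sketch asks a DataSet to render itself in that format explicitly, by passing XmlWriteMode.DiffGram to WriteXml. It does not go through remoting; it only shows the same XML format produced on demand. The class name, the Demo and Employees names, and the sample values are assumptions made up for illustration.

```csharp
using System.Data;
using System.IO;

public static class DiffGramSketch
{
    // Builds a small DataSet with one pending change and returns its DiffGram.
    public static string BuildDiffGram()
    {
        DataSet ds = new DataSet("Demo");               // illustrative name
        DataTable table = ds.Tables.Add("Employees");   // illustrative name
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Name", typeof(string));

        table.Rows.Add(1, "Davolio");
        ds.AcceptChanges();                 // commit the row as the original version
        table.Rows[0]["Name"] = "Fuller";   // introduce a pending change

        // Ask the DataSet to write itself out in the DiffGram format.
        StringWriter writer = new StringWriter();
        ds.WriteXml(writer, XmlWriteMode.DiffGram);
        return writer.ToString();
    }
}
```

The returned string holds both the current value of the modified row and its original value, which is what lets the receiving side rebuild the DataSet along with its pending changes.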
In ADO.NET, serialization of an object is performed either through the public ISerializable interface or through public methods that expose the object's internal serialization mechanism. As .NET Framework objects, ADO.NET objects can plug into the standard .NET Framework serialization mechanism and output their contents to standard and user-defined formatters. The .NET Framework provides a couple of built-in formatters: the binary formatter and the Simple Object Access Protocol (SOAP) formatter. A .NET Framework object makes itself serializable by implementing the methods of the ISerializable interface: specifically, the GetObjectData method plus a particular flavor of the constructor. According to this definition, both the DataSet and the DataTable objects are serializable.
In addition to the official serialization interface, the DataSet object supplies an alternative, and more direct, series of methods to serialize and deserialize itself, but in a class-defined XML format only. To serialize using the standard method, you create instances of the formatter object of choice (binary, SOAP, or whatever) and let the formatter access the source data through the methods of the ISerializable interface. The formatter obtains raw data that it then packs into the expected output stream.
In the alternative serialization model, the DataSet object itself starts and controls the serialization and deserialization process through a group of extra methods. The DataTable object does not offer public methods to support such an alternative and embedded serialization interface, nor does the DataView object.
In the end, both the official and the embedded serialization engines share the same set of methods. The overall architecture of DataSet and DataTable serialization is graphically rendered in Figure 9-1.
Figure 9-1 Both the DataSet object and the DataTable object implement the ISerializable interface for classic .NET Framework serialization. The DataSet object also publicly exposes the internal API used to support classic serialization.
All the methods that the DataSet object uses internally to support the .NET Framework serialization process are publicly exposed to applications through a group of methods, one pair of which clearly stands out: ReadXml and WriteXml. The DataTable object, on the other hand, does not publish the same methods, although this feature can be easily obtained with a little code. (I'll demonstrate this in the section "Serializing Filtered Views," on page 417.)
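As a rough idea of how little code is involved, the sketch below serializes a DataTable by hosting a copy of it in a temporary DataSet and calling WriteXml on that DataSet. This is one plausible approach under stated assumptions, not necessarily the technique shown later in the chapter. The class name, the TableToXml helper, and the Wrapper root name are hypothetical. Later versions of the framework give DataTable its own WriteXml method, which makes this workaround unnecessary there.

```csharp
using System.Data;
using System.IO;

public static class DataTableXmlSketch
{
    // Serializes a DataTable by hosting a copy of it in a temporary DataSet.
    public static string TableToXml(DataTable table)
    {
        DataSet wrapper = new DataSet("Wrapper");   // hypothetical root element name

        // Copy() returns a table with the same schema and rows that is not
        // owned by any DataSet, so it can be added to the wrapper safely.
        wrapper.Tables.Add(table.Copy());

        StringWriter writer = new StringWriter();
        wrapper.WriteXml(writer);
        return writer.ToString();
    }
}
```

Working on a copy leaves the original table untouched, at the cost of duplicating its rows in memory for the duration of the call.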
As you can see in the architecture depicted in Figure 9-1, both objects always pass XML data to .NET Framework formatters. This means that there is no .NET Framework-provided way to serialize ADO.NET objects in binary formats. We'll return to this topic in the section "Custom Binary Serialization," on page 424.
The DataSet Object's Embedded API for XML
Table 9-1 presents the DataSet object methods you can use to work with XML, both in reading and in writing. This list represents the DataSet object's internal XML API, which is at the foundation of the serialization and deserialization processes for the object.
Table 9-1 The DataSet Object's Embedded Serialization API
|Method|Description|
|GetXml|Returns an XML representation of the data currently stored in the DataSet object. No schema information is included.|
|GetXmlSchema|Returns a string that represents the XML schema information for the data currently stored in the object.|
|ReadXml|Populates the DataSet object with the specified XML data read from a stream or a file. During the process, schema information is read or inferred from the data.|
|ReadXmlSchema|Loads the specified XML schema information into the current DataSet object.|
|WriteXml|Writes out the XML data, and optionally the schema, that represents the DataSet object to a storage medium (that is, a stream or a file).|
|WriteXmlSchema|Writes out a string that represents the XML schema information for the DataSet object. Can write to a stream or a file.|
Note that GetXml returns a string that contains XML data. As such, it requires more overhead than simply using WriteXml to write XML to a file. You should not use GetXml and GetXmlSchema unless you really need to obtain the DataSet representation or schema as distinct strings for in-memory manipulation. The GetXmlSchema method returns the DataSet object's XML Schema Definition (XSD) schema; there is no way to obtain the DataSet object's XML-Data Reduced (XDR) schema.
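The following minimal sketch shows the direct route: WriteXml sends the data, together with an inline XSD schema, straight to a file, and ReadXml rebuilds a DataSet from that file, with no intermediate string. The class name, the helper method names, and the sample data are assumptions introduced for illustration.

```csharp
using System.Data;
using System.IO;

public static class WriteXmlSketch
{
    // Builds a tiny sample DataSet; all names and values are illustrative.
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("Demo");
        DataTable table = ds.Tables.Add("Employees");
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Davolio");
        ds.AcceptChanges();
        return ds;
    }

    // Writes data plus an inline XSD schema directly to a file.
    public static void Save(DataSet ds, string path)
    {
        ds.WriteXml(path, XmlWriteMode.WriteSchema);
    }

    // Rebuilds a DataSet from a file produced by Save, using the inline schema.
    public static DataSet Load(string path)
    {
        DataSet ds = new DataSet();
        ds.ReadXml(path, XmlReadMode.ReadSchema);
        return ds;
    }
}
```

When the XML really is needed in memory, calling ds.GetXml() and ds.GetXmlSchema() on the same sample returns the data and the schema as two separate strings.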
As Table 9-1 shows, when you're working with DataSet and XML, you can manage data and schema information as distinct entities. You can take the XML schema out of the object and use it as a string. Alternatively, you could write the schema to a disk file or load it into an empty DataSet object. Alongside the methods listed in Table 9-1, the DataSet object also features two XML-related properties: Namespace and Prefix. Namespace specifies the XML namespace used to scope XML attributes and elements when you read them into a DataSet object. The prefix to alias the namespace is stored in the Prefix property. The namespace can't be set if the DataSet object already contains data.
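Here is a small sketch of both ideas, under stated assumptions: the schema is taken out of one DataSet as a string and loaded into an empty one, and the Namespace and Prefix properties are set before any data is added. The class and method names, the urn:example-demo namespace, and the d prefix are hypothetical values chosen for the example.

```csharp
using System.Data;
using System.IO;

public static class SchemaAndNamespaceSketch
{
    // Builds a sample DataSet whose XML is scoped by a namespace.
    public static DataSet BuildQualifiedSample()
    {
        DataSet ds = new DataSet("Demo");
        ds.Namespace = "urn:example-demo";  // hypothetical namespace, set before adding data
        ds.Prefix = "d";                    // hypothetical alias for the namespace

        DataTable table = ds.Tables.Add("Employees");
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Davolio");
        return ds;
    }

    // Extracts the schema as a string and loads it into an empty DataSet,
    // yielding the same structure with no rows.
    public static DataSet CopyStructure(DataSet source)
    {
        string schema = source.GetXmlSchema();
        DataSet target = new DataSet();
        target.ReadXmlSchema(new StringReader(schema));
        return target;
    }
}
```

Calling GetXml on the qualified sample yields XML that carries the namespace URI, while the DataSet returned by CopyStructure has the Employees table and its columns but no data.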