Book Excerpt: Applied XML Programming for Microsoft .NET : Page 4

The Microsoft .NET Framework allows developers to quickly build robust, secure ASP.NET Web Forms and XML Web service applications, Windows Forms applications, tools, and types. Find out all about its common language runtime and learn how to leverage its power to build, package, and deploy any kind of application or component. Read Chapter 9, ''ADO.NET XML Data Serialization.''




Serializing Filtered Views

As mentioned, in ADO.NET both the DataSet object and the DataTable object implement the ISerializable interface, thus making themselves accessible to any .NET Framework serializer. Only the DataSet object, however, exposes additional methods (for example, WriteXml) to let you explicitly save the contents to XML. We'll explore the various aspects of ADO.NET object serialization in the section "Binary Data Serialization."

In the meantime, let's see how to extend the DataTable and DataView objects with the equivalent of a WriteXml method.

Serializing DataTable Objects

The .NET Framework does not allow you to save a stand-alone DataTable object to XML. (A stand-alone DataTable object is an object not included in any parent DataSet object.) Unlike the DataSet object, the DataTable object does not provide you with a WriteXml method. Nevertheless, when you persist a DataSet object to XML, any contained DataTable object is regularly rendered to XML. How is this possible?

The DataSet class includes internal methods that can be used to persist an individual DataTable object to XML. Unfortunately, these methods are not publicly available. Saving the contents of a stand-alone DataTable object to XML is not particularly difficult, however, and requires only one small trick.

The idea is that you create a temporary, empty DataSet object, add the table to it, and then serialize the DataSet object to XML. Here's some sample code:

public static void WriteDataTable(DataTable dt, string outputFile, XmlWriteMode mode)
{
    // Wrap the table in a temporary DataSet and let the DataSet write the XML
    DataSet tmp = CreateTempDataSet(dt);
    tmp.WriteXml(outputFile, mode);
}

This code is excerpted from a sample class library that provides static methods to save DataTable and DataView objects to XML. Each method has several overloads and mimics as much as possible the DataSet object's WriteXml method. In the preceding sample code, the input DataTable object is incorporated in a temporary DataSet object that is then saved to a disk file. The following code creates the temporary DataSet object and adds the DataTable object to it:

private static DataSet CreateTempDataSet(DataTable dt)
{
    // Create a temporary DataSet
    DataSet ds = new DataSet("DataTable");
    // Make sure the DataTable does not already belong to a DataSet
    if (dt.DataSet == null)
        ds.Tables.Add(dt);
    else
        ds.Tables.Add(dt.Copy());   // link an in-memory copy instead
    return ds;
}

Note that a DataTable object can't be linked to more than one DataSet object at a time. If a given DataTable object has a parent object, its DataSet property is not null. If the property is not null, the temporary DataSet object must be linked to an in-memory copy of the table.

The class library that contains the various WriteDataTable overloads is available in this book's sample files and is named AdoNetXmlSerializer. A client application uses the library as follows:

StringWriter writer = new StringWriter();
AdoNetXmlSerializer.WriteDataTable(m_data, writer);
// Show the serialization output
OutputText.Text = writer.ToString();
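
To make the client call above concrete, here is a minimal, self-contained sketch of what a TextWriter-based overload could look like. The class name DataTableXmlSketch, the Demo method, and the sample data are assumptions made for illustration; they are not taken from the book's AdoNetXmlSerializer library.

```csharp
using System.Data;
using System.IO;

public static class DataTableXmlSketch
{
    // Hypothetical overload: serializes a DataTable to any TextWriter.
    public static void WriteDataTable(DataTable dt, TextWriter writer, XmlWriteMode mode)
    {
        // A table can belong to only one DataSet at a time,
        // so use a copy if the table already has a parent.
        DataSet tmp = new DataSet("DataTable");
        tmp.Tables.Add(dt.DataSet == null ? dt : dt.Copy());
        tmp.WriteXml(writer, mode);
    }

    // Builds a tiny stand-alone table and returns its XML representation.
    public static string Demo()
    {
        DataTable dt = new DataTable("Employees");
        dt.Columns.Add("ID", typeof(int));
        dt.Columns.Add("LastName", typeof(string));
        dt.Rows.Add(1, "Davolio");
        dt.Rows.Add(2, "Fuller");

        StringWriter writer = new StringWriter();
        WriteDataTable(dt, writer, XmlWriteMode.IgnoreSchema);
        return writer.ToString();
    }
}
```

Calling DataTableXmlSketch.Demo() returns a string in which each row appears as an Employees element under a root element named after the temporary DataSet.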

Figure 9-6 shows the sample application in action.

Figure 9-6 An application that passes some data to a DataTable object and then persists it to XML. (Image unavailable)

So much for DataTable objects. Let's see what you can do to serialize to XML the contents of an in-memory, possibly filtered, view.

Inside the DataView Object

The DataView class represents a customized view of a DataTable object. The relationship between DataTable and DataView objects is governed by the rules of a well-known design pattern: the document/view model. According to this model, the DataTable object acts as the document, and the DataView object acts as the view. At any moment, you can have multiple, different views of the same underlying data. More important, you can manage each view as an independent object with its own set of properties, methods, and events.

The view is implemented by maintaining a separate array with the indexes of the original rows that match the criteria set on the view. By default, the table view is unfiltered and contains all the records included in the table. By configuring the RowFilter and RowStateFilter properties, you can narrow the set of rows that fit into a particular view. Using the Sort property, you can apply a sort expression to the rows in the view. Figure 9-7 illustrates the internal architecture of the DataView object.

Figure 9-7 A DataView object maintains an index of the table rows that match the criteria. (Image unavailable)

When any of the filter properties is set, the DataView object gets from the underlying DataTable object an updated index of the rows that match the criteria. The index is a simple array of positions. No row objects are physically copied or referenced at this time.
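
The following sketch shows the filter and sort properties at work on a small in-memory table. The class name, the column names, and the data are assumptions chosen for illustration.

```csharp
using System.Data;

public static class DataViewFilterSketch
{
    public static DataTable BuildTable()
    {
        DataTable dt = new DataTable("Employees");
        dt.Columns.Add("ID", typeof(int));
        dt.Columns.Add("Country", typeof(string));
        dt.Rows.Add(1, "USA");
        dt.Rows.Add(2, "UK");
        dt.Rows.Add(3, "USA");
        dt.AcceptChanges();   // mark all rows as Unchanged
        return dt;
    }

    // Number of rows visible through a view filtered by an expression.
    public static int CountByCountry(DataTable dt, string country)
    {
        DataView dv = new DataView(dt);
        dv.RowFilter = "Country = '" + country + "'";
        dv.Sort = "ID DESC";
        return dv.Count;      // the table itself still holds every row
    }

    // Number of rows visible through a view filtered by row state.
    public static int CountAdded(DataTable dt)
    {
        DataView dv = new DataView(dt);
        dv.RowStateFilter = DataViewRowState.Added;
        return dv.Count;
    }
}
```

Both views leave the underlying table untouched; only the set of rows visible through each view changes.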

Linking Tables and Views

The link between the DataTable object and the DataView object is typically established at creation time through the constructor, as shown here:

public DataView(DataTable table);

However, you could also create a new view and associate it with a table at a later time using the DataView object's Table property, as in the following example:

DataView dv = new DataView();
dv.Table = dataSet.Tables["Employees"];

You can also obtain a DataView object from any table. In fact, the DefaultView property of a DataTable object simply returns a DataView object initialized to work on that table, as shown here:

DataView dv = dt.DefaultView;

Originally, the view is unfiltered, and the index array contains as many elements as there are rows in the table.

Getting Views of Rows

The contents of a DataView object can be scrolled through a variety of programming interfaces, including collections, lists, and enumerators. The GetEnumerator method in particular ensures that you can walk your way through the records in the view using the familiar foreach statement.

The following code shows how to access all the rows that fit into the view:

(Code unavailable)

When client applications access a particular row in the view, the class expects to find it in an internal rows cache. If the rows cache is not empty, the specified row is returned to the caller via an intermediate DataRowView object. The DataRowView object is a wrapper for the DataRow object that contains the actual data. You access row data through the Row property. If the rows cache is empty, the DataView class fills it with an array of DataRowView objects, each of which references an original DataRow object. The rows cache can be empty either because it has not yet been used or because the sort expression or the filter string has been changed in the meantime.
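
As an illustration of the mechanism just described, and not a reconstruction of the book's listing, the sketch below walks a view with foreach and reaches the underlying DataRow through the Row property. The class and method names are assumptions.

```csharp
using System.Collections.Generic;
using System.Data;

public static class DataViewEnumerationSketch
{
    // Collects the values of one column from every row visible in the view.
    public static List<object> ReadColumn(DataView dv, string columnName)
    {
        List<object> values = new List<object>();
        foreach (DataRowView rowView in dv)
        {
            DataRow row = rowView.Row;      // the DataRow that holds the actual data
            values.Add(row[columnName]);    // rowView[columnName] would also work
        }
        return values;
    }
}
```

Rows come back in the order defined by the view's Sort expression, and rows excluded by the filter are never visited.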

Serializing DataView Objects

The AdoNetXmlSerializer class also provides overloaded methods to serialize a DataView object. You build a copy of the original DataTable object with all the rows (and only those rows) that match the view, as shown here:

public static void WriteDataView(DataView dv, string outputFile, XmlWriteMode mode)
{
    // Copy the rows visible in the view to a temporary table, then save it
    DataTable dt = CreateTempTable(dv);
    WriteDataTable(dt, outputFile, mode);
}

You create a temporary DataTable object and then serialize it to XML using the previously defined methods. The structure of the internal CreateTempTable routine is fairly simple, as shown here:

private static DataTable CreateTempTable(DataView dv)
{
    // Create a temporary DataTable with the same structure
    // as the original
    DataTable dt = dv.Table.Clone();
    // Fill the DataTable with all the rows in the view
    foreach(DataRowView rowview in dv)
        dt.ImportRow(rowview.Row);
    return dt;
}

The ImportRow method creates a new row object in the context of the table. Like many other ADO.NET objects, the DataRow object can't be referenced by two container objects at the same time. Using ImportRow is logically equivalent to cloning the row and then adding the clone as a reference to the table. Figure 9-8 shows a DataView object saved to XML.

Figure 9-8 Saving a DataView object to XML. (Image unavailable)
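
Putting the pieces together, here is a self-contained sketch that serializes a filtered view to an XML string. It follows the Clone/ImportRow approach described above, but the class name and the string-returning signature are assumptions, not part of the book's library.

```csharp
using System.Data;
using System.IO;

public static class DataViewXmlSketch
{
    // Returns the XML for the rows that are visible through the view.
    public static string WriteDataView(DataView dv)
    {
        // Clone copies the schema only; ImportRow then copies each visible row.
        DataTable copy = dv.Table.Clone();
        foreach (DataRowView rowView in dv)
            copy.ImportRow(rowView.Row);

        // The copy has no parent DataSet, so it can be added directly.
        DataSet tmp = new DataSet("DataView");
        tmp.Tables.Add(copy);

        StringWriter writer = new StringWriter();
        tmp.WriteXml(writer, XmlWriteMode.IgnoreSchema);
        return writer.ToString();
    }
}
```

Because the rows are imported into a separate table, the original table and its view are left unchanged.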

Binary Data Serialization

There are basically two ways to serialize ADO.NET objects: using the object's own XML interface, and using .NET Framework data formatters. So far, we have reviewed the DataSet object's methods for serializing data to XML, and you've learned how to persist other objects like DataTable and DataView to XML. Let's look now at what's needed to serialize ADO.NET objects using the standard .NET Framework data formatters.

The big difference between methods like WriteXml and .NET Framework data formatters is that in the former case, the object itself controls its own serialization process. When .NET Framework data formatters are involved, any object can behave in one of two ways. The object can declare itself as serializable (using the Serializable attribute) and passively let the formatter extrapolate any significant information that needs to be serialized. This type of object serialization uses .NET Framework reflection to list all the fields that make up the state of an object.

The second behavior entails the object implementing the ISerializable interface, thus passing the formatters the data to be serialized. After this step, however, the object no longer controls the process. A class that is not marked with the Serializable attribute can't be serialized, whether or not it implements the ISerializable interface. In ADO.NET, only DataSet and DataTable implement the ISerializable interface. For example, you can't serialize a DataColumn or a DataRow object with any .NET Framework formatter.
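
To illustrate the second behavior, here is a minimal sketch of a class that implements ISerializable. The Temperature class and the entry name "c" are invented for this example. The class also carries the Serializable attribute, which the standard formatters check for.

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class Temperature : ISerializable
{
    public double Celsius;

    public Temperature(double celsius)
    {
        Celsius = celsius;
    }

    // Deserialization constructor: a formatter calls it with the stored values.
    protected Temperature(SerializationInfo info, StreamingContext context)
    {
        Celsius = info.GetDouble("c");
    }

    // The object hands the formatter exactly the data it wants persisted.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("c", Celsius);
    }
}
```

The formatter never inspects the fields of such a class; it persists only the name/value pairs that GetObjectData adds to the SerializationInfo object.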

Ordinary .NET Framework Serialization

The .NET Framework comes with two predefined formatter objects defined in the System.Runtime.Serialization.Formatters namespace: the binary formatter and the SOAP formatter. The classes that provide these two serializers are BinaryFormatter and SoapFormatter. The former is faster and produces more compact output. The latter is designed for interoperability and generates a SOAP-based description of the class that can be easily consumed on non-.NET platforms.

A formatter object is merely a class that implements the IFormatter interface to support the serialization of a graph of objects. The SoapFormatter and BinaryFormatter classes also implement the IRemotingFormatter interface to support remote procedure calls across AppDomains. No technical reasons prevent you from implementing custom formatters. In most cases, however, you only need to tweak the serialization process of a given class instead of creating an extension to the general serialization mechanism. Quite often, this objective can be reached simply by implementing the ISerializable interface.

The following code shows what's needed to serialize a DataTable object using a binary formatter:

BinaryFormatter bf = new BinaryFormatter();
StreamWriter swDat = new StreamWriter(outputFile);
// Write the DataTable to the file stream underlying the writer
bf.Serialize(swDat.BaseStream, dataTable);
swDat.Close();

The Serialize method causes the formatter to flush the contents of an object to a binary stream. The Deserialize method does the reverse—it reads from a previously created binary stream, rebuilds the object, and returns it to the caller, as shown here:

BinaryFormatter bf = new BinaryFormatter();
StreamReader sr = new StreamReader(sourceFile);
// Rebuild the DataTable from the binary stream
DataTable dt = (DataTable) bf.Deserialize(sr.BaseStream);
sr.Close();

When you run this code, something surprising happens. Have you ever tried to serialize a DataTable object, or a DataSet object, using the binary formatter? If so, you certainly got a binary file, but with a ton of XML in it. Unfortunately, XML data in serialized binary files only makes them huge, without the portability and readability advantages that XML normally offers. As a result, deserializing such files might take a while to complete—usually seconds.

There is an architectural reason for this odd behavior. The DataTable and DataSet classes implement the ISerializable interface, thus making themselves responsible for the data being serialized. The ISerializable interface consists of a single method—GetObjectData—whose output the formatter takes and flushes into the output stream.

Can you guess what happens next? By design, the DataTable and DataSet classes describe themselves to serializers using an XML DiffGram document. The binary formatter takes this rather long string and appends it to the stream. In this way, DataSet and DataTable objects are always remoted and transferred using XML—which is great. Unfortunately, if you are searching for a more compact representation of persisted tables, the ordinary .NET Framework run-time serialization for ADO.NET objects is not for you. Let's see how to work around it.
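
One way to see this for yourself is to call GetObjectData directly and list what the table hands to the formatter. The sketch below is an illustration under assumptions: the class and method names are invented, and the exact entries may vary with the framework version.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Runtime.Serialization;

public static class FormatterPayloadSketch
{
    // Returns the name/value pairs a DataTable passes to a formatter.
    public static Dictionary<string, object> Inspect(DataTable dt)
    {
        SerializationInfo info =
            new SerializationInfo(typeof(DataTable), new FormatterConverter());
        dt.GetObjectData(info, new StreamingContext(StreamingContextStates.File));

        Dictionary<string, object> entries = new Dictionary<string, object>();
        foreach (SerializationEntry entry in info)
            entries[entry.Name] = entry.Value;
        return entries;
    }
}
```

Printing the string values in the returned dictionary shows the XML text that ends up embedded in the binary stream.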

Custom Binary Serialization

To optimize the binary representation of a DataTable object (or a DataSet object), you have no choice but to map the class to an intermediate object whose serialization process is under your control. The operation breaks down into a few steps:

  1. Create a custom class, and mark it as serializable (optionally implementing the ISerializable interface as well).
  2. Copy the key properties of the DataTable object to the members of the class. Which members you actually map is up to you. However, the list must certainly include the column names and types, plus the rows.
  3. Serialize this new class to the binary formatter, and when deserialization occurs, use the restored information to build a new instance of the DataTable object.

Let's analyze these steps in more detail.

Creating a Serializable Ghost Class

Assuming that you need to persist only columns and rows of a DataTable object, a ghost class can be quickly created. In the following example, this ghost class is named GhostDataTable:

[Serializable]
public class GhostDataTable
{
    public GhostDataTable()
    {
        colNames = new ArrayList();
        colTypes = new ArrayList();
        dataRows = new ArrayList();
    }
    public ArrayList colNames;   // column names
    public ArrayList colTypes;   // full names of the column types
    public ArrayList dataRows;   // one object[] per row
}

This class consists of three serializable ArrayList objects that contain column names, column types, and data rows.

The serialization process now involves the GhostDataTable class rather than the DataTable object, as shown here:

private void BinarySerialize(DataTable dt, string outputFile)
{
    BinaryFormatter bf = new BinaryFormatter();
    StreamWriter swBin = new StreamWriter(outputFile);
    // Instantiate and fill the worker class
    GhostDataTable ghost = new GhostDataTable();
    CreateTableGraph(dt, ghost);
    // Serialize the object
    bf.Serialize(swBin.BaseStream, ghost);
    swBin.Close();
}

The key point here is how the DataTable object is mapped to the GhostDataTable class. The mapping takes place inside the CreateTableGraph routine.

Mapping Table Information

The CreateTableGraph routine populates the colNames array with column names and the colTypes array with the names of the data types, as shown in the following code. The dataRows array is filled with one object array per row, each holding all the values in that row.

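void CreateTableGraph(DataTable dt, GhostDataTable ghost)
{
    // Insert column information (names and types)
    foreach(DataColumn col in dt.Columns)
    {
        ghost.colNames.Add(col.ColumnName);
        ghost.colTypes.Add(col.DataType.FullName);
    }
    // Insert rows information
    foreach(DataRow row in dt.Rows)
        ghost.dataRows.Add(row.ItemArray);
}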

The DataRow object's ItemArray property is an array of objects. It turns out to be particularly handy, as it lets you handle the contents of the entire row as a single, monolithic piece of data. Internally, the get accessor of ItemArray is implemented as a simple loop that reads and stores one column after the next. The set accessor is even more valuable, because it automatically groups all the changes in a pair of BeginEdit/EndEdit calls and fires column-changed events as appropriate.
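
A short sketch of both accessors in use follows. The helper name CopyRow is an assumption, and the target table is expected to have the same column layout as the source row.

```csharp
using System.Data;

public static class ItemArraySketch
{
    // Copies all values of one row into a new row of a table with the same layout.
    public static void CopyRow(DataRow source, DataTable target)
    {
        object[] values = source.ItemArray;   // get: snapshot of every column value
        DataRow row = target.NewRow();
        row.ItemArray = values;               // set: assigns all columns in one step
        target.Rows.Add(row);
    }
}
```

The get accessor returns a new array, so changing the array afterward does not affect the source row.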

Sizing Up Serialized Data

The sample application shown in Figure 9-9 demonstrates that a DataTable object serialized using a ghost class can be more than 80 percent smaller than an identical object serialized the standard way.

Figure 9-9 The difference between ordinary and custom binary serialization. (Image unavailable)

In particular, consider the DataTable object resulting from the following query:

SELECT * FROM [Order Details]

The table contains five columns and 2155 records. It would take up half a megabyte if serialized to the binary formatter as a DataTable object. By using an intermediate ghost class, the size of the output is 83 percent less. Looking at things the other way round, the result of the standard serialization process is about 490 percent larger than the result you obtain using the ghost class.
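
The two percentages describe the same ratio from opposite directions. As a quick check, with S_std denoting the size of the standard output and S_ghost the size of the ghost-class output (symbols introduced here for illustration):

```latex
S_{\text{ghost}} = (1 - 0.83)\, S_{\text{std}} = 0.17\, S_{\text{std}}
\qquad\Longrightarrow\qquad
\frac{S_{\text{std}}}{S_{\text{ghost}}} = \frac{1}{0.17} \approx 5.88
```

A file 5.88 times the size of another is about 488 percent larger, which rounds to the 490 percent quoted above.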

Of course, not all cases give you such an impressive result. In all the tests I ran on the Northwind database, however, I got an average 60 percent reduction. The more the table content consists of numbers, the more space you save. The more BLOB fields you have, the less space you save. Try running the following query, in which photo is the BLOB field that contains an employee's picture:

SELECT photo FROM employees

The ratio of savings here is only 25 percent and represents the bottom end of the Northwind test results. Interestingly, if you add only a couple of traditional fields to the query, the ratio increases to 28 percent. The application shown in Figure 9-9 (included in this book's sample files) is a useful tool for fine-tuning the structure of the table and the queries for better serialization results.

Deserializing Data

Once the binary data has been deserialized, you hold an instance of the ghost class that must be transformed back into a usable DataTable object. Here's how the sample application accomplishes this:

DataTable BinaryDeserialize(string sourceFile)
{
    BinaryFormatter bf = new BinaryFormatter();
    StreamReader sr = new StreamReader(sourceFile);
    GhostDataTable ghost =
        (GhostDataTable) bf.Deserialize(sr.BaseStream);
    sr.Close();
    // Rebuild the DataTable object
    DataTable dt = new DataTable();
    // Add columns
    for(int i=0; i<ghost.colNames.Count; i++)
    {
        DataColumn col = new DataColumn(ghost.colNames[i].ToString(),
            Type.GetType(ghost.colTypes[i].ToString()));
        dt.Columns.Add(col);
    }
    // Add rows
    for(int i=0; i<ghost.dataRows.Count; i++)
    {
        DataRow row = dt.NewRow();
        row.ItemArray = (object[]) ghost.dataRows[i];
        dt.Rows.Add(row);
    }
    return dt;
}

The information stored in the ghost arrays is used to add columns and rows to a newly created DataTable object. Figure 9-9 demonstrates the perfect equivalence of the objects obtained by deserializing a DataTable and a ghost class.

The ghost class used in the preceding sample code serializes the minimal amount of information necessary to rebuild the DataTable object. You should add new properties to track other DataColumn or DataRow properties that are significant in your own application. Note that you can't simply serialize the DataColumn and DataRow objects as a whole because neither of them is marked as serializable.
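
As one possible way to track additional column properties, the sketch below stores a small serializable record per column instead of two parallel arrays. GhostColumn, GhostColumnMapper, and the choice of AllowDBNull and Unique are assumptions made for illustration; which properties matter depends on your application.

```csharp
using System;
using System.Collections;
using System.Data;

[Serializable]
public class GhostColumn
{
    public string Name;
    public string TypeName;
    public bool AllowDBNull;
    public bool Unique;
}

public static class GhostColumnMapper
{
    // Captures the column layout of a table as serializable records.
    public static ArrayList Capture(DataTable dt)
    {
        ArrayList columns = new ArrayList();
        foreach (DataColumn col in dt.Columns)
        {
            GhostColumn g = new GhostColumn();
            g.Name = col.ColumnName;
            g.TypeName = col.DataType.FullName;
            g.AllowDBNull = col.AllowDBNull;
            g.Unique = col.Unique;
            columns.Add(g);
        }
        return columns;
    }

    // Re-creates the columns on an empty table from the captured records.
    public static void Restore(ArrayList columns, DataTable dt)
    {
        foreach (GhostColumn g in columns)
        {
            DataColumn col = dt.Columns.Add(g.Name, Type.GetType(g.TypeName));
            col.AllowDBNull = g.AllowDBNull;
            col.Unique = g.Unique;
        }
    }
}
```

Restore should run before any rows are added, so that the restored constraints apply to the incoming data.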
