DO.NET has a strong and powerful disconnected model that lets programmers build a Web of in-memory objects and relate contents coming from different tables and even from different data sources. It’s important to think through your code strategy carefully when processing queries and updates on interrelated tables to preserve scalability and maintain high performance. Sometimes, it’s much more effective to split compound queries; sometimes it isn’t. Submitting changes in batch mode, exploiting the DataSet and its disconnected model, may be the perfect solution. However, when you need to move dozens of megabytes, the DataSet’s serialization mechanism increase the size of the data considerably. In ADO.NET, the magic word is “tradeoffs.”
All data-driven applications look alike. They are built around a few basic tasks?fetch data across different tables, join columns, and filter data based on some user-specific criteria. Data-driven applications must also format data in such a way that end-users can easily and pleasantly consume it. More often than not, data is made of cross-related tables in which parent/child relationships are in place to define the data model and to extract a specific subset of information like detail views and subtotals.
So can we conclude that all applications that deal with data are the same and can be built using a common pattern? Of course not. Other aspects, including scalability and performance, affect and influence the design.
All distributed applications must be scalable to some extent. If a Web application is made open to anonymous users?for example, a shopping site?providing a good, if not optimal, level of scalability is a sort of survival test. Scalability is related to performance, but it can’t be identified exclusively with that. Likewise, performance affects scalability but in no case are they synonyms.
In general, caching data does not seem to be an optimal choice for all Web applications because it consumes valuable resources on a per-user or per-application basis. As a result, having less memory available to the Web server might lead to performance degradation. On the other hand, if you’re developing an application that needs to be both fast and scalable, you should carefully consider caching data at some level to reduce the workload on middle-tier components and the bottleneck of database connections. The more tiered your system is, the more options you have to scale it out as long as you can move disconnected caches of data across the tiers.
In the first part of this article, I examined a few recommended approaches for connections and transactions management in ADO.NET applications. In this second installment, I’ll tackle SQL-level programming and review a few scenarios in which the quality and the structure of SQL statements can make a world of difference between applications.
Untangle the Database Ball of Wool
Even the simplest Web application can have a rather complex and interrelated structure of databases. While a database with dozens of related tables, views, constraints, and triggers is perhaps not what you deal with every day, I’ve never seen a real-world application with less than a handful of related tables.
The problem with a grid of databases is with both queries and updates. End users should be able to access and filter related rows, but also to update and delete rows in a way that does not violate the consistency and integrity of the database.
Some databases resort to server-side constraints to block undesired changes. The fact that most database administrators use this strategy to stay on the safe side is not a good reason to avoid code that could potentially corrupt the database. Ideally, client applications should submit changes that respect the database structure and its internal update rules (for example, relationships and constraints).
ADO.NET has a strong and powerful disconnected model that allows applications to work with sort of in-memory databases. The DataSet object governs the storage of multiple tables and it enables filtering, sorting, constraints, and cascading changes. This model, though, seems to fit more easily for updates than for queries. For compound queries, the hurdle is the way in which an SQL result set is adapted to fit into a DataSet and whether cached data is effective to the application. Let’s review a couple of solutions for compound queries.
Conducting Multiple Queries
In light of the ADO.NET capabilities, you can implement multiple queries that span across multiple tables in at least two ways. The old-fashioned way, so to speak, is typical of ADO and any former data-access technology. It is based on the old faithful SQL JOIN command and returns a single, monolithic data structure that is quite easy to filter, but hard to update. In addition, the final result set is populated with redundant data.
The alternative approach, touted by the ADO.NET disconnected model, exploits the capabilities of the DataSet object?that is, a multi-table cache of data. The idea is that you split the original query into two or more distinct queries and gather the results in separate tables held in memory within the same DataSet object. The queries can be run individually or grouped in a stored procedure or in a batch. Using split queries results in a much more compact data set that can be easily managed via the programming interface of the DataSet and DataRelation objects.
Once the result set is stored into a DataSet, it supports cascading changes and batch updates. Furthermore, no synchronization code is required to retrieve the children of a parent row because automatic relationships can be declared. On the downside, just relationships make applying filters to the records not particularly easy.
Suppose you need to obtain all orders issued to a given country. The query can be written like this:
SELECT c.customerid, c.companyname, c.city, o.orderid, o.orderdate, od.productid FROM customers c INNER JOIN orders o ON c.customerid=o.customerid INNER JOIN [order details] od ON o.orderid=od.orderid WHERE c.country = 'USA' ORDER BY o.customerid
The query spans three Northwind tables?Orders, Order Details, and Customers. It retrieves all orders issued to a customer resident in the US. Viewed through the Microsoft SQL Server Query Analyzer, the result set looks like Figure 1.
|Splitting Monolithic Queries
Suppose that you are not allowed to use multiple queries in your application. This has nothing to do with a bad opinion you may have of the solution. It is just that your DBA strongly disagrees with any suggested change in the set of stored procedures. You would like to manage data using comfortable ADO.NET objects, but your DBA will only let you employ old-style stored procedures to get them. How can you load the result sets into ADO.NET disconnected objects?
Let’s consider the following more complex query. It retrieves, grouped by customer, all orders issued in a given year. The query won’t retrieve all the orders, though, but only those with at least 30 items. In addition, the name of the employee who processed the order and the name of the company that issued it are inserted into the result set.
Listing 1 demonstrates how to split the results of the query above. The main procedure?SplitData?gets the DataSet originated by the query and processes it. The input DataSet contains just one table.
The method in Listing 1 makes a copy of the DataSet and then creates two new tables?Employees and Customers. The SplitData method fills these two tables up using data from the monolithic table. Finally, the method obtains the third table?Order Details?and removes processed columns from the original table. As a result, you run a traditional stored procedure, pay the price of redundant data being moved across the wire, but gain in flexibility because you can handle disconnected data in distinct and possibly related tables. What’s the big advantage of this approach? You can now make batch updates.
Conducting Cross-Table Updates
When related tables are involved, changes flow from the parent to the children in different ways, depending on the type of the change you’re attempting?update, delete, or insert.
In general, there are two ways of conducting table updates?through a direct connection or batch update. In the case of a direct connection, you typically open a connection and execute the stored procedure or the SQL batch with all the logic and data you need to apply. In the case of a batch update, you typically yield to a middle-tier component, which gets disconnected data from you and takes care of submitting data to the database, possibly in an asynchronous way.
The disconnected approach in a Web scenario requires you to serialize the data that you want to submit from the Web server layer down to an internal tier, which holds a physical connection to the database. The availability of an easy-to-serialize object such as the DataSet and the powerful batch update mechanism of data adapters give you the tools to build highly scalable applications.
The batch update process passes all in-memory changes recorded in a DataSet table to the back-end system for permanent storage. In ADO.NET, you don’t submit your blocks of data being sent to the database in a single shot. Your ADO.NET batch update will execute individual statements on the target system, one for each change that needs to be submitted. Batch update opens one connection but still performs multiple statements in a predefined order.
To run a batch update, you don’t have to do much more than call the Update method of the data adapter class. This simple mechanism, though, won’t work at all with inter-related, hierarchical data. When you deal with hierarchical data, you need client-side cascading changes and custom update logic to build effective applications.
ADO.NET supports two types of constraints?unique and foreign-key constraints. A unique constraint is a restriction on a data column in which all values must be unique. A foreign-key constraint set on a DataTable restricts the action performed when you try to delete or update a value in a column. For example, deleting a value from the parent table can affect rows in the child table in various ways. Often you want to delete child rows in the child table. However, in some cases, you’ll have a cascading action set the child rows to null or default values. You can programmatically set the action that works best for using the programming interface of the ForeignKeyConstraint class.
What really matters, though, is that ADO.NET provides you with the tools to configure the runtime environment to automatically cascade changes from parent to child rows. Such a mechanism allows you to work on a DataSet object, enter changes, and (if needed), have them propagated through the tree. When you have gone through all the changes, ADO.NET guarantees that all the tables in the DataSet have been properly updated. If a change cannot be cascaded, or simply contains invalid values that would violate any of the relationships, an exception would be thrown.
After ADO.NET has updated your tables, you just take the resulting DataSet and apply the logic that best suits your updates. For example, depending on the relationships set between your tables, you may need to process child deleted rows before parent insertions and child updates after parent updates. You can easily extract the subset of rows to process using the Select method on the DataTable object. Whatever the right order is to enter changes to your application, you have the tools to get ready-to-process rows.
Retrieving Server-Generated Values
The UpdatedRowSource property looks for output parameters, the first row of the next query, both, or neither. By default, it looks for both. The following SQL batch inserts a new row into a database where custID is an identity column. The second statement retrieves the last identity value generated in the scope and returns it as the updated value of the custID column.
If you assign the above statements to the InsertCommand property of a data adapter, the UpdatedRowSource property will catch the newly generated identity value and pass it to the DataRow object being updated. If you set the UpdatedRowSource property to Both (the default) or FirstReturnedRecord, each inserted row processed through the batch update will have the identity column updated. And, more importantly, you don’t have to write any additional code. You only need to select a checkbox in the Visual Studio .NET wizard that configures a data adapter.
A data adapter that is about to update a row during a batch update process fires the RowUpdating and RowUpdated events, respectively, before and after the update. You handle the RowUpdated event as follows:
The event handler gets a data structure of type SqlRowUpdatedEventArgs in which the Row property returns the row as updated by the underlying database. You’ll need to modify this code database. You must design effective queries and update strategies to make a scalable application. To write effective SQL code, you should know the SQL engine you’re using, and realize that they may not be equal. If you’re building a .NET application, you also need a good understanding of the ADO.NET model and its capabilities. ADO.NET is powerful but it doesn’t reduce database programming to mere point-and-click programming. ADO.NET offers a set of powerful tools but without practice and experience you may not do a lot with them. The command builder object or even the batch update process, for example, greatly simplifies the update process but you must know what they do and how they work.