Back to the Real World
Other writers have described the problems with database concurrency in much more detail in numerous articles but these articles have one common flaw. Those writers view and discuss the whole problem from a database perspective instead of a user's perspective. What do I mean? Well, every explanation and proposed solution to database concurrency that I've read always focuses on the concurrency conflict and detection of a single whole row in a table. In the real world an update will often span multiple rows, tables, and sometimes even more than one database. However, when it comes to the way users see their data, this often doesn't map to single rows being updated.
I'll take the following examples from an accounting package but the principles apply to most, if not all, medium- to large-size applications.
A user sees data in this way:
Simple Reference Data
- Simple reference data. This kind of data usually maps to a single row. For example, you might have a list of VAT codes with the appropriate data.
- Complex application data. This typically consists of data that spans more than one row or table. Think of an invoice or a purchase order as an example.
- Regular application data. This kind of data is often stored in a single row in the database but the user actually considers it as a set of related but semi-distinct sets of data with different owners. An example of this would be article data. While an application might store all data about an article in a single row, users might divide the data based on who "owns" that part of the data.
- Basic article data. This might consist of a SKU and a description used by everyone.
- Sales information. The row might include a sales price, VAT information used when invoicing, and the number of units sold but not shipped. All this data is owned and updated by the sales department and read-only for others.
- Purchasing information. The row might include a preferred supplier, the maximum purchase price, and the number on order but not yet received. The purchasing department owns and maintains this data.
- Inventory. The row might include the number of items in stock. The warehousing department owns the inventory data and they will update the number of items to ship or receive as they process orders.
For most applications that handle simple reference data, the standard way that they handle data concurrency works very well because the set of data the user sees maps exactly to a row in the database. This means that this base is well covered and needs very little extra attention. Unfortunately this is also the kind of data that doesn't change very often so the chance of a concurrency issue occurring is very low.
Many applications do not handle complex data very well at all. Consider the case where two users both open the same order, each user changes a different line in the order and then they try to save the order. Technically speaking there is no conflict because each user is changing a different row in the database. However a user thinks about a single order and not about different rows in a database, so as far as he or she is concerned this is a multi-user conflict with two people changing the same piece of data. To some extent you can solve this problem procedurally in an organization. Also, some people might justly argue that the chance of the above scenario happening is very small. While that may or may not be true, sooner or later the order will move from the sales department to another department, maybe for approval or shipping. At this point you've involved two very different parts of the organization; one updates the detail lines and another takes ownership of the order by updating the header record. Again, technically speaking, you don't have a conflict because different users will update two different tables but in the real world this is a very real multi-user conflict.
To solve such a scenario you need to consider the separate rows in the database as a whole instead of individual items. You could change the way the application stores an order in the database and consider storing the data in a single row. Using an XML field in SQL Server 2005 you could store all details as XML elements. While this solution would solve the concurrency issue, this approach also has a number of drawbacks, most notably the lack of relational integrity checks between the order details in the XML and the rest of the database which would need to be done in code and cannot be done by the database itself.
Another approach for solving the complex data concurrency issues would be to check a row version of the order header row when updating the detail rows. This would basically work as follows:
- Start a database transaction.
- Check the row version of the order header row.
- Update the order header row. You must do this, even if you don't make an actual change to the order header, to update the row version.
- Update, delete or insert all order detail rows.
- Commit the transaction.
The problem here is that the database cannot really help you enforce this and there isn't another process that just updates an order detail row without updating the order header row version as well. To solve this problem you can update the order header row version using a trigger defined on the order detail table. This way you can always be sure that the header row version is updated but it does mean that the in-memory order header row must be refreshed if any of the detail rows have been updated. Again this is not an ideal situation but at the very least it's both very consistent and testable. You can test this by changing the same order twicethe second time directly after you've saved the first change. If you can save the second change than the row version in memory is the same as the row version in the database.
Regular Application Data
Most solutions don't handle this case very well either. The main problem is that both the database and the .NET class for working with data are based around a complete row. Both the tools inside Visual Studio and classes such as the SqlCommandBuilder are based around a SQL SELECT command and will generate a SQL UPDATE command for every field in the SQL SELECT. This is fine for simple reference data because a user typically updates the whole row, but in the case of regular application data, that doesn't make much sense.
Let's take a look at a simple example to see the shortcomings. Suppose that you store debtor information in a database. This information contains the address, the current balance, and the credit limit for a debtor. Your application automatically updates the balance every time a debtor pays a bill or places a new order and the user cannot update the balance directly, basically the balance is a calculated field that really exists because of performance reasons. One person in the customer relation department changes the address of the client while someone in the finance department changes the debtor's credit limit. Technically you have a concurrency conflict as both users want to update the same row at the same time. However the customer relation department owns the address data while the finance department owns the credit limit so procedurally there is no conflict. You cannot really solve the issue by giving each user only a subset of the data because the finance department needs to know where a debtor lives in order to make an assessment of their credit worthiness. The finance department doesn't need to update the debtor's address so they may only need read-only access to the address fields. The same is true the other way round; the customer relation department may need to be aware of the debtor's credit limit but no one from that department will ever update that information.
Even if both changes from this example get made by people that have permission to update all debtor information, you can hardly call this a concurrency conflict. Clearly a SQL UPDATE statement that simply includes all fields from a SQL SELECT isn't going to cut it here. You could solve the problem by updating each field as a separate entity. This would avoid and solve the problem described above but it overly simplifies the problem because certain fields, such as the complete address, are logically grouped together.