SQL Server 2005 Bug Alert: Merge Replication Could Mean Data Loss

ith the proliferation of mobile devices such as BlackBerrys, Tablet PCs, and regular laptops, some critical applications need to store a cache of their data locally at all times. Whether or not the application has a good connection to the corporate network should be transparent to the end user. Data should be available at all times, and end users expect to work the same way whether they are in the office or on the road without Internet access. Merge replication is the core of many of these intermittently connected applications.

Recognizing the importance of this technology, Microsoft published a detailed description of the breaking changes the SQL Server 2005 release introduced in replication. Some of these changes, however, are not well documented yet. One such omission involves using SQL Server 2005’s merge replication topology with row filtering: if the filters in place provoke an update at the publication and a deletion at the subscriber on the same row, the update on the publication never takes place. The result of this bug is data loss for both MSDE and SQL Server Express subscribers. The behavior is different?no data loss?when the publisher is a SQL Server 2000 Enterprise instance.

This article serves to alert the SQL Server community of the possible data loss that the changes in the SQL Server 2005 merge replication engine can cause. It uses a topology and test environment example to demonstrate how to reproduce the data loss. The intermittently connected application scenario is Brad Pitt and Angelina Jolie’s assistant taking grocery orders on her Tablet PC, connecting the Tablet PC to the network when she’s at the “Brangelina” home, and then taking all the orders with her when she goes to the supermarket.

Description of the Environment
The topology example uses SQL Server 2005 Developer Edition with SP2 as publisher and distributor and SQL Server Express with SP2 instances as subscribers. The operating system in this case is not relevant, but the test environment runs Microsoft Windows 2000 and the subscribers run Windows XP Professional Edition.

The publisher database uses the following SQL Server 2005:

Microsoft SQL Server 2005 – 9.00.3050.00 (Intel X86) Mar 2 2007 20:01:28 Copyright (c) 1988-2005 Microsoft Corporation Developer Edition on Windows NT 5.2 (Build 3790: Service Pack 2)

The subscriber database uses the following SQL Server 2005:

Microsoft SQL Server 2005 – 9.00.3042.00 (Intel X86) Feb 9 2007 22:47:07 Copyright (c) 1988-2005 Microsoft Corporation Express Edition on Windows NT 5.1 (Build 2600: Service Pack 2)

The merge replication is set up as a pull subscription (agent runs at the subscribers). The example uses the Windows XP Synchronize Tool to start the merge agent. The publication compatibility level is 80RTM to keep backward compatibility with MSDE subscribers.

The example also has row filtering in place to reduce the amount of rows replicated to each subscriber. In a real-world scenario, each end user works with its own set of data only. The example filters the data that is sent to each subscriber based on the ownership of this data. Data partitions are not implemented in this scenario.

Steps to Reproduce the Data Loss
The following steps walk you through reproducing the SQL Server 2005 data loss and are interjected with real-world examples from the Brangelina assistant scenario:

  1. Create the publication database. Find the script for creating the publication database in the code download.
  2. Create the tables inside the database. This script is also provided in the code download. The database’s design is simplistic (see Figure 1) and suitable only for demonstration purposes.

    Click to enlarge

    The assistant now reviews the orders on her client application and changes the status to “Open” ( OrderStatus=1). The data she now has on the subscriber looks like you see in Figure 5. She then rushes to the closest supermarket to buy the merchandise.

    Click to enlarge

    The publisher database at Brangelina’s house looks like you see in Figure 8 and Figure 9.

    Click to enlarge

    Figure 8. Order Table on Brangelina’s Publisher Database:
    The Order table looks like this on the Brangelina publisher database.
    ? Click to enlarge

    Figure 9. OrderDetail Table on Brangelina’s Publisher Database:
    The OrderDetail table looks like this on the Brangelina publisher database.

    The assistant now wants to inform Brad Pitt that the first order (order some pizza for the kids) has been fulfilled. She synchronizes her client application with the publication database, to which Brad has select access. (This over-simplified scenario uses Windows XP synchronization again.) The client application will not notify the assistant of these conflicts (after all, the example is set to have the server win in every situation).

  3. Replicate again (see Figure 10).

    Click to enlarge

    The data on the subscriber database looks like you see in Figure 13 and Figure 14.

    Click to enlarge

    Figure 13. Order Table on the Subscriber Database:
    The Order table looks like this on the Brangelina subscriber database.
    ? Click to enlarge

    Figure 14. OrderDetails Table on the Subscriber Database:
    The OrderDetails table looks like this on the Brangelina subscriber database. The row filtering took place as expected.

    The assistant calls the DBA and informs him that Brangelina’s family cannot see the order details, and this had never happened before the server upgrade. The DBA checks for conflicts at the publication and finds one (see Figure 15 and Figure 16).

    Click to enlarge

    Figure 15. The Conflict Viewer Interface of the Conflict:
    The conflict is recorded in the conflict viewer.
    ? Click to enlarge

    Figure 16. The Subscriber Is Selected as the Loser:
    The updates on that row never take place at the publisher.

    The DBA tried to set up an MSDE subscriber, and the result was very discouraging: the data loss continued, but the conflict was not recorded at the publication. He decides to resolve this situation manually by keeping the subscribers with SQL Server Express and submitting the conflict loser. He finds it odd that he never saw this conflict recorded before when he had the SQL Server 2000 Enterprise edition as publisher and the MSDE client as subscriber. Data is not lost for now, but he might have to develop a custom conflict resolver.

    The Brangelina family is now happy because they see the pizzas were purchased at a fully organic place, while the DBA is working full time on a temporary workaround and a final custom conflict resolver.

    Update-Delete Operations at the Publisher
    The behavior of update-delete operations at the publisher differs greatly from SQL Server 2000 to SQL Server 2005. Microsoft implemented this behavior change by design. The Microsoft SQL Server 2005 Books Online example offers the following explanation:

    The Merge Agent detects update-delete conflicts when an update of data at one node conflicts with a delete at another. In this case, the Merge Agent updates a row; however, when the Merge Agent searches for that row at the destination, it cannot find the row because it has been deleted. If the winner is the node that updated the row, the delete at the losing node is discarded and the Merge Agent sends the newly updated row to the conflict loser. The Merge Agent logs information about the losing version of the row to the MSmerge_conflicts_info table.

    The Brangelina assistant scenario demonstrated how this behavior could occur in the real world:

    1. Deletion takes place at the subscriber as expected.
    2. When the merge engine tries to delete this row at the publication, a conflict is reported.
    3. The row is then rolled back to its original state, losing all the updates that occurred at the subscriber.
Share the Post:
Share on facebook
Share on twitter
Share on linkedin

Overview

Recent Articles: