
Synchronize Your Databases with .NET Web Services (Part II)

In the first part of this article series, you saw how to expose data via an ASP.NET Web service, in an XML format useful for all kinds of clients. By exposing a .NET DataSet object, you effectively deliver a standard-format XML document containing a DataSet represented as a diffgram. This diffgram can be used to reconstruct an exact copy of the original DataSet, complete with tables, columns, data types, metadata, primary and foreign keys, and even table relationships.

An Overview of the Synchronization Process
This article expands on the principles discussed in the previous article to build a utility that synchronizes the contents of two or more databases over the Internet, delivering to each the rows that have been added to any other. The principle is relatively simple:

You can assume that:

  • All the tables to be synchronized exist in each database, and those tables have identical structures across all the databases.
  • Each table has a column containing a value that identifies its parent database or machine, so that rows can be identified depending on which machine they originally came from.
  • Each machine exposes the same Web service that can deliver a DataSet containing new rows to any of the other machines for synchronization.
  • Each machine calls this Web service on each of the other machines on a regular basis, fetching new rows from each one and adding them to its own database.

The schematic in Figure 1 shows the overall process for two machines. Note that the process must avoid blindly collecting all the new rows from the other machine(s). By specifying the value for the column that identifies each machine when querying the Web service, we ensure that only the new rows that originated on that specific machine are returned. Without this restriction, the first machine would copy the new rows from the second machine, and then, when the second machine fetches new rows from the first machine, it would also collect the rows it previously sent to the first machine, causing duplication of the new rows.

The Synchronization Example Database
Assume the sample Web sites maintain logs of traffic using both IIS logging (via ODBC to a database) and custom logging of ASP and ASP.NET session starts, storing details such as the browser’s user agent string, language setting, and referrer. This data is summarized once a week, using stored procedures within the database to populate six tables with the totals for each type of data collected.

Figure 2 shows the six tables, plus an extra table named LangCodes that provides a text equivalent of the language code that the browser exposes; for example, “en-us” represents “English (United States)”.

The Synchronization Web Service
The Web service used for this example is similar in concept and outline to that discussed in Part I of this article series. The class file (named getlogtables.asmx in the winforms-appWebService folder of the sample code) has the following format:

   <%@WebService Language="VB" Class="SynchIISLogs"%>

   Imports System
   Imports System.Data
   Imports System.Data.SqlClient
   Imports System.Web
   Imports System.Web.Services
   Imports System.Configuration

   Public Class SynchIISLogs

      ... class implementation here ...

   End Class

Deciding How to Return the Data
The sample Web service in Part I of this article returned a DataSet containing only a single table. However, you can add more than one table to the DataSet before returning it to the client application. There are downsides to this approach, though: it means that if an update fails, you have shifted a lot of data that may never be used. By fetching each table separately and halting the process when the first failure occurs, you reduce the bandwidth requirements. Using smaller DataSets also minimizes the risk of time-outs in the Web service.

On the other hand, partial updates can cause data errors. If you copy rows from one table, but then an error occurs in another table (or in rows encountered later in the first table), the target database will not contain a valid set of rows. The ideal solution to this is to use a connection-based transaction in the client application, so that you can roll back all the updates to all the tables if any error occurs. But that means keeping the connection open throughout the entire process, rather than closing it after fetching each individual table’s rows and inserting them into the database.

The end result is a compromise that can use multiple tables, but doesn’t have to. The sample Web service can accept a comma-delimited string of table names. By parsing that string and running queries, the Web service builds and returns a single DataSet containing the appropriate rows from each specified table. If the request string contains only one table name, the code fetches and returns only that table in the DataSet.

Allowing clients to collect all the rows for all the tables they need to synchronize in one request makes it possible to build efficient client-side applications that hold the connection open and perform all the updates to all the tables within a transaction. They can then commit all the modifications once every update has succeeded, or roll them all back if any update fails.
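
The article doesn't list the GetLogTables Web method itself, but its outline follows directly from this description. The sketch below is an assumption-based reconstruction: the parameter order matches the TestSQLStatement method shown later, the "ConnectString" configuration key is hypothetical, and GetSQLStatement and GetErrorDataSet are the private helpers discussed in the next sections.

   <WebMethod()> _
   Public Function GetLogTables(sHostIP As String, _
         sTableList As String, _
         dLastUpdate As Date) As DataSet
      Try
         ' one DataSet holds a DataTable for each requested table
         Dim oDS As New DataSet("NewRows")
         Dim sConnect As String = _
            ConfigurationSettings.AppSettings("ConnectString")
         Dim oConn As New SqlConnection(sConnect)
         ' parse the comma-delimited list of table names
         Dim sTableName As String
         For Each sTableName In sTableList.Split(","c)
            sTableName = sTableName.Trim()
            ' build the SQL statement for this table and
            ' fill a new DataTable with the matching rows
            Dim sSQL As String = GetSQLStatement( _
               sHostIP, sTableName, dLastUpdate)
            Dim oDA As New SqlDataAdapter(sSQL, oConn)
            oDA.Fill(oDS, sTableName)
         Next
         Return oDS
      Catch ex As Exception
         ' return the single-table "error" DataSet instead
         Return GetErrorDataSet(ex.Message)
      End Try
   End Function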

Building the SQL Statement
As the code in the GetLogTables method iterates through the list of tables, it uses a separate private function to create the SQL statement that extracts the required set of rows from the database. The GetSQLStatement function takes three parameters (the host IP address, the name of the table currently being processed, and the date that the last synchronization took place). Note that the method uses a GregorianCalendar instance to obtain the week number from the DateTime value passed as the last synchronization date. This is required because some of the tables contain a year and week number instead of a datetime column:

   Private Function GetSQLStatement(sHostIP As String, _
         sTableName As String, _
         dLastUpdate As Date) As String

      Dim oCal As New System.Globalization.GregorianCalendar
      Dim iYear As Integer = dLastUpdate.Year
      Dim iWeek As Integer = oCal.GetWeekOfYear(dLastUpdate, _
         Nothing, Nothing)

      Dim sSQL As String = "SELECT * FROM " & sTableName & _
         " WHERE "
      If sTableName.ToLower() = "pagesbyweeksummary" _
         Or sTableName.ToLower() = "trafficbyhoursummary" Then
         sSQL &= "DATEDIFF(day, '" & _
            dLastUpdate.ToString("yyyy-MM-dd") & _
            "', TSumDate) > 0 AND TSiteIP = '" & sHostIP & "'"
      Else
         sSQL &= "((TYearNumber = " & iYear.ToString() _
            & " AND TWeekNumber > " & iWeek.ToString() _
            & ") OR (TYearNumber > " & iYear.ToString() _
            & ")) AND HostIP = '" & sHostIP _
            & "' ORDER BY TYearNumber, TWeekNumber"
      End If

      Return sSQL

   End Function

The GetSQLStatement method calls the GetWeekOfYear method, which takes three parameters: the date to work from, a CalendarWeekRule value (which indicates how the first week of a new year is determined), and a DayOfWeek value (which determines which day counts as the first day of each week). Passing the value Nothing (null in C#) for these two value-type parameters causes VB.NET to supply the enumerations' default values (CalendarWeekRule.FirstDay and DayOfWeek.Sunday), which should match the rules used by ASP.NET and the database server software when the week numbers were originally stored. For more details, check out the GregorianCalendar class in the System.Globalization namespace section of the .NET SDK Reference.
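
If you prefer not to rely on the defaults, you can pass the enumeration values explicitly. The fragment below is a quick illustrative check (the date is arbitrary) showing the call with explicit CalendarWeekRule and DayOfWeek arguments:

   ' illustrative only: compute the week number for an arbitrary date
   ' using explicit (rather than defaulted) rule values
   Dim oCal As New System.Globalization.GregorianCalendar()
   Dim iWeek As Integer = oCal.GetWeekOfYear( _
      New DateTime(2004, 3, 15), _
      System.Globalization.CalendarWeekRule.FirstDay, _
      DayOfWeek.Sunday)
   ' iWeek now holds the week number used in the WHERE clause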

Returning Error Details
Another private function in the Web service class creates a DataSet containing error details, if any process within the service throws an exception. This routine is identical to the one shown in Part I of this article series. You’ll see how a client application uses this “error” DataSet later in this article:

   Private Function GetErrorDataSet(sMessage As String) _
         As DataSet

      Try
         Dim oDT As New DataTable("Errors")
         oDT.Columns.Add("Message", _
            System.Type.GetType("System.String"))
         Dim oDR As DataRow = oDT.NewRow()
         oDR("Message") = sMessage
         oDT.Rows.Add(oDR)
         Dim oDS As New DataSet("Error")
         oDS.Tables.Add(oDT)
         Return oDS
      Catch
         Return Nothing
      End Try

   End Function

Figure 5 shows the test page that ASP.NET generates automatically for the GetLogTables method when you select it in the main service description page (which was shown in Figure 4).

The TestSQLStatement Method
This second public Web method is actually a debugging tool left in place to help when testing the service and client application. The method simply returns a string containing the SQL statement the service will use to extract the data for the specified set of parameter values. Internally, it calls the GetSQLStatement function shown earlier, passing through its three parameters. Note that this method accepts only a single table name:

   <WebMethod()> _
   Public Function TestSQLStatement( _
         sHostIP As String, _
         sTableName As String, _
         dFromDate As Date) As String

      Return GetSQLStatement(sHostIP, sTableName, dFromDate)

   End Function

Of course, you could just expose the GetSQLStatement function as a Web method directly, instead of creating this “wrapper” call to it. However, after testing you’ll probably want to remove this public method, and you can do so simply by deleting the wrapper Web method rather than having to edit the declaration of the GetSQLStatement function.

Figure 6 shows the result of calling the TestSQLStatement method. Because the table used for the query contains columns that hold year and week numbers rather than a single date field, the GetSQLStatement function generates a WHERE clause that specifies the appropriate year and week values.
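
For illustration only (this is not the exact output captured in Figure 6), a request for a hypothetical table named VisitsByDaySummary, the host IP 217.45.1.242, and a last-synchronization date that falls in week 11 of 2004 would produce a statement of this form (wrapped here for readability; the function returns it as a single string):

   SELECT * FROM VisitsByDaySummary
   WHERE ((TYearNumber = 2004 AND TWeekNumber > 11)
      OR (TYearNumber > 2004))
   AND HostIP = '217.45.1.242'
   ORDER BY TYearNumber, TWeekNumber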

The Synchronization Process in Overview
If you look back at the sample configuration file, you’ll see that the other entries govern what the application must achieve (a sketch of such a file appears after this list):

  • The SourceWebServiceURLs value is a comma-delimited string containing the URLs of the Web services on the machines with which this machine will synchronize.
  • The SourceWebSiteIPAddresses value is a corresponding comma-delimited list of the key values in the tables of the databases on the machines with which this machine will synchronize. For example, the first value in the SourceWebServiceURLs list corresponds to the first key in the SourceWebSiteIPAddresses list.
  • The SourceTableList value is just a comma-delimited list of the tables to synchronize.
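
Here is a minimal sketch of what this part of the configuration file might look like, assuming a standard appSettings section. The key names are the ones the article uses; the URLs, IP addresses, and table names below are placeholders only:

   <configuration>
     <appSettings>
       <!-- comma-delimited lists; the URLs and addresses are placeholders -->
       <add key="SourceWebServiceURLs"
            value="http://site-one.example.com/synch/getlogtables.asmx,http://site-two.example.com/synch/getlogtables.asmx" />
       <add key="SourceWebSiteIPAddresses" value="217.45.1.242,10.0.0.2" />
       <add key="SourceTableList" value="PagesByWeekSummary,TrafficByHourSummary" />
     </appSettings>
   </configuration>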

Given these values, the synchronization process follows the pattern shown in the schematic in Figure 8. It iterates through the list of Web service URLs, fetching from each a DataSet that contains all the tables to be synchronized, attaches a DataAdapter, and then uses the Update method to iterate through the tables, inserting any new rows into the local database.

Referencing and Using a Web Service in Visual Studio
After creating a new Windows Forms project in Visual Studio .NET and building the interface Form, we get to the point where we have to consider how to access a Web service on the remote machines from which we’ll be collecting new rows for our database. The technique of using the WSDL.exe utility to create a proxy class, demonstrated in Part I of this article, would work here, as would the technique of simply loading the Web service response as an XML document. However, Visual Studio .NET makes it far easier to work with Web services.

Figure 11. Adding Web References: The figure shows major steps in the process to add a Web Reference to a project in Visual Studio .NET.

To add a reference to the Web service to the client project, right-click the Web References entry in the Solution Explorer window and select Add Web Reference. As shown in Figure 11, this opens a dialog where you can search for Web services, see the default service pages that they expose, and view their contracts (the WSDL that defines the interface). The Web service you want to use is the getlogtables.asmx file described earlier in this article. This service resides in a subfolder of the local machine’s default Web site, and you can navigate to it by entering its Web address or by using the right-hand pane of the dialog, which is basically just a browser window.

When you click the Add Reference button to add the chosen Web service reference to your project, Visual Studio fetches the WSDL file and builds a proxy for it, just as when you use the WSDL utility directly. You can see this proxy file (named Reference.vb) in the folder that VS.NET creates under Web References (see Figure 12). You’ll also see a .disco (discovery) file, and other files that VS itself requires.
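
The FetchTables routine called later in the synchronization code is not listed in the article, but with the Web reference in place it amounts to little more than a proxy call. Here is a minimal sketch, assuming the Web reference was added under the default name localhost (so the proxy class is localhost.SynchIISLogs) and that the GetLogTables Web method takes the host IP, table list, and last-update date in that order:

   Private Function FetchTables(sSourceURL As String, _
         sWebSiteIP As String, sTableNames As String, _
         dLastUpdate As Date) As DataSet
      Try
         ' create the proxy generated by the Web reference
         Dim oProxy As New localhost.SynchIISLogs()
         ' point it at this particular site's copy of the service
         oProxy.Url = sSourceURL
         ' allow slow links a little extra time (milliseconds)
         oProxy.Timeout = 100000
         Return oProxy.GetLogTables(sWebSiteIP, sTableNames, _
            dLastUpdate)
      Catch
         ' signal complete failure to the caller
         Return Nothing
      End Try
   End Function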

The Synchronization Code
The ProcessAllSynchOperations method’s task is to implement the process shown in the schematic in Figure 8. Before it gets called, a couple of other routines (not shown here) collect the values from the configuration file. These routines split the list of Web service URLs (the Web sites from which to collect new rows) into an array named gSourceURLArray; split the list of “key” values (the matching IP addresses of these sites) into an array named gSiteIPArray; and split the list of table names into an array named gTableArray.

Code involved only in writing status information to the Form window and the log file has been removed from this section.

Iterating Through the Web Service URLs
The ProcessAllSynchOperations method iterates through the gSourceURLArray array of Web service URLs using an Integer index to extract the matching key value from the gSiteIPArray. A separate function reads the file containing the last successful update date for that site. The URL, key, and date information are then used in the call to each site’s Web service. If all goes well you get back a DataSet containing all the new rows from all the tables for that site; otherwise you get a DataSet containing only a single table named Errors, or Nothing if the remote Web service fails completely. The method checks for both these conditions. Notice how it extracts the error message if the service returns an “error” DataSet. The method throws an exception that is caught in the main routine within the application, which displays and logs the message and then halts execution.

   Sub ProcessAllSynchOperations()

      ' iterate through all the Web service URLs
      For iLoop = 0 To gSourceURLArray.Length - 1

         ' get values for this source site
         sSourceURL = gSourceURLArray(iLoop).Trim()
         sWebSiteIP = gSiteIPArray(iLoop).Trim()
         dLastUpdate = GetLastDateLoaded(sWebSiteIP)

         ' call routine to fetch the DataSet from
         ' the remote Web service
         Dim oDS As DataSet = FetchTables(sSourceURL, _
            sWebSiteIP, gTableNames, dLastUpdate)

         ' see if there were any errors with the Web service
         If oDS Is Nothing Then
            Throw New Exception( _
               "No DataSet returned from Web service")
         End If
         If oDS.Tables(0).TableName = "Errors" Then
            Throw New Exception( _
               "Error reported by Web service: " & _
               oDS.Tables(0).Rows(0)(0))
         End If
         ...

Iterating Through the List of Tables
At this point, you know that the service returned a DataSet containing any new rows, so you begin to insert the rows into the local database. You need to create a connection to the local database, open it, and start a transaction over this connection, iterating through the list of tables held in gTableArray. If no table in the DataSet matches the current item in the gTableArray list (meaning no rows were added for that table on the source machine), the application simply logs and displays a message. However, if the table does exist, it checks whether there are any new rows in it using the Count property of the Rows collection:

   ...
   Dim oConn As SqlConnection
   Dim oTrans As SqlTransaction
   Try

      ' create Connection, open it and start a transaction
      oConn = New SqlConnection(gConnStr)
      oConn.Open()
      oTrans = oConn.BeginTransaction()

      ' iterate through all the tables in the list
      Dim sTableName As String
      For Each sTableName In gTableArray

         ' check if table is in DataSet --
         ' might not be if there were no new rows
         If oDS.Tables(sTableName) Is Nothing Then
            WriteStatus(">> WARNING: no rows received " & _
               "for table '" & sTableName & "'")
         Else

            ' see how many rows are in this table
            Dim iRows As Integer = _
               oDS.Tables(sTableName).Rows.Count
            If iRows > 0 Then
               Try
                  ...

Inserting New Rows into the Database
To insert the rows automatically, you must attach a DataAdapter to the DataSet, and call its Update method. However, you also need to ensure that this happens within the control of the currently pending transaction, which involves some extra code. Normally, you create a DataAdapter by specifying the connection and a SQL statement that selects an appropriate set of rows:

   Dim oDA As New SqlDataAdapter("sql-statement", connection-instance) 

However, calling that constructor only sets the SelectCommand property of the DataAdapter. To create a suitable InsertCommand, you can add a CommandBuilder:

   Dim oCB As New SqlCommandBuilder(oDA)

Then, when you call the DataAdapter’s Update method, the CommandBuilder generates a suitable parameterized SQL INSERT statement, and assigns it to the CommandText property of the Command object associated with the DataAdapter’s InsertCommand property. It also populates the Parameters collection of that Command with the appropriate set of Parameter instances. If you choose not to use a CommandBuilder, you’ll have to do all this work manually.

However, there’s a problem; the Command objects that the CommandBuilder creates are not added to the pending transaction, and trying to assign the transaction to them afterwards causes an error. To get around this, you have to specify the current pending transaction when you create a new Command object. Subsequently, you can pass that Command to the constructor of the DataAdapter where it will be used as the SelectCommand.

Next, you can use the GetInsertCommand method of a new CommandBuilder to obtain a Command object, assign the pending transaction to it, and then assign it to the InsertCommand of the DataAdapter. Finally, you’re ready to call the Update method to push the rows into the database:

                  ...
                  ' create a new Command for the SelectCommand
                  ' within the current transaction
                  Dim oSCmd As New SqlCommand( _
                     "SELECT * FROM " & sTableName, oConn, oTrans)

                  ' create DataAdapter using that Command
                  Dim oDA As New SqlDataAdapter(oSCmd)

                  ' create a new CommandBuilder from the DataAdapter
                  Dim oCB As New SqlCommandBuilder(oDA)

                  ' get the InsertCommand Command instance
                  ' from the CommandBuilder
                  Dim oICmd As SqlCommand = oCB.GetInsertCommand()

                  ' attach the current transaction to the InsertCommand
                  oICmd.Transaction = oTrans

                  ' specify this as the InsertCommand of the DataAdapter
                  oDA.InsertCommand = oICmd

                  ' update the database table
                  Dim iCount As Integer = oDA.Update(oDS, sTableName)
                  ...

Committing or Rolling Back the Transaction
If the update succeeds, you can move on to repeat the process for the next table. If an error occurs, the application throws an exception that halts execution, calling the Rollback method of the current transaction first so that no updates will be applied to any of the tables. However, after successfully processing all the tables in the list, you can safely attempt to commit the transaction. The enclosing Try…Catch construct also rolls back all updates if any other error occurs. The Finally section ensures that the connection gets closed when the process completes (it also executes if an error occurs, before raising the exception to the calling routine).

                  ...
               Catch e As Exception
                  ' error encountered so roll back all the updates
                  oTrans.Rollback()
                  Throw New Exception( _
                     "Error updating target table " _
                     & sTableName & " - " & e.Message)
               End Try
            End If
         End If
      Next   ' process next table in list

      oTrans.Commit()  ' all OK so commit all of the updates

   Catch e As Exception

      ' error encountered so roll back all the updates
      oTrans.Rollback()

      ' and stop, though could just write error message
      ' and process next URL
      Throw New Exception( _
         "Transaction failed to complete - " & e.Message)

   Finally
      oConn.Close()
   End Try
   ...

Updating the “Last Updated” File
The ProcessAllSynchOperations routine ends by calling a separate routine named UpdateLastDateLoaded, which just writes the current date and time into the “last updated” file for this Web service URL (you’ll recall that this is a file with a name such as 217_45_1_242.txt). Then the code returns to process the next Web service URL in the list:

         ...
         ' update last synchronization date file for this site
         UpdateLastDateLoaded(sWebSiteIP)

      Next   ' process next site (next Web service URL)

   End Sub
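
Neither GetLastDateLoaded nor UpdateLastDateLoaded is listed in the article. The following is a minimal sketch of how they might be implemented, assuming each site's last-update value is stored as a plain-text date in a file named after the site's IP address with the dots replaced by underscores (for example, 217_45_1_242.txt):

   Private Function GetLastDateLoaded(sWebSiteIP As String) As Date
      Dim sFile As String = sWebSiteIP.Replace("."c, "_"c) & ".txt"
      If Not System.IO.File.Exists(sFile) Then
         ' no previous synchronization, so collect everything
         Return New Date(1990, 1, 1)
      End If
      Dim oReader As New System.IO.StreamReader(sFile)
      Dim dResult As Date = Date.Parse(oReader.ReadLine())
      oReader.Close()
      Return dResult
   End Function

   Private Sub UpdateLastDateLoaded(sWebSiteIP As String)
      Dim sFile As String = sWebSiteIP.Replace("."c, "_"c) & ".txt"
      ' overwrite the file with the current date and time
      Dim oWriter As New System.IO.StreamWriter(sFile, False)
      oWriter.WriteLine(Date.Now.ToString())
      oWriter.Close()
   End Sub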

The synchronization is complete for this client after it processes all the Web service URLs in the configuration list. Meanwhile, all the other machines within the synchronization group will carry out the same process, fetching new rows from each other and adding them to their local databases. You should schedule each machine for a slightly different time, so that they aren’t all trying to synchronize at the same time.

Extending the Sample Application
It’s generally the case that examples you see in articles and books never seem to be exactly what you need for your own applications. So, this section provides some pointers to how you might extend the example for your own needs. There are two obvious areas to consider:

  1. Maintaining a centralized set of configuration data, making administration easier and ensuring each machine uses the same configuration data.
  2. Synchronizing changes to the data on each machine, rather than just adding new rows.

Centralizing Configuration Settings
Each client instance in the example shown in this article uses a local configuration file that lists the other members of the synchronization group. To implement a centralized configuration system, you could store the configuration information in one specific location, and then fetch it from each client as they start the synchronization process. As the configuration information file is in XML format, it would be easy to load this across the network (or across the Internet) using a .NET XmlDocument or XmlTextReader object. Then you would extract the data from the XML document rather than from a local .config file, or simply write the XML directly to disk to be used as the local configuration file.
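
As a minimal sketch of that idea, the fragment below loads a centrally published configuration document and reads one of the settings, assuming the document is available at a hypothetical URL and uses the same appSettings-style layout as the local file:

   ' load the shared configuration document from a central server
   ' (the URL is a placeholder for wherever you publish the file)
   Dim oDoc As New System.Xml.XmlDocument()
   oDoc.Load("http://config-server.example.com/synch-settings.config")

   ' read the comma-delimited list of Web service URLs
   Dim oNode As System.Xml.XmlNode = oDoc.SelectSingleNode( _
      "//appSettings/add[@key='SourceWebServiceURLs']")
   Dim sURLList As String = oNode.Attributes("value").Value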

However, in the example shown earlier, each client’s local configuration file lists only the other members of the synchronization group. To implement a single configuration file used by all clients, you would have to include all the members of the synchronization group (all the Web service URLs and machine-specific key values), and then make sure that each client ignores its own entry as it iterates through the list of Web service URLs. If the machine-specific key is the IP address of the client (as in our example), the client can detect this and simply skip the entry that specifies its own IP address. If you use a different value for the machine-specific keys, you could store this in a separate local file on each client, which it would read and use to identify which of the Web service URLs in the list applies to the local machine (and therefore should be skipped).

Synchronizing Changes to the Data
The example discussed in this article is aimed at synchronizing data by adding only new rows to all the databases within the synchronization group. However, what about synchronizing changes to existing rows in the databases?

This is much harder to achieve, as it means that the process must be able to handle updates to existing rows and detect when rows have been deleted. This also brings into play issues such as which of the group of clients will hold the “master” database (in case there are conflicting updates), and what the rules are for handling updated and deleted rows. For example, if the same existing row is changed in more than one database, which version of the row should be retained and replicated to the rest? And if a row is deleted in one database, should it be deleted from all the others? Even more interesting, how would you detect that a row had been deleted anyway?

The DataAdapter object’s Update method is clever in that it will automatically push changes to existing rows into a table in a database, as well as adding new rows. It will even delete rows in the table that have been deleted from the DataSet. The problem is that you have to populate the DataSet in such a way that it contains the appropriate set of updated and deleted rows (in other words, the current and original values of each column in each row and their RowState values have to be set correctly to be able to push the changes into the target table).

One way to handle updates would be to create a DataSet containing the existing rows in the local database on the client machine, then send this DataSet to the target server where the DataAdapter object’s Fill method would be used to refresh the rows in the DataSet with the updated values in that database (which would also collect any added rows). Then the DataSet would be sent back to the client where the Update method would push the changes into the local database.

For this to work, the tables must have a column defined as the primary key. It would also be wise to ensure that each client only updates its own rows between synchronization cycles (in other words, the rows that it has added, or for which it is “responsible”). Each update overwrites any previous values, so the values in a row that has been changed on more than one machine will be determined by whichever machine carries out the synchronization process last. If this occurs, you must also repeat the whole synchronization process so that this “final” value ends up in all the synchronized databases.

Handling deleted rows is even more difficult. The only way that a DataSet can remove rows from a database table is if the rows both exist in that DataSet and are marked as deleted. But the Fill method cannot detect when rows have been deleted from a database table. This means that you would have to send the current rows from the client to the target machine (as discussed for updates), and then execute some custom code that searches for each row in the target database. If a row is not found, it would be deleted from the DataSet (which sets the appropriate current, original, and RowState values). Then, when the DataSet is returned to the client and the Update method is called, these rows will be deleted from the local database.

In this article, you’ve seen a practical application of using Web services to synchronize data across multiple databases over the Internet. While the demonstration application is quite specific, you should recognize that the overall concept and process have much wider applications. For example, you could use the approach to synchronize almost any kind of data between different data sources, as long as the data can be extracted and exposed as a DataSet by the source Web service, or use the techniques shown here to aggregate data from many different machines and databases, regardless of how widely they’re distributed.
