Services that were designed for central access, as in the inventory example, are unlikely to contain provisions for directly partitioning and filtering data. In the centrally accessed service model, the exposed services are more likely to provide generic data access. Any custom data partitioning and filtering (apart from security and permission filtering) is done by the application. In the inventory example, a given user's application may need to show only a few hundred inventory items that match that user's job role. The online application may request the entire inventory and then filter it based on the user's role. Forcing the application to do this filtering in the offline case wastes both bandwidth and processing power.
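To make the cost concrete, here is a minimal sketch of moving that role-based filtering to the server side so the synchronization payload contains only the relevant items. The item schema and role names are hypothetical, not part of the inventory example itself:

```python
# Hypothetical sketch: filter the inventory by job role on the server
# before transmission, instead of shipping the full list to the client.
def filter_inventory_for_role(inventory, role):
    """Return only the items visible to the given job role."""
    return [item for item in inventory if role in item["roles"]]

full_inventory = [
    {"sku": "A-100", "roles": {"mechanic", "manager"}},
    {"sku": "B-200", "roles": {"manager"}},
    {"sku": "C-300", "roles": {"mechanic"}},
]

# A mechanic's synchronization payload now contains two items rather
# than the entire inventory.
payload = filter_inventory_for_role(full_inventory, "mechanic")
```

With thousands of items and hundreds of offline users, trimming the payload at the server rather than at each client is where the bandwidth savings come from.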
For the application maintainability criterion, in addition to the fact that there are many installed versions of the offline application, the application is also very tightly coupled to all the services it accesses. If any of the service interfaces or addresses change, all instances of the application will break. To make matters worse, it may not be obvious to the application that one of the interfaces has changed. The application may get partway through its synchronization, fail, and be unable to complete until a new version of the application is installed.
All the issues highlighted so far seem to suggest the need for some level of consolidation in the synchronization process. To accomplish consolidation, a staging database is inserted between the data sources and the offline application. While this new staged synchronization architecture may seem to add an extra level of complexity, it instead solves most of these problems (see Figure 3).
In this architecture, the staging database acts as a buffer between the data sources and the offline application. The staging database's only job is to support the offline application's synchronization by consolidating the offline data in a single place—a structure designed for synchronization.
Figure 3. Staged Synchronization Architecture: A staging database between the data sources and the offline application may appear to add an extra level of complexity, but it instead solves most synchronization problems.
The staged synchronization architecture still requires that something be exposed to the Internet. However, in this case it doesn't matter how many heterogeneous data sources the data is spread across; only a single point needs to be accessed, and because only a single point needs access, it is much simpler to provide secure encryption and authentication without requiring a VPN connection.
A properly designed synchronization should involve only a single, bi-directional exchange of data over the public, unreliable network. At synchronization time, the offline application submits all the changes it made offline to the staging database. In return, the staging database responds with all the changes made to the data it stores locally. While the synchronization process should still be designed to handle all possible errors, reducing the number of network interactions to two makes the whole process far more robust.
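That single round trip can be sketched as follows. The class and method names are illustrative, and the transport, authentication, and error handling a real implementation needs are omitted:

```python
# Sketch of a single, bi-directional synchronization exchange: the
# client's changes go up in one request, and the staging database's
# accumulated changes come back in the one response.
class StagingDatabase:
    def __init__(self):
        self.pending = []    # client changes awaiting back-end integration
        self.outbound = []   # changes made since the client last synchronized

    def exchange(self, client_changes):
        """Accept the client's changes and return the server's in one call."""
        self.pending.extend(client_changes)
        server_changes, self.outbound = self.outbound, []
        return server_changes

def synchronize(local_changes, staging_db):
    """The offline application's entire network conversation."""
    return staging_db.exchange(local_changes)
```

Because the whole conversation is one call, a failure leaves only one question to answer on retry (did the exchange complete?), rather than a partially finished sequence of service calls.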
Note that simply getting the changes into the staging database does not mean that synchronization is complete. The staging database still has to evaluate the changes it received and begin the process of integrating those changes into the back-end data stores. At this point you may be thinking that this model has simply offloaded the real synchronization problems to the staging database. While that observation is true, there are benefits to having the data integration work done at the server. The process of integrating the changes back will always be complex and specific to your application. However, after the changes have been robustly passed to the staging database, the staging database can patiently integrate them using the same methods used by the online application. This allows the offline application to be as robust as its online counterpart.
After the application passes the data to the staging database, the synchronization (from the application's perspective) is complete, even though the integration into the back-end systems hasn't happened. Therefore, the application cannot participate in the data integration. Depending on your application, this approach can be either a benefit or a drawback. It is a benefit because the application's synchronization to the staging database can be fast, small, and usually successful. However, the data may have to be modified (because of business rules, conflict resolution, and so on) by the staging database to integrate it back into the data stores. Because the application already has completed its synchronization, it cannot get those changes until its next synchronization.
A staging database can substantially reduce the amount of data sent during synchronization. Returning to the inventory example, the only way to tell what has changed is to request the entire inventory and compare. In the direct remote synchronization architecture, change tracking was carried out by each individual application. In this architecture, the staging database can do all that work on behalf of the offline applications. For example, the staging database could query the inventory list every 15 minutes and update its own inventory table. When any offline application synchronizes, the staging database can use its own change tracking to deliver only those items that have actually changed.
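The periodic refresh amounts to a snapshot comparison: each time the staging database re-queries the inventory, it diffs the new result against its stored copy and records which rows changed. A minimal sketch, with an illustrative SKU-to-quantity schema:

```python
# Hedged sketch: the staging database refreshes its inventory copy on a
# schedule and computes a delta against the previous snapshot, so each
# offline client can be sent only the changed rows.
def changed_skus(old_snapshot, new_snapshot):
    """Return the SKUs that were added or whose quantity changed."""
    return {
        sku for sku, qty in new_snapshot.items()
        if old_snapshot.get(sku) != qty
    }

previous = {"A-100": 5, "B-200": 3}
current = {"A-100": 5, "B-200": 1, "C-300": 7}  # B-200 updated, C-300 added

delta = changed_skus(previous, current)
```

A production staging database would more likely track per-client high-water marks (for example, a last-synchronized timestamp or change sequence number) so that each client receives the delta since its own last synchronization, but the snapshot diff shows the core idea.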
However, this approach introduces the possibility of synchronizing stale data to the offline application. In the direct remote synchronization architecture, the inventory services were called directly during synchronization. In this approach, the data synchronized to the application will only be as recent as the last time the staging database updated itself.