Types of Replication: Overview
SQL Server 2000 offers three types of replication: snapshot
, and merge
. Each type of replication uses various combinations of replication agents. A replication agent is an executable that a SQL Agent job calls. Usually you will set up replication using the GUI dialogs in SQL Server 2000's Enterprise Manager, and they will automatically create the jobs that execute the agents periodically with the appropriate parameters. You could write your own application to call the agents directly outside of SQL Agent, but then you would lose the monitoring support built into Enterprise Manager. It's much more common to use Enterprise Manager's built-in support to set up and monitor replication.
|Figure 1: Snapshot and transactional replication use the Snapshot Agent on the distributor server to extract publication data to a shared folder, and then use the Distribution Agent to apply the snapshot schema and data to the subscriber.|
Use snapshot replication to make a point-in-time distribution of a publication's articles, with all the defined objects and data, to one or more subscribers. The distributed data is a copy of those database's objects as they exist at a certain point in time, and hence the term 'snapshot.' Each time the snapshot replication process runs, a new snapshot containing the objects and data is sent to the subscriber.
Think of snapshot replication as a kind of discrete operation. You can send the objects and their data once or on a periodic schedule, but each time the subscriber is fully reinitialized. You don't need to continuously monitor snapshot replication because it does not send data continuously.
Snapshot replication exports the schema and data of a publication to a shared file location, as you can see in Figure 1. The files consist of scripts to create the database objects defined in the publication: the table, view, stored procedure, and function schemas, along with supporting objects such as index and trigger schemas. Export files containing the data are also placed in the subdirectory. As you'll learn later, SQL Server's replication agents use these files to create the objects on the subscriber and load the data. Each time SQL Server generates a snapshot, a new file location is set up and then it recreates the entire publication schema and data files.
|Figure 2: After the snapshot has been applied using the Snapshot and Distribution Agent, a push transactional subscription uses both the Log Reader and Distribution Agents.|
Any changes you make to the publication's articles are sent anew each time SQL Server generates the snapshot that contains the schema scripts and data files. You don't need to worry about changes in stored procedures, triggers, or functions. Because the snapshot generates everything from scratch and recreates the objects on the subscriber, if you've included them in the publication, changes to them will show up in the subscriber.
You'll use snapshot replication when you have relatively static data or you have small amounts of data. When you have static data, such as lookup tables, you may seldom need to refresh the data on subscribers. Because the snapshot process exports all the data for tables, snapshots of large amounts of data can take a long time. Also, if your subscribers can tolerate latency, that is, the data being only occasionally refreshed, snapshot replication may have a role.
Transactional replication uses snapshot replication in order to initialize the data objects on their subscribers. Normally you only need to generate the snapshot once, for initialization. From that point on, transactional replication sends transactions to the subscribers through the distribution process. Transactional replication uses the Log Reader Agent, to read the transaction log of the publisher database continuously and stores those transactions in the distribution database, as you can see in Figure 2. Transactional replication will send only the changes to the subscribers.
You'll find transactional replication very useful when you need to deliver data changes to other servers as quickly as possible and the subscribers are continuously connected to the distributor. SQL Server will deliver the transactions to the subscribers atomicallyeither all the data of a transaction will be applied, or none of it.
|Note how similar snapshot and transactional replication are. Transactional replication has two additional agents but otherwise uses the same agents as snapshot replication.|
Normally, transactional replication architectures keep the subscribers in a read-only state. You can change data on the subscribers because the databases are not in a read-only mode, but the normal transactional replication flow is one direction only (from publisher to subscribers).
If you must occasionally make changes on the subscriber that propagate back to the publisher, transactional replication provides two mechanisms: immediate updating subscribers and queued updating subscribers. These mechanisms are really meant for occasional updates to the subscriber. If you need to make frequent updates to the subscriber or subscribers, you should consider merge replication.
Transactional replication creates special system stored procedures on the subscriber to apply changes to the data. The distribution database queues up transactions so each subscriber receives changes in exactly the same order they occurred on the publisher. Then the Distribution Agent connects to the subscriber and executes the system stored procedures in order to apply the transactions to the subscriber's data. Transactional replication is the fastest form of replicationit can deliver publication changes to a subscriber very quickly.
You can also change the schema being published, within limits. You can add or drop a table's column using the appropriate replication dialogs in Enterprise Manager or through special replication system stored procedures. (See "Schema Changes on Publication Databases" in SQL Server 2000 Books Online for further details.)
Merge replication also uses a snapshot to initialize the subscriber data. After initialization, data is exchanged between publisher and subscriber using a peer mechanism internal to the databases, and not transactions. Merge replication does not use a distribution database to distribute data. Instead, it uses the Merge Agent, as you can see in Figure 3.
|Figure 3: After applying the snapshot with the Snapshot and Merge Agents, a push merge subscription uses the Merge Agent on the publisher to synchronize data between the publisher and subscriber.|
You'll use merge replication when you have frequent updates on both publishers and subscribers. Also, merge replication is very tolerant of occasionally connected servers. A subscriber can be disconnected from a publisher for a period of time, and then synchronization will occur when the subscriber next connects to the publisher.
Merge replication makes very little use of the distribution database, so you don't need to have a separate distribution server. With merge replication, SQL Server tracks changes to data within each database, and SQL Server synchronizes the publications using the Merge Agent. In a push subscription, the Merge Agent job for each subscription will reside on the publisher/distributor. In a pull subscription, a Merge Agent job will be placed on the SQL Agent of each subscriber.
Because conflicts can occur in a peer-type relationship, merge replication supplies a conflict resolver mechanism. You can set certain priorities within the resolver and even write your own custom resolver.
In both transactional and merge replication, if the subscriber becomes significantly out of date, that is, it expires, SQL server will renew the subscription by generating a new snapshot for the subscriber. You can differentiate all three types of replication by the agents they use. Now let's take a look at replication agents.