eplication figures as one of the more prominent features of SQL Server 2000. Replication is a complex application that uses a combination of stored procedures and executables to distribute and copy data between SQL Server databases. If you take care not to get lost in the details and confused by occasionally misleading terms, replication can be an important component in the database architecture of a SQL Server application. In this article, you’ll learn about the terminology surrounding replication, the types of replication available in SQL Server, and how to configure (that is, install) replication.
Replication can provide you with some interesting solutions for your data tier architecture because it allows you to use multiple database servers on the back end. You can publish to multiple servers, gather data from multiple servers, provide loosely connected clients with the ability to share data with a centralized database, and scale out database work in a horizontal fashion.
Let’s take a look at replication terminology, then the types of replication, and finally with the steps required to configure replication.
Getting clear on terminology is one of the most important hurdles to overcome when diving into SQL Server replication, so in this section you’ll learn about the basic terms and concepts.
The purpose of SQL Server replication is to copy or distribute a database’s data to one or more data sources, usually other SQL Server databases. The destination database(s) could be on the same SQL Server, but most commonly you’ll want to get the data from one SQL Server to another SQL Server.
The Publish/Subscribe Metaphor
SQL Server replication adapts a publish/subscribe metaphor to name various replication roles and objects. You define certain data sources as publications that become available for replication. When you have database servers subscribe to the publication, replication begins.
The publish/subscribe metaphor is pretty good, but don’t take it too literally. Remember, you’re publishing data from a database, not publishing a magazine. The publish/subscribe metaphor works well when you use it to help understand replication’s components and roles, but do not expect all of the elements of a magazine or newspaper publishing business to apply.
Publisher, Publications, and Articles
The SQL Server that you define as having the origin of the data you want to replicate is called the publisher, or publishing server. A publisher is always one SQL Server, and you can only define a given SQL Server as a replication publisher once. It is logical to think of a publisher server first, but as you’ll learn below in the section on configuring publishing, you actually have to define the distributor before you define the publisher.
You can replicate SQL Server 2000 database tables, views, and stored procedures. When you replicate a table or view, you are sending data; when you replicate stored procedures, there are some restrictions on what you can send: snapshot replication will send the text of a stored procedure, but transactional replication will only send the execution of stored procedures.
For each database on the publisher SQL Server, you can define selected sets of tables and views (and sometimes stored procedures) containing the data you want to replicate. Each defined set of tables and views is called a publication. A publication must exist in one database only; it cannot span databases.
Within a publication, each table or view is called an article. An article can be an entire table or view, or just a subset. You can publish a subset of a table or view’s columns, and you can also restrict the rows of the table or view by using a WHERE condition when you define the article. There are other specialized methods of filtering as well, which you’ll learn about in the sections on Transactional and Merge replication.
When you define a publication, you choose which type of replication to use; every publication must be a snapshot, transactional, or merge publication. Once you’ve defined the type of the publication, your options for creating articles are adapted to the type of replication you’ve chosen. You’ll learn more about these types in the next section.
Before you can set up a replication publisher, you must define a distributor. The distribution server contains at least one distribution database in which SQL Server places system tables and procedures unique to replication.
The distribution server plays a major role in snapshot and transactional replication as a store-and-forward location for replicated data. However, it only plays a minor role in merge replication. In transactional replication, where the distribution server plays a store-and-forward role for data, it’s more common and often more efficient to locate the distributor on its own SQL Server. On the other hand, with merge replication it’s very common to make the publisher server do double-duty as the distributor server.
Subscriber and Subscriptions
Once you have defined publications on a publisher server, other data sources known as subscribers can then receive or interact with the replicated data. You can create either a push or a pull subscription. You define a push subscription at the publisher, and you define a pull subscription on the subscribing server.
The limitations and behaviors of subscribers and subscriptions can change depending on the type of replication you’re using. You’ll learn more about the specifics of each subscriber behavior in the next section called Types of Replication.
Types of Replication
Before you get into the details of setting up replication, you should have a basic understanding of the types of replication. There are only three: snapshot, transactional, and merge replication. Three is a nice simple number, but the variations and options on transactional and merge can make the number of possible options much larger.
Because snapshot replication delivers a one-time-only copy of selected data, and both transactional and merge replication use snapshot replication to initialize, you’ll hear most often about just transactional replication and merge replication.
Snapshot is the simplest form of replication. It makes a simple copy of a data set and delivers it to a subscriber. The direction of the data is one-way only, with no updates allowed from the snapshot’s subscribers back to the publisher. You can use a snapshot to send data once, or you can repeat a snapshot on a regular basis.
All three types of replication use snapshot replication. You can use snapshot replication by itself, but SQL Server also uses it as the method to initialize both transactional and merge replication. Behind the scenes, snapshot replication uses a Snapshot Agent (snapshot.exe) in combination with the Distribution Agent (distrib.exe) to export all relevant information about a publication’s data to a text file. The Distribution Agent then delivers that data to the subscriber.
Transactional replication is usually also a one-way-only incremental form to update subscribers. It gets its name because it uses an agent on the distributor that reads the transaction log of the publisher database (logread.exe.) Transactional replication is normally one-way only, and you treat the subscribers as read-only.
Transactional replication is very sensitive to connectivity issues, and normally assumes continuous connections between publisher and subscribers. If a subscriber is disconnected, data will build up in the distribution database, to apply to the subscriber once it connects again. If the subscriber gets too far out of date, a snapshot will have to be run again.
When you use transactional replication to distribute data to read-only subscribers, it can be very fast, delivering data changes from publisher to subscriber in a matter of seconds. This can make it an ideal mechanism for reporting servers. If you have updating subscribers, however, the transaction delivery time can increase significantly.
Merge replication was introduced as a more efficient method of allowing updates across publishers and subscribers. It uses a Merge Agent (merge.exe) on each server to communicate between publisher and each subscriber. Merge replication requires a ROWGUID column on each replicated table that it will use for tracking changes.
In merge replication, SQL Server uses triggers on each replicated table to populate the MSmerge_contents system table in each database. It does not read the transaction log of the publisher and does not use the distribution server for store-and-forward.
Merge replication has the remarkable feature that subscribers need only be occasionally connected to the publisher, and when the subscribers do connect, they can automatically catch up.
The extra metadata stored on each node in a merge replication application can cause a merge to sometimes be somewhat slow under heavy transaction loads.
Choosing the Type of Replication
Before you dive into configuring or installing replication, you should decide which type of replication you need. You need to ask yourself several questions before you choose a type of replication.
First of all, what type of update scenario do you need between the publisher and the subscriber data? Do you need updates on the subscriber to replicate back to the publisher? Will the subscribers be read only?
If you want to have updates made on a subscriber replicated back to the publisher, you must use merge or transactional replication with queued or immediate updating subscribers. You cannot use snapshot replication or one-way only transactional replication.
You also need to consider how transactionally consistent the data must be between the publisher and the subscriber. If you allow updates among subscribers, will having data converge to the same values across publisher and subscribers be sufficient, or do you need the subscribers to be more transactional?
Snapshot replication is, by definition, out of date right after the application of the snapshot, until the next snapshot occurs. Transactional replication provides the maximum amount of consistency between publisher and subscribers. Merge replication uses a strategy of convergence, and the resulting convergent data may differ depending on the order that changes are applied.
Once you’ve decided on the type of replication you desire, you can move on to configuring replication.
Setting up SQL Server replication is easy because Enterprise Manager gives you a number of wizards to use. However, just the act of initially configuring replication requires that you make some decisions.
Your servers must meet several requirements before you can configure replication. You should ensure that all the SQL Servers involved are using service accounts for both the SQL Server and SQL Agent logins, and that the servers can connect to each other sufficiently so that the type of replication you install will work.
You should use either SQL Server Enterprise or Standard Edition for the publishing and distribution servers. (For development, you can use the Developer Edition. There does not appear to be any difference between all three of these editions for replication purposes.)
Table 1 shows a consolidated version of the SQL Server Books Online matrix that shows you which edition of SQL Server you can use for the various types of replication.
Table 1: The types of replication and the SQL Server editions that will work with them.
Type of replication
Enterprise, Standard,Developer, and Evaluation
You can also replicate to heterogonous subscribers (non-SQL Server databases) for snapshot and transactional replication without updating subscribers. You can use SQL Server 2000 to subscribe to data being published from other databases such as Oracle, DB2, Microsoft Access, and other databases that satisfy certain ODBC or OLE DB subscriber requirements, but you must set it up programmatically. (For more information, see “Programming Replication from Heterogeneous Data Sources” in SQL Server Books Online.) When you choose to use non-SQL Server publishers, you can only use transactional and snapshot replication.
The Configure Publishing and Distribution Wizard
To get started configuring replication, you run the Configure Publishing, Subscribers, and Distribution wizard from Enterprise Manager. Despite its name, this wizard asks you to initially choose a location for the distribution server, and then asks you to enable one server as a publisher and one or more servers as subscribers. So before you start, you need to decide whether you are going to separate the distribution server from the publishing server.
For merge replication, you would normally place the distribution server on the publishing server because the activities of the distribution server are fairly light. But for snapshot and transactional replication, if the publisher production server is under significant load, it’s a better practice to identify a separate server as a distribution server.
You can find the Configure Publishing, Subscribers, and Distribution option under Replication in the Enterprise Manager menu. Make sure you’ve registered each server you want involved in replication, and then select the server that you want to have as the distribution server.
The wizard initially asks whether you want to make the selected server its own distributor, as you see illustrated in Figure 1. If this is the first time you’re running the dialog, choose this option, which is the default.
|Distributor-Publisher Login Security
When you choose another server other than the distribution server as a publisher, SQL Server pops up a one-time-only dialog box (see Figure 4) that tells you what you need to proceed. For each server other than the distribution server that you enabled as a publisher, you must: