A Replication Architecture for Enterprise-Grade Subversion

Subversion is by far the most popular open source version-control system, and for good reason. Its many powerful features, such as atomic commits, fast branching and tagging, efficient treatment of binary files, and HTTP and WebDAV access, make it an excellent choice for many organizations. Some of the more advanced features of Subversion 1.5, in particular merge tracking, make the system even more compelling.

A good version-control system is indeed strategically important for any software-development organization. As all developers know, source code is the very lifeblood of an IT organization. Keeping it safe makes good business sense, and one of the main advantages of a source-code repository is knowing that your code is always tucked away in a safe place.

Or is it?

Servers can crash, networks can go down, and fires can destroy your data center. Are you sure your source-code repository can be restored to a reasonable state quickly if you face one of these emergencies? Are simple file system backups enough?

This article presents some strategies for making sure your Subversion repository is safely backed up, so that you can quickly restore it to its most recent state. It also demonstrates an easy way to set up a simple, yet powerful high-availability solution in the form of write-through proxying.

Backing Up Your Repository the Hard Way
A primary reason to back up your Subversion repository is to ensure that you can recover your precious source code in the event of a disaster. And one of the most basic backup strategies is simply to ensure that your data is regularly copied to a safe place. Subversion makes this easy.

By default, Subversion stores your repository in a flat file format known as FSFS. This format is portable and easy to duplicate and move around. By using the FSFS format for your repositories, you can restore any lost repository simply by copying the repository files back into the appropriate directory. However, if you simply copy your repository on a regular basis (say, every hour), you are likely to miss commits. Any changes that users commit to the repository between the last backup and a server crash are lost.

You can get around this problem by making sure that Subversion automatically backs up its data whenever a user commits. You can use Subversion hooks to do this. Hooks are a powerful scripting feature that every advanced Subversion user should know. The hooks directory of your Subversion repository should look something like this:

Kapiti:repos johnsmart$ ls hooks/
post-commit.tmpl          pre-lock.tmpl
post-lock.tmpl            pre-revprop-change.tmpl
post-revprop-change.tmpl  pre-unlock.tmpl
post-unlock.tmpl          start-commit.tmpl
pre-commit.tmpl

These are essentially scripts that are executed at key points during the Subversion commit lifecycle. For example, the post-commit script will be executed just after each svn commit operation.

To get Subversion to back up your repository automatically, you could simply place a command in this script to back up your entire Subversion repository. This way, every time someone commits a change to your repository, the whole repository will be safely duplicated and placed out of harm’s way.
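As a rough illustration, such a post-commit hook might look like the sketch below. It assumes that svnadmin hotcopy is used to take the copy (the approach itself does not depend on any particular copy command), and BACKUP_ROOT is a hypothetical destination directory that must already exist.

```shell
#!/bin/sh
# Sketch of a post-commit hook that copies the whole repository after each commit.
# Subversion calls post-commit with the repository path ($1) and the new revision ($2).
# BACKUP_ROOT is a hypothetical destination; adjust it for your server.
BACKUP_ROOT="${BACKUP_ROOT:-/var/backups/svn}"

# Build the destination path for a given repository path and revision number.
backup_target() {
    printf '%s/%s-r%s\n' "$BACKUP_ROOT" "$(basename "$1")" "$2"
}

# Take the copy with svnadmin hotcopy (an assumption; the article names no command).
backup_repo() {
    svnadmin hotcopy "$1" "$(backup_target "$1" "$2")"
}

# Only act when Subversion invokes the hook with both arguments.
if [ "$#" -ge 2 ]; then
    backup_repo "$1" "$2"
fi
```

Each commit produces a complete new copy named after the revision, which is exactly why the space and time costs grow so quickly.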

The big problem with this approach, however, is that it is not particularly scalable. As your repository grows, so will the space needed to store all of the versions and the time required to back up the repository files. And restoring the repository after a crash is very much a manual task. Luckily, there’s a better solution: a tool called svnsync.

Synchronize Your Repository with Style Using Svnsync
A much more efficient way of backing up your Subversion repository is to use svnsync. This useful tool, which dates from Subversion 1.4, works by relaying revisions (as a series of commit operations) to a replicated server. Essentially, the replicated server “replays” each commit on its own copy of the repository. This approach is much more efficient than backing up the entire repository each time; only the data required for the commits performed since the last synchronization are transmitted. In addition, you can use svnsync over the network, using either of the standard Subversion protocols, svn:// or http://.

Setting up svnsync is easy enough. To start, all you need are two running Subversion repositories: one with the data you want to mirror, and the other empty. You can create an empty target repository on the mirror server as you would any other Subversion repository, using the svnadmin command as shown here:

$ svnadmin create /var/svn/repos-mirror

It is important to check that the target repository does not already contain any revision data. For all intents and purposes, the target repository should be treated as a read-only repository that only svnsync will use. Indeed, if you try to update this backup repository independently, Subversion likely will become very confused.
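One way to make this check, sketched here on the assumption that you have shell access to the mirror server, is to ask svnlook for the youngest revision; a freshly created repository reports 0. The function name is purely illustrative.

```shell
#!/bin/sh
# Succeed only if the repository at the given path has no committed revisions.
# "svnlook youngest" prints the number of the most recent revision.
is_empty_repo() {
    [ "$(svnlook youngest "$1")" = "0" ]
}

# Example use with the mirror path from this article:
#   is_empty_repo /var/svn/repos-mirror && echo "safe to initialize"
```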

Before you can start backing up to this replicated server, you do need to make one minor configuration change to the mirror repository. During the synchronization process, svnsync needs to update revision properties on the mirrored server. Modifying revision properties is not activated by default in a new Subversion repository; you need to activate it by writing a pre-revprop-change hook script. This is easier to do than it sounds. In the hooks directory of your target Subversion repository, you will find a set of sample hook script templates, which you can examine to get an idea of how these scripts work.

At this stage, you need to allow revision properties to be updated. To do this, rename the file pre-revprop-change.tmpl to pre-revprop-change on Unix, or pre-revprop-change.bat on Windows, and make it executable (if required). Then, modify the script so that it always returns 0 as follows:

#!/bin/sh
exit 0

This authorizes all updates to revision properties without distinction.
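On Unix, the whole step can be sketched as a small helper that writes the hook and marks it executable. The function name is hypothetical, and the hooks directory is passed in as an argument; for the mirror created earlier it would be /var/svn/repos-mirror/hooks.

```shell
#!/bin/sh
# Sketch: create a pre-revprop-change hook that accepts every change.
# $1 is the hooks directory of the mirror repository.
install_revprop_hook() {
    hook="$1/pre-revprop-change"
    # Write the two-line script shown above.
    printf '#!/bin/sh\nexit 0\n' > "$hook"
    # Subversion only runs hooks that are executable.
    chmod +x "$hook"
}

# Only act when a hooks directory is given on the command line.
if [ "$#" -ge 1 ]; then
    install_revprop_hook "$1"
fi
```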

Next, set up your main repository so that it synchronizes with your mirror repository. To do this, use the following command:

$ svnsync init svn://svnmirror svn://svnrepos

This registers svnrepos as the source for the mirror repository on svnmirror; note that svnsync init takes the destination URL first and the source URL second. To actually perform the synchronization, you need to run svnsync sync as shown here:

$ svnsync sync svn://svnmirror
Committed revision 1.
Copied properties for revision 1.
Committed revision 2.
Copied properties for revision 2.
Committed revision 3.
Copied properties for revision 3.
Committed revision 4.
Copied properties for revision 4.
Committed revision 5.
Copied properties for revision 5.
Committed revision 6.
Copied properties for revision 6.
Committed revision 7.
Copied properties for revision 7.
Committed revision 8.
Copied properties for revision 8.
Committed revision 9.
Copied properties for revision 9.
Committed revision 10.
Copied properties for revision 10.
...

After this is done, your mirror repository contains a carbon copy of your main one.
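If you want a quick sanity check, one possible approach is to compare the head revision that svn info reports for each URL. The sketch below assumes the English "Revision:" label in the svn info output, and the function names are illustrative only.

```shell
#!/bin/sh
# Print the head revision number that "svn info" reports for a repository URL.
head_revision() {
    svn info "$1" | sed -n 's/^Revision: //p'
}

# Succeed when both URLs report the same, non-empty head revision.
mirror_in_sync() {
    master_rev=$(head_revision "$1")
    mirror_rev=$(head_revision "$2")
    [ -n "$master_rev" ] && [ "$master_rev" = "$mirror_rev" ]
}

# Example use with the URLs from this article:
#   mirror_in_sync svn://svnrepos svn://svnmirror && echo "mirror is up to date"
```

Matching revision numbers do not prove that the contents are identical, but a mismatch is a clear sign that a synchronization has been missed.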

Securing Your Replicated Server
As previously mentioned, your replicated server is a fragile beast. So you don’t want just any old user committing changes to this repository. A good way to ensure this is to enforce strict user-access rights. To do this, create a special user with exclusive access to this repository. Call it something like syncuser (as the examples to follow do).

In this case, you need to make your pre-revprop-change script a bit more sophisticated. The following script will allow only syncuser to make updates to revision properties:

#!/bin/sh
USER="$3"
if [ "$USER" = "syncuser" ]; then exit 0; fi
echo "Only the syncuser user may change revision properties" >&2
exit 1

You also need a similar script for the start-commit hook, to ensure that only syncuser can commit changes to this repository.
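A minimal sketch of such a start-commit hook follows. It relies on Subversion passing the repository path as the first argument and the authenticated user name as the second; the function name and the error message are illustrative.

```shell
#!/bin/sh
# Sketch of a start-commit hook for the mirror repository.
# Subversion passes the repository path as $1 and the authenticated user as $2.
ALLOWED_USER="syncuser"   # the dedicated account used in this article

# Succeed only for the sync account; otherwise explain the refusal on stderr.
check_committer() {
    if [ "$1" = "$ALLOWED_USER" ]; then
        return 0
    fi
    echo "Only the $ALLOWED_USER user may commit to this repository" >&2
    return 1
}

# Only act when Subversion invokes the hook with its arguments.
if [ "$#" -ge 2 ]; then
    check_committer "$2"
    exit $?
fi
```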

Finally, when you initialize your replication process using this strategy, you need to provide the username and password for syncuser as follows:

$ svnsync init svn://svnmirror svn://svnrepos --sync-username syncuser --sync-password secret

Automating the Replication
After the synchronization is set up, it is easy to automate the whole process. Just add post-commit and post-revprop-change hooks to kick off a synchronization process in the background whenever any changes are committed. The post-commit script might look like this:

#!/bin/sh
# Post-commit script to replicate newly committed revision to mirrors
svnsync sync svn://svnmirror > /dev/null 2>&1 &

And the post-revprop-change hook could look like this:

#!/bin/sh
# Post-revprop-change script to replicate revprop-changes to slaves
REV=${2}
svnsync copy-revprops svn://svnmirror ${REV} > /dev/null 2>&1 &

Now, your mirrored repository will always synchronize with your main repository. And, if the network connection between the servers should fail, you’re covered because the mirror will update the next time the connection is available.

Using Write-Through Proxying for Load Balancing
The discussion so far has concentrated on how to back up your Subversion repository to a remote mirror. This is a useful technique, but with Subversion 1.5 and Apache you can go much further. Subversion 1.5 introduces the idea of write-through proxying, which allows you to set up a distributed repository architecture that is well suited to geographically distributed teams.

Write-through proxying is based on the observation that the vast majority of operations on a Subversion repository are read-only. A write-through proxy architecture is composed of one central master repository and many read-only replicated repositories (see Figure 1). The replicated repositories are installed near local development teams. The replicated repositories are kept in sync with the central repository using an automated process based on svnsync.

Figure 1. Using Write-Through Proxying in Subversion: A write-through proxy architecture is composed of one central master repository and many read-only replicated repositories.

Whenever a developer performs a read-only operation, such as update, the request is processed directly by the local read-only replicated repository. Write operations (such as commits) are transparently sent to the central repository. When a commit completes there, all of the distributed mirrors are also updated.

Write-through proxying works exclusively with Apache, and requires a bit of setting up. The rest of this section explains what’s involved.

First of all, you need to set up your read-only replicated servers. The underlying replication mechanism of write-through proxying still relies on svnsync, so you need to prepare these servers using svnsync, as shown previously.

Next, you configure the replicated servers to run on Apache. The key part here is the Location element (the /svn path used here is an illustrative choice), which contains the new SVNMasterURI entry:

<Location /svn>
   DAV svn
   SVNPath /var/svn/svnmirror
   SVNMasterURI http://svnrepos
</Location>

This tells Subversion to relay any write requests to the master server at http://svnrepos.

The mirror repository is in principle read-only. However, the master server needs to update it whenever it synchronizes the mirror repositories. To allow this, set up a special address on each slave server that only the master server can update:

<Location /svn-proxy-sync>
     DAV svn
     SVNPath /var/svn/svnmirror
     Order deny,allow
     Deny from all
     # Only let the server's IP address access this Location:
     Allow from 192.168.1.101
</Location>

Finally, you need to set up post-commit and post-revprop-change hooks to synchronize the mirrored repositories after any updates. Note how the svnsync process is run in the background, so that it doesn’t slow down the commits (this neat trick comes from the Subversion documentation):

#!/bin/sh
# Post-commit script to replicate newly committed revision to mirrors
svnsync sync http://slave1/svn-proxy-sync > /dev/null 2>&1 &

#!/bin/sh
# Post-revprop-change script to replicate revprop changes to mirrors
REV=${2}
svnsync copy-revprops http://slave1/svn-proxy-sync ${REV} > /dev/null 2>&1 &
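With several mirrors, the same post-commit hook can simply loop over them. The following is a sketch only: the slave2 host name is hypothetical, and the /svn-proxy-sync path follows the example above.

```shell
#!/bin/sh
# Sketch of a post-commit hook on the master that pushes to several mirrors.
# slave2 is a hypothetical second mirror; list one sync URL per mirror here.
MIRRORS="http://slave1/svn-proxy-sync http://slave2/svn-proxy-sync"

# Start one background svnsync per mirror so the committing client is not kept waiting.
sync_all_mirrors() {
    for url in $MIRRORS; do
        svnsync sync "$url" > /dev/null 2>&1 &
    done
}

# Only act when Subversion invokes the hook with the repository path and revision.
if [ "$#" -ge 2 ]; then
    sync_all_mirrors
fi
```

Because each svnsync runs in the background with its output discarded, a mirror that is temporarily unreachable does not hold up the commit or the other mirrors.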

After this is done, you should be good to go!

A Subversion-Based, Replicated Repository Architecture
Subversion replication is a powerful tool. Not only does it help you to back up your repositories reliably and efficiently, but it can also provide the foundation for a powerful and easy-to-configure replicated repository architecture. If you are setting up an enterprise-scale Subversion repository architecture, be sure to check it out!
