One of the inherent difficulties in backup and contingency planning is that it usually takes only one thing going wrong to bring down your entire system. To be truly successful, your backup plan must account for all possibilities. Unfortunately, it's easy to overlook one particular scenario and foil your entire plan.
In this 10-Minute Solution, I point out some not-so-apparent scenarios and issues that I have found to be critical to a successful plan.
How do you create a truly effective backup solution? You have to plan for anything that could go wrong in your system.
Consider how these important issues affect your backup plan:
- Realize that the restore is almost more important than the backup
- Create a hot backup
- Move your backup offline
- Perfect the connection between your production and backup servers
- Consider a warm backup
- Always reassess the situation
Recently, a company I know had a problem with their Exchange server. The data store used by Exchange got corrupted. Having a thorough schedule of backups in place, the staff was not worried, until they discovered to their horror that the backup copies suffered from the same problem.
The bottom line is this: until you actually do a restore, you don’t know that you’ve successfully backed up your data. You might have a high confidence level, but you can’t be sure.
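As a quick sanity check (though no substitute for an actual restore), SQL Server's RESTORE VERIFYONLY will at least confirm that the backup file is complete and readable. The file path here is hypothetical:

```sql
-- Verify that the backup media is complete and readable.
-- Note: this does NOT prove the data inside is restorable;
-- only a real test restore gives you that certainty.
RESTORE VERIFYONLY
FROM DISK = 'D:\Backups\Sales_Full.bak'
```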
Create a Hot Backup
The best solution for critical systems is always to restore the backup once it’s been made onto another machine. This will also provide you with a “hot” backup that is ready to be used, should your production database ever crash. If you can’t afford this higher cost, you should at least do periodic test restores.
For SQL Server 2000, I suggest you take advantage of the fact that you can create a standby database using the STANDBY option in the RESTORE command. Under this mode the backup database can still be used for read-only queries. This can help you justify the cost of the standby server, because you’ll be using it at the same time as a reporting database.
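A minimal sketch of setting up such a standby database follows; the database name, backup files, and undo file path are all hypothetical:

```sql
-- On the standby server: restore the full backup, leaving the
-- database in standby mode so it can serve read-only queries.
RESTORE DATABASE Sales
FROM DISK = 'D:\Backups\Sales_Full.bak'
WITH STANDBY = 'D:\Backups\Sales_Undo.dat'

-- Subsequent transaction log backups can be applied the same way,
-- keeping the standby database readable between restores.
RESTORE LOG Sales
FROM DISK = 'D:\Backups\Sales_Log1.trn'
WITH STANDBY = 'D:\Backups\Sales_Undo.dat'
```

The undo file lets SQL Server roll back uncommitted transactions for read-only access, yet still apply further log restores later.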
Move Your Backup Off the Original Machine ASAP
A typical backup scenario creates a backup to a local set of disks. There are a couple of reasons for this. The backup is extremely fast. Also, disks are a much more convenient place to store a backup for later access than tape.
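A typical nightly job might look something like the sketch below (database name, paths, and share name are hypothetical); the copy to another machine is the step that is too often skipped:

```sql
-- Back up to fast local disk first...
BACKUP DATABASE Sales
TO DISK = 'D:\Backups\Sales_Full.bak'
WITH INIT

-- ...then immediately move the file off the machine.
-- xp_cmdshell is one (crude) way to script the copy from T-SQL.
EXEC master..xp_cmdshell
    'copy D:\Backups\Sales_Full.bak \\BackupSrv\Backups\'
```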
In some smaller systems, the backup is never moved to another machine. The rationale given is that because a RAID array is used, the chance that more than one drive will be lost is minimal. In other systems, the backup is copied to tape in the evening.
Even so, you still run the risk that both your original and your backup will be lost or inaccessible at the same time, during the window between when the backup is made and when it's copied to tape. For example, if the OS crashes (ever had a "patch" or a new piece of hardware cause more problems than it solved?), none of the information on any of the disks will be available until you resolve the OS error. The same goes for an electrical short, lost network connectivity, or any one of a dozen scenarios you can think of if you try hard enough. (And if you don't try hard enough, Murphy's Law will do it for you.)
Ensure a Reliable Network
If you’ve taken the time, effort, and money to create a hot backup solution, you might consider resting on your laurels. However, there is yet another hole to plug: the connection between your production and backup servers.
At one client of mine, a full backup of a SQL Server database took up over 20GB. Although the backup was successful, the restore process to the backup system would invariably fail before it completed. The two machines would lose network connectivity at some point during the two hours the restore needed to complete.
In another instance, even small transaction log backups couldn't be restored successfully. Because of a bad network, very subtle corruption took place during the copy. Of course, when this happened, the backup database was left in a suspect state and needed a full backup and restore to start over. (Yes, this was the same client I previously mentioned, so you can imagine the dilemma I was in.) At the time, Microsoft recommended zipping the file and unzipping it on the backup machine before restoring it. If the file was corrupt, the checksum process in the Zip program would catch the corruption.
One technique I’ve seen used to good effect involves building a separate network segment (with additional network cards in each machine) just for the connections between the production machines. Because I’m not an expert on networks, one of the first things I do when setting up backups is to find someone who is and who can ensure that I won’t have network-related problems.
For yet another network-related issue using SQL Server, see the following article from the MSDN Online Knowledge Base “Q137983: How to Troubleshoot Orphaned Connections in SQL Server.”
Consider a Warm (Not Hot) Backup
After you install hot backups and build a solid network, can you relax? Sort of, except there’s one thing you still need to worry about.
Hot backups are great because they are extremely current. The minute data is changed on the primary server, you propagate that change to your backup server.
But that very same close binding also comes with its own dangers. Consider what happens if someone accidentally deletes the wrong data. For example, say an administrator accidentally drops the accounts table in the database. (I’m not making this up; this actually happened at a client of mine.) The inadvertent deletion will immediately get propagated to the backup, too.
You can retrieve that lost information by restoring your last full backup and the transaction logs to the point before the deletion was made, but all that takes time. SQL Server 7 and SQL Server 2000 both include the STOPAT clause in the restore log command. When this clause is used, only transaction log records written before the specified time will be restored.
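A sketch of a point-in-time restore using STOPAT follows; the database name, file paths, and timestamp are hypothetical:

```sql
-- Restore the last full backup, leaving the database able to
-- accept further log restores (NORECOVERY).
RESTORE DATABASE Sales
FROM DISK = 'D:\Backups\Sales_Full.bak'
WITH NORECOVERY

-- Apply the transaction log, but stop just before the moment
-- the table was accidentally dropped.
RESTORE LOG Sales
FROM DISK = 'D:\Backups\Sales_Log1.trn'
WITH RECOVERY, STOPAT = 'Jan 18, 2002 2:25 PM'
```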
If every second of downtime is expensive, you should consider creating a copy of the database that is kept "warm", i.e., fairly recent but not as recent as your hot backup. For example, you might set the hot backup within a minute of the primary server, but deliberately keep your warm backup 30 minutes behind your primary. That way, you only have to restore the last several transaction logs rather than start from scratch.
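One simple way to keep the warm copy lagging, sketched below with hypothetical names, is to have the restore job always apply the log backup from the previous cycle rather than the current one:

```sql
-- Warm standby: this job runs every 30 minutes but applies the
-- log backup produced on the *previous* cycle, so the warm copy
-- stays roughly 30 minutes behind production.
-- (The file naming scheme is hypothetical.)
RESTORE LOG Sales
FROM DISK = 'D:\Backups\Sales_Log_Previous.trn'
WITH STANDBY = 'D:\Backups\Sales_Undo.dat'
```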
Constantly Reassess the Situation
How can you ever be sure you've covered all the bases? My personal answer is that you can't. For that reason, I always periodically reassess my backup plans to see whether there is any reasonable way I can improve them.
It’s the only thing that can put me to sleep when I lie in bed wondering, What will crash next?