The Physical Log
To restore the unmodified document in this scenario, you have to copy the existing data to another file before you overwrite any of it in the main file. Then, if a failure occurs before a checkpoint completes, the old data can be retrieved from this other file.
The other file is called the physical log, and its use is shown in Figure 7.
The physical log acts as a copy of your original data. This is conceptually similar to keeping the original file around while writing the new data to a temporary file: at a certain point, you have both the old and new data on disk. But the physical log is different, and better, because instead of holding a copy of the entire file, it holds a copy of only the data that has changed, saving a lot of space.
Each of the put() methods in CKPTFile calls a method called markDirty(). This method tells the CKPTFile that a particular region of the file is about to be changed (i.e., made dirty). The CKPTFile first copies this region of data to the physical log and then carries out the write.
It's important to write to the physical log before writing to the data file. If you write to the data file first and the system crashes before the physical log is written, you'll have no copy of the original data.
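The ordering described above can be illustrated with a small in-memory sketch. This is not the article's CKPTFile: the class DirtyTrackingBuffer, the nested SavedRegion type, and the rollback() method are hypothetical names invented for illustration, and an in-memory list stands in for the physical log file. The point is only that the old bytes are saved before the new ones are written.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "save old bytes first, then write"; not the article's CKPTFile. */
class DirtyTrackingBuffer {
    /** Saved copy of a region's original bytes (stand-in for a physical log entry). */
    static final class SavedRegion {
        final int start;
        final byte[] original;

        SavedRegion(int start, byte[] original) {
            this.start = start;
            this.original = original;
        }
    }

    private final ByteBuffer data;
    // Assumption: an in-memory list stands in for the on-disk physical log.
    private final List<SavedRegion> log = new ArrayList<>();

    DirtyTrackingBuffer(ByteBuffer data) {
        this.data = data;
    }

    /** Copies the bytes that are about to be overwritten into the log. */
    private void markDirty(int start, int length) {
        byte[] original = new byte[length];
        for (int i = 0; i < length; i++) {
            original[i] = data.get(start + i);
        }
        log.add(new SavedRegion(start, original));
    }

    synchronized void putInt(int position, int value) {
        markDirty(position, Integer.BYTES); // 1. save the old data
        data.putInt(position, value);       // 2. only then overwrite it
    }

    synchronized int getInt(int position) {
        return data.getInt(position);
    }

    /** Restores the saved regions, newest first, so the oldest copy wins. */
    synchronized void rollback() {
        for (int i = log.size() - 1; i >= 0; i--) {
            SavedRegion region = log.get(i);
            for (int j = 0; j < region.original.length; j++) {
                data.put(region.start + j, region.original[j]);
            }
        }
        log.clear();
    }
}
```

Replaying the saved regions in reverse order matters when the same region is written more than once between checkpoints: the earliest saved copy is the one that reflects the state at the last checkpoint, so it must be applied last.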
The physical log is just another file, wrapped in a class called PhysicalLog. It is opened by CKPTFile the first time a put() method is called. It contains a series of entries, implemented by a class called PhysicalLogEntry (see Listing 2). Each entry contains a chunk of data, along with the start and end positions within the data file.
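As a rough illustration of what such an entry might look like on disk, the sketch below stores the start and end positions followed by the saved bytes. It is not the PhysicalLogEntry from Listing 2: the class name LogEntrySketch, the field layout, the exclusive end position, and the roundTrip() helper are all assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical log entry: a chunk of original data plus its position in the data file. */
final class LogEntrySketch {
    final long start;  // first byte of the region in the data file
    final long end;    // one past the last byte (assumption: exclusive end)
    final byte[] data; // the original bytes of that region

    LogEntrySketch(long start, byte[] data) {
        this.start = start;
        this.end = start + data.length;
        this.data = data;
    }

    /** Writes start, end, then the raw bytes. */
    void writeTo(DataOutputStream out) throws IOException {
        out.writeLong(start);
        out.writeLong(end);
        out.write(data);
    }

    /** Reads one entry written by writeTo(). */
    static LogEntrySketch readFrom(DataInputStream in) throws IOException {
        long start = in.readLong();
        long end = in.readLong();
        byte[] data = new byte[(int) (end - start)];
        in.readFully(data);
        return new LogEntrySketch(start, data);
    }

    /** Convenience for experimenting: serializes an entry to memory and reads it back. */
    static LogEntrySketch roundTrip(LogEntrySketch entry) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            entry.writeTo(new DataOutputStream(bytes));
            return readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Storing both positions is redundant when the data length is known, but it mirrors the description above and makes each entry self-describing when the log is read back during recovery.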
Checkpointing
To checkpoint a data file, you have to flush all written data to the file. Remember that most operating systems cache filesystem data in memory, making reads and writes faster. When you write data to a file, the data isn't necessarily written to disk immediately. Rather, it stays in the buffer for a short time, while other writes happen. When either enough writes have accumulated or enough time has passed, the operating system writes the modified data to disk.
This greatly increases filesystem speed, but it also makes the system less reliable: you can write data to a file and think it is safely there when it isn't. If the system crashes before the real write actually happens, the data won't be on disk.
This is where checkpointing comes in. To checkpoint a data file, you tell the system to force the data out to disk. In NIO, this is done with the MappedByteBuffer.force() method, which writes any changes made to the mapped buffer out to the storage device. When this method returns, and provided the file resides on a local storage device, you can safely assume that the data really is on disk.
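A minimal sketch of this call is shown here. The class ForceSketch and its method are hypothetical and exist only for this example; the NIO calls themselves (FileChannel.open, FileChannel.map, MappedByteBuffer.force) are standard. The sketch maps a temporary file, writes one int through the mapping, forces it out, and then reads the file back through ordinary file I/O.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical demo of forcing a mapped buffer to disk. */
final class ForceSketch {
    static int writeForceAndReadBack(int value) {
        try {
            Path file = Files.createTempFile("force-sketch", ".dat");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping READ_WRITE past the end of the file extends the file.
                MappedByteBuffer map =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, Integer.BYTES);
                map.putInt(0, value); // the change may sit in the OS cache for a while
                map.force();          // ask the OS to write the mapped changes out now
            }
            // Read the file back through regular file I/O.
            return ByteBuffer.wrap(Files.readAllBytes(file)).getInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that reading the value back does not by itself prove the bytes reached the physical device, since the read may be served from the operating system's cache. The durability guarantee comes from force(), not from the read.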
You can checkpoint a CKPTFile at any time by calling its checkpoint() method. Note, however, that this method is synchronized, as are all the get() and put() methods. This ensures that a checkpoint can't happen while a write is in progress. This is good; we only want checkpoints to happen in the safe zone between writes.
Once checkpointing completes, you can delete your physical log. You won't need it now.
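The checkpoint sequence described in this section (lock out writers, force the data, discard the log) might be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: the class CheckpointSketch, its constructor arguments, and the demo() helper are hypothetical, and the write path omits the physical-log copy for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of a synchronized checkpoint that discards the log afterwards. */
final class CheckpointSketch {
    private final MappedByteBuffer map;
    private final Path logPath;

    CheckpointSketch(Path dataPath, Path logPath, int size) {
        MappedByteBuffer mapped;
        try (FileChannel channel = FileChannel.open(dataPath, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.map = mapped;
        this.logPath = logPath;
    }

    /** Synchronized so that a write never overlaps a checkpoint. */
    synchronized void putInt(int position, int value) {
        // A full implementation would first copy the old bytes to the physical log.
        map.putInt(position, value);
    }

    /** Forces the data out, then removes the log, which is no longer needed. */
    synchronized void checkpoint() {
        map.force();
        try {
            Files.deleteIfExists(logPath);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs one write and one checkpoint; returns true if the log file is gone afterwards. */
    static boolean demo() {
        try {
            Path data = Files.createTempFile("ckpt-data", ".dat");
            Path log = Files.createTempFile("ckpt-log", ".log");
            data.toFile().deleteOnExit();
            CheckpointSketch file = new CheckpointSketch(data, log, 16);
            file.putInt(0, 123);
            file.checkpoint();
            return !Files.exists(log);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The log is deleted only after force() has returned. Deleting it earlier would reopen the window in which a crash leaves neither a complete new state nor a copy of the old one.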