he previous article in this series on trees showed how to build a sorted binary tree. Inserting and locating items in a sorted tree is relatively quick and easy, so it’s natural to wonder if you could use a sorted tree as an index for a database. To find a record, you could look up its key in the tree and then the node containing that key would contain a reference to the complete record’s location. This method is simple and effective, but a normal sorted tree has some drawbacks that prevent it from behaving as you would expect.
This article discusses the reasons why ordinary sorted trees don’t work well as indexes and explains B-trees (pronounced “bee trees”), the data structures that are actually used by databases to build index structures. It also offers a B-tree program (in both Visual Basic and C#) for download, but none of the code is included the article body because it’s fairly involved.
But first, you need to learn a bit about tree sizes and behavior so you can understand the performance issues.
To insert an item in a sorted binary tree, you compare the item’s value with the value stored in the root node. If the new value is smaller than the root’s value, you move to the root’s left child. If the new value is greater than the root’s value, you move to the root’s right child. You continue moving down to the nodes’ appropriate left or right children until you reach a leaf node. Then you insert the new item as that node’s left or right child as appropriate.
Locating an item in the tree is similar. You start at the root node and move through the tree, following each node’s left or right branches depending on the target value, until you either find the target value or you reach a leaf node and can conclude that the value isn’t in the tree.
When either inserting or locating an item, you start at the root node and move down through the tree. You might need to go all the way to a leaf, which gives an upper bound on how far you might need to search. The question then is how tall might the tree be?
You can prove by induction that a full binary tree of height H contains 2^H ? 1 nodes. I haven’t had an excuse to use induction for a while, so I’m going to jump at the chance! For the base case, note that a tree of height H = 1 is just a single node. That tree contains 2^1 ? 1 = 2 ? 1 = 1 nodes, as desired.
For the inductive step, assume I’ve shown that a full binary tree of height N contains 2^N ? 1 nodes. To finish the proof, I must show that a full binary tree of height N + 1 contains 2^(N + 1) ? 1 nodes. I can build a full binary tree of height N + 1 by building two trees of height N and connecting each of their root nodes to a new root node. Figure 1 shows a tree of height 3 built from two trees of height 2 (surrounded by dashed lines).
|Figure 1. Growing Trees: You can build a tree of height N + 1 by joining two trees of height N with a new root node.|
The tree of height N + 1 contains all of the nodes in the trees of height N plus one new root node so the total number of nodes is (2^N ? 1) + (2^N ? 1) + 1 = 2 * 2^N ? 2 + 1 = 2^(N + 1) ? 1, which is what I need to prove.
I can use a similar argument to show that a full K-ary tree of height N contains (K^N ? 1)/(K ? 1) nodes. For example, a full tree of degree 10 and height 4 contains (10^4 ? 1)/(10 ? 1) = 1111 nodes.
If you ignore constants, a full K-ary tree of height H has on the order of K^H nodes. Conversely, a full K-ary tree containing N nodes has height on the order of LogK(N). This is good news for those wanting to use trees for indexes. It means that the height of the tree, and therefore the maximum length of the search for an item in the tree, grows very slowly as the number of items in the tree grows.
For example, suppose you have a full binary tree containing one million nodes with a height roughly log2(1 million) or around 20. You would need to compare the target value to around only 20 values before you find it or conclude that the value isn’t in the tree. That’s not much work for searching a database containing a million values!
If that were the end of the story, you could easily use a binary tree to store and retrieve values. Unfortunately, this entire performance discussion depends on the tree being relatively full. If the tree is full, it is fairly wide and, more importantly, short. However, if you add items to a sorted tree in different orders, you get trees with different widths and heights. Some orders produce a tree that is very tall and skinny. For example, consider the sorted binary trees shown in Figure 2.
|Figure 2. Laurel and Hardy Trees: Depending on the order in which you add items, the tree may end up short and fat or tall and thin.|
The tree on the left was built by adding items to the tree in the order 4-2-3-1-6-5-7. The tree on the right was built by adding the same items to the tree in the order 7-1-6-5-2-4-3. Searching for an item in the left tree will take at most three steps, but searching for an item in the right tree might take as many as seven.
Searching these little trees is no big deal, but suppose you had a big customer database containing a few hundred thousand?or even a million?records. In that case, it could take a long time to search a tall, skinny tree.
The problem with tall, thin trees is magnified by the fact that you won’t be able to store such a large database all in memory at the same time. Disk accesses typically take 100 times as long as memory accesses, so hammering the disk 100,000 times to find values to examine could take a very long time indeed.
In practice, if you build a sorted tree and the values are more or less randomly arranged, you’ll get a tree that isn’t as short as possible but that also isn’t too tall and thin. However, a few common scenarios can really wreak havoc with your tree. For example, suppose you pull data out of an existing database and add it to your tree. You may end up adding the values to the tree in sorted order, which creates a tall, thin tree.
While you can’t be certain that a simple sorted binary tree will stay reasonably short, several kinds of tree structures automatically rebalance themselves to guarantee that they don’t become too tall and thin. Red-black trees, AVL trees, 2-3 trees, B-trees, and B+trees are all types of balanced tree. They all guarantee that the tree has a certain maximum height so they provide reasonable performance no matter what order you use when adding items to them. Unfortunately, adding and removing items from these trees is much more complicated than in a simple binary tree.
This article focuses on one of the kinds of self-balancing trees that is often used to provide indexing for databases: the B-tree. For some number K, a B-tree stores items in nodes that contain between K and 2 * K key values, and between K + 1 and 2 * K + 1 children. (The root node is a little different, so it is exempt from these rules.) Because the nodes hold a group of values instead of a single value like a node in a binary tree does, they are sometimes called buckets. A final condition is that all of the tree’s leaves are on the same level.
For example, Figure 3 shows a B-tree where K = 2, so each internal node contains between two and four key values (in this example, letters) and between three and five children.
|Figure 3. Basic B-tree: This B-tree has nodes of degree 2 through 4.|
In this example, the root node has three children but sometimes it could have as few as one child. Note that the nodes contain some empty spaces for values that could be added to those nodes later.
Because each node in a B-tree has at least K children, the height of a tree containing N items is at most logK(N). Therefore, the tree cannot grow too tall and thin. If you add N items to the tree with K = 2 as shown in Figure 3, the tree’s height will be no more than log2(N). The tree in Figure 3 holds nine nodes so its height must be no more than log2(9), which is about 3.17. The tree has height 2, which is less than 3.17, so it checks out.
Of course, the height guarantee doesn’t come for free. Ensuring that each node has between K and 2 * K + 1 children takes some doing. The following sections explain how the tree rebalances itself when you add or remove items.
Finding an item in a B-tree is easy. Start by comparing the target value to each of the values in the root node from left to right and stop when you reach a value greater than or equal to the target value. If the node’s value equals the target value, you’ve found the item.
If the node’s value is greater than the target value, move down the link just to the left. If the target value is greater than every value in the node, move down the node’s rightmost branch. Continue checking node values and moving down branches until you find the target value or reach a leaf node. If you reach a leaf and still haven’t found the target value, then the target isn’t in the tree.
For example, suppose you need to find the value K in Figure 3. You first compare K with the values in the root node G and R. K comes after G and before R so you move down the link that is drawn between G and R to the child node containing the values J, K, and P. In the child node, you find the target value K and you’re done.
Now suppose you want to find the value U in Figure 3. You first compare U with the values in the root node G and R. U comes after both G and R, so you move down the node’s rightmost link to the child node containing the values T and Y. In the child node, U comes after T but before Y, so you would move down the child link between those two values. However, this is a leaf node, so there is no such child link and you can conclude that the value U is not in the tree.
Finding items in a B-tree is relatively easy. Adding and removing items is much harder because you may need to rearrange the tree to ensure that it stays balanced.
To add an item to the tree, you follow the same search path that you would take to find the item. You start at the root and move down through the tree until you find the value or you reach a leaf node. (For simplicity, assume the key values must be unique, so you won’t find the value in the tree.) When you reach a leaf node, you add the value in the appropriate spot.
|Figure 4. Budding Trees: New values are added to leaf nodes.|
For example, suppose you want to add the value M in the tree shown in Figure 3. M comes between G and R, so you move down the branch between G and R to the node containing J, K, and P. This is a leaf node so you insert the M here. Note that you need to rearrange the values in the node so they stay in order. Figure 4 shows the result. The new value M is highlighted in pink. The value P, which was moved to make room for M, is highlighted in green.
Of course, the leaf node where you want to place a new value may be full already. For example, suppose you want to add the value N to the tree shown in Figure 4. That value belongs in the node where you just placed value M, but that node is full so there isn’t room. To make room, you need to perform a bucket split.
In a bucket split, you take the 2 * K items in the bucket, plus the new item, and arrange them in order. You leave the leftmost K items in the existing bucket and move the rightmost K items into a new bucket. You then pass the middle item up to the parent node. The parent node adds this middle item to its list of items and adds the new bucket as a child.
|Figure 5. Tree Rearranged: In a bucket split, a new bucket is created and several values are moved.|
In this example, the leaf node’s values, together with the new value N, are: J, K, M, N, and P. The leftmost values J and K remain in this leaf node, the rightmost values N and P go into the new node, and the middle value M moves up to the parent node. In the parent node, the existing value R must be moved right to make room for the new value M. Figure 5 shows the new tree. The new value N is highlighted in pink. The values M, P, and R, which were moved during the addition, are highlighted in green.
When you move an item up to the parent node, you don’t know whether there’s room in the parent node for the new item. If the parent node is full, you’ll need to split it, too, and move another value up to its parent node. In the worst case, that node must also split, and the split moves all the way up to the root causing a root bucket split.
When the root node splits, the tree must create a new root that has two children: the old root and the newly created node. This gives B-trees the rather unusual feature of growing from the root rather than from the leaves. Growing from the root ensures that all of the leaf nodes are always at the same level in the tree.
|Figure 6. Ripe Root: A bucket split can occur only if the root node is full.|
In a large database, a root bucket split working its way all the way up the tree can take a considerable amount of time. Fortunately, root bucket splits don’t happen very often. To force a root bucket split in the tree shown in Figure 5, you could add values S and U to fill the rightmost leaf node, and then V to split that node and fill the root node. Figure 6 shows the result.
Now you could add values A and C to get the tree shown in Figure 7 below.
Finally, you can make a root bucket split occur if you add another value to the leftmost leaf node in Figure 7. Figure 8 shows the new tree if you add the value F, causing the leftmost leaf and the root to split. If you compare Figure 7 with Figure 8, you’ll see that most of the values in the tree have been rearranged.
Also notice that the tree shown in Figure 8 contains a lot of empty space, so it will be quite a while before another root bucket split can occur.
Adding a node to a B-tree can be messy, but deleting a value can be even messier. There are several cases to consider.
|Figure 9. Easy Elimination: To remove item B, simply remove it from its node and slide keys C and D left to fill in the space.|
First, in the easy case, you search the tree until you find the target item in a leaf node and you simply remove it. For example, suppose you want to remove the item with key value B shown in Figure 9.
In this case, you simply remove B from its leaf node and slide key values C and D over to fill in the hole. Figure 10 shows the result.
|Figure 10. Sneaky Swap: To remove item E, first swap it with item D and then remove it from its new position in the leaf node.|
But what if the value you want to delete isn’t in a leaf node? For example, suppose you want to remove item E from the tree shown in Figure 10. If you remove item E from the root node, then there’s no value in that node that you can use to decide whether you should go down the left or center branch when you later search for a value.
The solution is to replace E with a new value that you can use to decide which branch to go down. For the new value to work, it must be larger than any value down the left branch and smaller than any value down the middle branch. The rightmost value to the left of the target value E is the largest value in the tree that is smaller than E. In this example, that value is the value D in the tree’s next level. In a bigger tree, the value might be many levels down, but it is always the rightmost value to the left of the target.
|Figure 11. Balance Regained: After swapping values D and E and then removing value E, the tree is once again a B-tree.|
To remove value E, find the rightmost value to its left (D) and swap it with the target value E. Then continue down the branch to the left to find the target E in its new position and remove it from the leaf node. Figure 11 shows the result.
Notice that the values are again in their correct order and you can use the value D to decide whether to go down the left or middle branch out of the root node.
That explains how to remove a value from a leaf node or an internal node, but what if a leaf node contains too few values? For example, suppose you want to remove the value C from the tree shown in Figure 11. If you just removed that value, then its leaf node would contain only one value and that would violate the B-tree rule that every node in this tree (except the root) must contain between two and four values.
The solution, at least in this example, is to take the node’s remaining values (A), combine them with the values in the node’s sibling (F, G, and H), and add in the value in the parent node that separates them (D). Putting all of these values together, you have five values: A, D, F, G, and H. That’s enough to put the leftmost values (A and D) in the left node, the rightmost values (G and H) in the right node, and move the middle value (F) up to the parent node. Figure 12 shows the result.
|Figure 12. Sibling Shuffle: If a leaf doesn’t have enough values, you can move some values over from a sibling.|
If the leaf’s sibling doesn’t have enough values to share, then you could try to borrow some from the leaf’s other sibling if it has one. For example, if you were removing items from the middle leaf node in Figure 12, you could borrow items from either the left or right leaf.
There’s one final case to consider. What if you remove a value from a leaf so that leaf has too few values, but neither of its siblings has enough values to spare? For example, suppose you want to remove the value A from the tree shown in Figure 12. That would leave that leaf with only one value. Normally, you would borrow an item from a sibling node, but this time the sibling has only two values so it can’t afford to lose one.
The solution is to notice that if the sibling doesn’t have enough values to share, then this node together with its sibling contain so few values that you can merge them into a single node. In this example, you take the values from those two nodes (A, D, G, and H) plus the value in the parent separating them (F) and you put them all in a single node. You remove the separating entry from the parent node and slide the parent’s other values over to fill in the hole. Figure 13 shows the result.
|Figure 13. Mandatory Merger: If a node contains too few values and its siblings don’t have any to spare, merge the node with a sibling.|
Remember when adding a new value causes a node to split you can pass a value up to the parent node and sometimes that might cause the parent to split? An analogous situation can occur here. When you remove a value and two nodes merge, you remove a value from the parent node. That may mean the parent node contains too few values so you will need to redistribute its values with a sibling or merge it with a sibling. That would be the case in Figure 13 if the node containing the single value J were not the root node. (Remember the root node is exempt from the B-tree rules.)
Just as a bucket split can travel all the way to the tree’s root, a bucket merge can also travel all the way to the root, and it can take quite a bit of time. Figure 14 shows a tree that is primed for merging. (It’s the same as Figure 8, which just underwent a root bucket split.)
If you remove the value V from the tree shown in Figure 14, its leaf node contains only one value. Its sibling contains only two values, so it cannot share. The solution is to merge the rightmost two leaf nodes plus their separating value U.
In practice, databases pick node sizes that naturally fit the disk drives that hold them. For example, suppose a disk uses a 2KB block size. That means the disk grabs data only in chunks that are multiples of 2KB, so it takes just as long to read 2KB of data as it does to read 10 bytes of data. In that case, you may as well make a B-tree node take up as much space in the 2KB block as possible.
Suppose a tree’s key values are 40-character strings (we’ll use Unicode so that’s 80 bytes each) and you store a 4-byte pointer with each value pointing to the corresponding record. Each branch within the tree is another 4-byte pointer. Then if each node contains up to 2 * K values, the nodes occupy 2 * K * (80 + 4) + (2 * K + 1) * 4 = 176 * K + 8 * K + 4 = 184 * K + 4 bytes. If you want this to be less than 2KB, then 184 * K + 4 < 2048 so K < (2048 ? 4) / 184, which is a bit more than 11. That means you can store between 11 and 22 values per node and still store a node in one 2KB block.
In that case, each node’s minimum degree is 11, so a tree containing N values would be log11(N) levels tall. For example, a one-million-node tree would be only log11(1,000,000) or fewer than six levels tall. That means to locate a value in a tree containing one million items, you would need to search at most six nodes in the tree, and that would require only six disk accesses. Inserting or deleting a value might take a bit longer, depending on whether bucket splits, sibling redistributions, and bucket merges occurred.
If you make nodes, use a larger multiple of the disk’s block size. You can build even shorter trees. For example, it may not take much longer to read two adjacent 2KB blocks from a disk than it takes to read a single block. In that case, you could make the nodes twice as large and get trees that are even shorter. You might spend a tiny bit of extra time reading each node but you would need to read fewer nodes.
Another trick that can improve performance is to keep the tree’s root and first-level nodes in memory instead of requiring you to read them from disk every time you need them. For the previous tree with 2KB nodes holding up to 22 values, that would require caching 23 nodes totaling 46KB of memory. By sacrificing a measly 46KB of space, you can remove two disk accesses from a search that normally requires up to six, thus cutting your search time by a third. You still need to write those nodes to disk if they change but at least your searching will be faster. (You could cache the next level of nodes too, but 22 * 22 = 484 nodes on the next level, so the memory would start to add up.)
Now You Know How B-trees Work
B-trees let you quickly search for values. They rebalance themselves as you add and remove values, so they always remain relatively short and wide and that makes searching for items in a B-tree extremely fast. For example, a B-tree containing one million values using nodes that each hold between 11 and 22 values needs to examine only six nodes at most before finding its target.
This article provided you a better understanding of how B-trees work and how databases build and maintain indexes. (In fact, many databases use B+trees but the main ideas are very similar.) To get an even better understanding, build one yourself! Or at least download the B-tree program that I built. It’s available in the download area in both Visual Basic and C# versions. I didn’t include the code in this article because it’s fairly involved, requiring both top-down and bottom-up recursion with multiple routines calling each other to navigate through the tree.