While you can't be certain that a simple sorted binary tree will stay reasonably short, several kinds of tree structures automatically rebalance themselves to guarantee that they don't become too tall and thin. Red-black trees, AVL trees, 2-3 trees, B-trees, and B+trees are all types of balanced tree. They all guarantee that the tree has a certain maximum height so they provide reasonable performance no matter what order you use when adding items to them. Unfortunately, adding and removing items from these trees is much more complicated than in a simple binary tree.
This article focuses on one of the kinds of self-balancing trees that is often used to provide indexing for databases: the B-tree. For some number K, a B-tree stores items in nodes that contain between K and 2 * K key values, and between K + 1 and 2 * K + 1 children. (The root node is a little different, so it is exempt from these rules.) Because the nodes hold a group of values instead of a single value like a node in a binary tree does, they are sometimes called buckets. A final condition is that all of the tree's leaves are on the same level.
For example, Figure 3 shows a B-tree where K = 2, so each internal node contains between two and four key values (in this example, letters) and between three and five children.
|Figure 3. Basic B-tree: This B-tree has nodes of degree 2 through 4.|
In this example, the root node has three children but sometimes it could have as few as one child. Note that the nodes contain some empty spaces for values that could be added to those nodes later.
Because each node in a B-tree has at least K children, the height of a tree containing N items is at most logK(N). Therefore, the tree cannot grow too tall and thin. If you add N items to the tree with K = 2 as shown in Figure 3, the tree's height will be no more than log2(N). The tree in Figure 3 holds nine nodes so its height must be no more than log2(9), which is about 3.17. The tree has height 2, which is less than 3.17, so it checks out.
Of course, the height guarantee doesn't come for free. Ensuring that each node has between K and 2 * K + 1 children takes some doing. The following sections explain how the tree rebalances itself when you add or remove items.
Finding an item in a B-tree is easy. Start by comparing the target value to each of the values in the root node from left to right and stop when you reach a value greater than or equal to the target value. If the node's value equals the target value, you've found the item.
If the node's value is greater than the target value, move down the link just to the left. If the target value is greater than every value in the node, move down the node's rightmost branch. Continue checking node values and moving down branches until you find the target value or reach a leaf node. If you reach a leaf and still haven't found the target value, then the target isn't in the tree.
For example, suppose you need to find the value K in Figure 3. You first compare K with the values in the root node G and R. K comes after G and before R so you move down the link that is drawn between G and R to the child node containing the values J, K, and P. In the child node, you find the target value K and you're done.
Now suppose you want to find the value U in Figure 3. You first compare U with the values in the root node G and R. U comes after both G and R, so you move down the node's rightmost link to the child node containing the values T and Y. In the child node, U comes after T but before Y, so you would move down the child link between those two values. However, this is a leaf node, so there is no such child link and you can conclude that the value U is not in the tree.
Finding items in a B-tree is relatively easy. Adding and removing items is much harder because you may need to rearrange the tree to ensure that it stays balanced.