isual Studio is chock full of classes to implement exotic data structures such as collections, hash tables, and dictionaries. One notable data structure that’s missing is the tree. Trees are useful for storing hierarchical data where each node has one or more children. Many developers use a TreeView control or XML objects to build trees in .NET, but both of those methods are rather heavyweight and come with additional baggage that you may not need or want.
This article explains what trees are, how to build your own trees, how to work with them, and how to get them to do useful things such as drawing themselves, or listing their items in various sequences.
Before getting started, however, you need to understand some tree terminology.
Tree terminology is a cross between horticulture and genealogy. Horticulture supplies terms such as tree, node, root, branch, and leaf, while other tree terms come from genealogy, such as ancestor, parent, and child.
A tree is made up of nodes that are connected by branches. A node can have only one parent, but each node may have zero, one, or more child nodes. The branches are directed, which means you can tell which way they are pointing?so you can always discern which node is the parent and which node(s) are the children.
One node, called the root node has no parent. All other nodes in the tree are connected directly or indirectly to the root node. Normally, when you draw a tree, you draw the root at the top. That’s the opposite direction from the way an ash tree or an elm tree grows in nature, but it’s easier for most people to draw trees that way. With the root at the top, all branches are directed downward, so it’s easy to see which nodes are the children and parents of the others.
A node’s degree is its number of children. The degree of the tree is equal to the largest degree of any of the nodes. A tree with a degree of 2 is called a binary tree; a tree with a degree of 3 is called ternary. Beyond that, people usually call the tree “N-ary” (if they call it anything at all). For example, a degree 7 tree would be “7-ary.”
A node at the bottom of the tree that has no children is called a leaf node. By definition, a leaf is a node with degree 0.
A node’s level is its distance from the root node. To some people, that means the number of branches between the node and the root. Others count the nodes (including the node and the root), resulting in a level that’s larger by one than you would get using the other method. The difference is whether you consider the root to be at level 0 or 1. (Much like the way the bottom floor of a building is floor 0 in the UK and floor 1 in the US.) For no particularly good reason, I’m going to consider the root node level to be 1.
The height or level of the entire tree is the same as the largest level of any of its leaf nodes.
Note that branches can only connect parent nodes with child nodes. Branches cannot connect “sibling” nodes (nodes with the same parent) and cannot connect nodes across “generations” or levels in the tree.
|Figure 1. Terrific Tree: A tree is made up of nodes connected by branches.|
One consequence of this rule is that the tree cannot contain any loops or cycles. Another is that there is one unique shortest path from any node to any other node. To find that path, you move up the tree from the starting node towards the root until you reach a node that is a common ancestor of both nodes. You will eventually find such a common ancestor, even if you have to go all the way to the root. After you find the common ancestor, you move down through the branches to the destination node.
Figure 1 shows a simple tree.
Using Figure 1, you should be able to verify that:
- Node A is the root node (it has no parent).
- Nodes D, E, G, H, and I are leaf nodes (they have no children; degree 0).
- The degree of the tree is 3 because node F has the most children?3.
- The tree’s height is 4 because the farthest leaves (G, H, and I) are on the fourth level of nodes.
- The unique shortest path from node D to node H is D?B?A?C?F?H. (You can try finding paths between other pairs of nodes if you like.)
- There are no loops or cycles in the tree’s branches.
|Figure 2. Binary Bliss: In a full binary tree, every non-leaf node has exactly two children, except possibly one node?in this case node F.|
A full tree is one where every node either has the full degree of the tree (i.e., as many children as possible) or has no children.
Depending on what you need to do with the tree, you may allow one exception. In that case, the second-to-bottom level has all leaf nodes on the right and the rightmost non-leaf node may have less than the maximum allowed number of children.
For example, Figure 2 shows a full binary tree. In this tree, every non-leaf node has two children except for node F.
A tree traversal is a way of “visiting” every node in a tree. When you visit a node, you might do any of several things: draw the node, write its name into a string, set a parameter on the node, or something else depending on the algorithm you’re trying to implement.
There are three standard orders for traversing a binary tree: preorder, inorder, and postorder. You can define each of these traversals recursively by writing code to decide what to do at each node.
In a preorder traversal, you visit the node and then visit its children recursively. It’s called “preorder” because you visit a node pre- (before) visiting the children.
For example, consider the tree in Figure 2. This traversal outputs the node name?a letter in this case. You start the traversal at the root. The rule says to visit the node first so you output “A.” Then you visit A’s children.
To do that, you first visit node B and perform its preorder traversal. The rule says to visit the node first and then visit its children, so you output “B” and then move to node D. You output “D” and visit D’s children. Finally, when you visit node H, you’ve reached a leaf, so you just output an “H.” Normally you would visit the node’s children but there aren’t any, so this recursion ends and you move back up the tree to node D.
|Figure 3. Talented Traversal: In a preorder traversal, you visit a node first and then visit its children.|
Now that you’ve visited node H, you visit node D’s other child: node I. Again, this is a root node so you output “I” and move back up to node D again. At this point you’ve visited node D and all its children, so that level of recursion ends and you move back up to node B.
From node B, you’ve already visited node D so you visit the other child: node E. The process continues like this until you have visited every node and output them all.
Figure 3 shows the tree from Figure 2 but with little numbers added showing the order in which a preorder traversal visits the nodes.
The final result of the preorder traversal is: A, B, D, H, I, E, J, K, C, F, L, G.
A postorder traversal is similar to a preorder traversal except that you visit a node’s children before you visit the node itself. I’ll leave it as an exercise for readers to verify that the postorder traversal for the tree in Figure 2 is: H, I, D, J, K, E, B, L, F, G, C, A.
The final type of traversal, an inorder traversal, is similar to the others except it always visits a node’s left child, then the node itself, and finally, the node’s right child. To be brief, the inorder traversal for the tree in Figure 2 is: H, D, I, B, J, E, K, A, L, F, C, G.
Building trees in Visual Studio is pretty easy. First, you make a class to represent a node. This first example builds a binary tree.
Unfortunately the word “Node” is pretty generic (kind of like Object) so it tends to get used in a lot of places. Microsoft has already used the name “Node” in the Microsoft.ManagementConsole, MicrosoftComputeCluster, and Microsoft.TeamFoundation.WorkItemTracking.Client namespaces?and who knows where else. They’ve also used “TreeNode” in a couple of namespaces so it’s hard to pick a completely unambiguous name.
Because this class is the basis for a binary tree, I’ll name the class “BinaryNode.”
Next, give the class a Name property and two properties that will hold references to the left and right children (also BinaryNodes).
To make building nodes and trees easier, give the class a constructor that initializes the node’s name and children:
Public Class BinaryNode Public Name As String Public LeftChild As BinaryNode Public RightChild As BinaryNode ' Constructor. Public Sub New(ByVal new_name As String, _ Optional ByVal left_child As BinaryNode = Nothing, _ Optional ByVal right_child As BinaryNode = Nothing) Name = new_name LeftChild = left_child RightChild = right_child End Sub
That’s enough to build the basic tree but not enough to do much with it, so it’s worth adding some traversal methods. Listing 1 shows an initial BinaryNode class with preorder, postorder, and inorder functions that return each node’s name and its children’s names in the proper sequences.
To make using the tree easier, you can build a tree class. Again, to avoid name collisions, call this one “BinaryTree.”
Give the tree a public BinaryNode property named Root, and a constructor that initializes the property to a new or specific BinaryNode.
Making the tree produce full traversals is easy. It simply checks whether the root node has been defined and, if so, delegates the call to that node. Note that many tree methods work similarly; in general, they verify the presence of a root node and then let it do all the work.
The following code shows the BinaryTree class:
Public Class BinaryTree Public Root As BinaryNode ' Constructor. Public Sub New( _ Optional ByVal new_root As BinaryNode = Nothing) Root = new_root End Sub ' Return the tree's preorder traversal. Public Function Preorder() As String If Root Is Nothing Then Return "" Return Root.Preorder End Function ' Return the tree's postorder traversal. Public Function Postorder() As String If Root Is Nothing Then Return "" Return Root.Postorder End Function ' Return the tree's inorder traversal. Public Function Inorder() As String If Root Is Nothing Then Return "" Return Root.Inorder End Function End Class
All this is pretty interesting but?not to put too delicate a point on the issue?it’s not very useful. It’s obviously easy to build a tree that can’t do much. The rest of this article teaches the tree to do something fairly useful: draw itself.
It’s not too hard to show a tree’s structure by indenting its nodes the way a TreeView control does or the way the left side of Window Explorer does. I’ll leave that as an exercise because I’d rather display the tree like the ones shown in Figures 1, 2, and 3.
This is one of those pieces of code that is deceptively short once you figure it out but that can take a while to get right, even if you’ve seen it before. The goal is to display each node centered above its children. To do that, you first need to figure out where the children will be drawn.
To keep the process as simple as possible, I’ve broken it into three steps. First, the code recursively traverses the tree, positioning all the nodes. That’s the tricky part. Next, the code recursively traverses the tree again, drawing the branches that connect the nodes. Finally, it traverses the tree a third time, drawing the nodes over the branches. The code in Listing 2 shows how the BinaryNode class positions a node and its subtree.
The PositionNode subroutine in Listing 2 takes two parameters, xmin and ymin, which are the minimum X and Y positions where it is allowed to position the tree. Notice that the parameter xmin is passed by reference so changes to its value here will be returned to the calling routine. When this routine finishes, it updates xmin to a value greater than the right edge of the subtree we are positioning now. Any remaining trees at this level will be drawn to the right of the current tree.
The subtree’s root node gets placed half the node’s radius below ymin, so the code sets the node’s m_Cy position accordingly.
Next the code recursively calls the PositionNode method for its left child node. That call recursively draws the entire left subtree and updates xmin to the right edge of that subtree. If this node has two children, you need to leave some room between their subtrees so they don’t touch. The code checks and, if the node does have two children, it adds a little space to xmin.
The code then recursively calls the PositionNode method for its right child node. That call recursively draws the entire right subtree and updates xmin to the right edge of that subtree.
At this point, the left and right subtrees have been completely positioned. In particular the left and right child nodes have been positioned so you can use them to figure out the original node’s X position. If this node has two children, the code sets its X position to the average of the children’s X positions, which centers it. If instead this node has only one child, then the code positions this node directly above its child.
If this node has no children, then the code places this node as far to the left as it can, so its left edge is at xmin. It then increases xmin so that when this call to PositionNode returns, xmin points to this node’s right edge.
Simple, right? After that, the rest is easy.
Subroutine DrawBranches, shown in the following code, draws the branches between a node and its children:
' Draw branches to our children. Public Sub DrawBranches(ByVal gr As Graphics, _ ByVal the_pen As Pen) If LeftChild IsNot Nothing Then gr.DrawLine(the_pen, m_Cx, m_Cy, _ LeftChild.m_Cx, LeftChild.m_Cy) LeftChild.DrawBranches(gr, the_pen) End If If RightChild IsNot Nothing Then gr.DrawLine(the_pen, m_Cx, m_Cy, _ RightChild.m_Cx, RightChild.m_Cy) RightChild.DrawBranches(gr, the_pen) End If End Sub
This code checks its left child and, if that child is present, it draws a line connecting the node with that child. It then recursively calls the child’s DrawBranches method to make it draw its child branches.
The code then repeats those steps for its right child. (I told you this part was easy.)
Finally, subroutine DrawNode, shown below, draws the node:
' Draw the node in its assigned position. Public Sub DrawNode(ByVal gr As Graphics, ByVal fnt As Font, _ ByVal fill_brush As Brush, ByVal font_brush As Brush, _ ByVal the_pen As Pen) ' Draw an outline. Dim rect As New Rectangle( _ m_Cx - RADIUS 2, m_Cy - RADIUS 2, RADIUS, RADIUS) gr.FillEllipse(fill_brush, rect) gr.DrawEllipse(the_pen, rect) ' Draw our name. Using sf As New StringFormat() sf.Alignment = StringAlignment.Center sf.LineAlignment = StringAlignment.Center gr.DrawString(Name, fnt, font_brush, m_Cx, m_Cy, sf) End Using ' Draw our children. If LeftChild IsNot Nothing Then LeftChild.DrawNode( _ gr, fnt, fill_brush, font_brush, the_pen) If RightChild IsNot Nothing Then RightChild.DrawNode(_ gr, fnt, fill_brush, font_brush, the_pen) End Sub
This routine first fills an ellipse with a background color at the node’s location. It then draws the same ellipse to outline the node. The code then draws the node’s name centered at its assigned position.
The routine then recursively calls the DrawNode methods of its children so they can draw themselves.
The following code shows how the BinaryTree class calls the BinaryNode methods to draw the tree. It calls the root node’s PositionNode method to recursively position all the nodes. It then calls the root’s DrawBranches method to recursively draw all the branches. Finally, it calls the root’s DrawNode method to recursively draw all the nodes:
' Draw the tree. Public Sub DrawTree(ByVal gr As Graphics, ByVal fnt As Font, _ ByVal fill_brush As Brush, ByVal font_brush As Brush, _ ByVal the_pen As Pen, ByVal xmin As Integer, _ ByVal ymin As Integer) If Root Is Nothing Then Exit Sub ' Position all the nodes. Root.PositionNode(xmin, ymin) ' Draw all the links. Root.DrawBranches(gr, the_pen) ' Draw all the nodes. Root.DrawNode(gr, fnt, fill_brush, font_brush, the_pen) End Sub
The final piece of code is the form’s Paint event handler. It simply calls the tree’s DrawTree method, passing it a bunch of drawing parameters:
' Draw the tree. Private Sub Form1_Paint(ByVal sender As Object, _ ByVal e As System.Windows.Forms.PaintEventArgs) Handles Me.Paint m_Tree.DrawTree(e.Graphics, Me.Font, _ Brushes.White, Brushes.Blue, Pens.Black, 10, 100) End Sub
|Figure 4. Tree Testing: Program BinaryTreeTest demonstrates the BinaryTree and BinaryNode classes.|
The example program BinaryTreeTest, which is shown in Figure 4 and available for download in both Visual Basic and C# flavors, demonstrates these methods.
The form’s Load event handler uses code to build a tree similar to the one shown in Figure 3. It then uses the tree’s methods to display the preorder, postorder, and inorder traversal methods. The form’s Paint event handler calls the tree’s DrawTree method to display the tree.
Trees are powerful data structures. This article provides a quick introduction to tree terminology and shows how to build binary trees. It also shows how to make relatively simple BinaryNode and BinaryTree classes that can build and draw a tree. But that’s not all; in my next tree article, I’ll explain how you can use trees to evaluate mathematical expressions. I’ll also show how to build and draw trees with higher degrees. Finally, I’ll explain B-trees and B+trees?the kinds of trees that databases use to build index structures.