Taming Trees: Parsing and Sorting

he first part of this series explained basic tree terminology and showed how to build a simple binary tree in Visual Basic or C#. It also explained how to perform traversals of trees but the only really practical application it showed was how to make a tree draw itself. That?s pretty handy if you need to display a tree, but trees can do so much more.

This article explains a few more practical uses for trees. First, it explains how to use a tree to parse arithmetic expressions. It then discusses sorted trees and shows how to use a sorted tree to build a simple animal guessing game.

Evaluating Expressions
You can represent an arithmetic expression such as -2*(8+7)/-10 with an expression tree or parse tree. Interior nodes in the tree represent operands such as *, ( ), and /. Leaf nodes represent numbers such as 2, 80, and 3.14159265.

Figure 1. Expressive Trees: This tree represents the expression -2*(8+7)/-10.

The value of a leaf node is simply the value of its number. To get the value of a non-leaf node, you calculate the values of its children and then combine them by using the appropriate operation.

For example, Figure 1 shows the parse tree for the expression -2*(8+7)/-10. The values of the two lowest nodes are 8 and 7. The value of their parent node is 8 + 7. If you work through the entire tree, you?ll find that the value of the root node is 3.

Making a parse tree evaluate itself is easy. The node class?s Evaluate function simply calls its children?s Evaluate function and then combines the results by using the appropriate operand. (More on this later.)

The tricky part is building the parse tree; you need to pick apart the expression and figure out what nodes to build to represent the expression?s pieces.

The example program EvaluateExpressions, which is available for download in Visual Basic and C# versions, uses an ExpressionNode class to represent the pieces of an expression. This class uses four variables to keep track of the sub-expression that it represents. Expression stores a string representing the node?s sub-expression. LeftChild and RightChild are references to ExpressionNode objects, which represent the node?s children in the tree. Op holds the node?s operand.

For example, the root node in Figure 1 has Expression = ?-2*(8+7)/-10? and Op = ?/?.The following code shows the ExpressionNode class’s constructor. The code saves the node?s expression and sets its children to Nothing. It then calls the ParseExpression method to do all of the fun work.

' Constructor.Public Sub New(ByVal new_expression As String)  ' Build the expression's parse tree.  Expression = new_expression  LeftChild = Nothing  RightChild = Nothing  ParseExpression()End Sub

Listing 1 shows the ParseExpression method. It?s fairly long and a little tricky, but here?s the general idea.

If you look again at the parse tree shown in Figure 1, you?ll see that the topmost node represents the last operator that should be applied when evaluating the expression. The node?s children represent the left and right operands for that operator.

To understand how this works, you first need a little background. The basic idea is to scan through the expression looking for the operator that should be evaluated last, create child nodes to represent the operands, and then make the children recursively parse their sub-expressions.

The trickiest part is figuring out which operator should be evaluated last, which is logically the place where you want to break up the expression.

To determine which operator should be evaluated last, the program uses the notion of precedence. An operator?s precedence determines whether it has priority over other operators. For example, normally when you evaluate an expression such as 2+3*4, multiplication has higher precedence than addition, so you perform the multiplication first. That means this expression is equivalent to 2+(3*4).

The ExpressionNode class uses the same precedence rules as Visual Basic and C#. The following table shows operator precedence in decreasing order. For example, unary + and ? have the highest precedence, so they are evaluated first.

Table 1. Operator precedence: The table shows operator precedence in decreasing order
Operators
Meaning
+, –
Unary plus, unary minus
^
Exponentiation
*, /
Multiplication, division
Integer division
%
Modulus
+, –
Addition, subtraction

Evaluating Expressions (cont’d)
If two operators have the same precedence, they are evaluated in left-to-right order. For example, the expression 2*3*4 is treated as (2*3)*4. (Note that in those cases it doesn?t really matter because multiplication is associative so (2*3)*4 = 2*(3*4) anyway.)

So the code scans through the expression looking for the operator with the lowest precedence to use as the dividing point. It counts opening and closing parentheses as it goes so it knows when it is not inside any parentheses (the operator with the lowest precedence cannot be inside any parentheses).

The code needs to handle a few special cases, such as telling the difference between unary +/? and addition/subtraction, expressions that are surrounded by parentheses such as (1+2), functions such as Sin(2/3) and Factorial(3*2), and literal values such as 0.11235.

After it finds the dividing operator, the code stores the node?s operator in its Op variable, creates two child nodes, and makes the children recursively parse their sub-expressions. See the code in Listing 1 for the details.

After you parse an expression, evaluating it is easy. The following code shows part of the ExpressionNode class?s Evaluate function. The node checks its Op variable. If Op is blank, then the node represents a literal value so the code returns that value converted into a Double.

If Op is not blank, then the node represents an expression. The code uses a Select Case statement to see which operator it represents. It recursively makes the child nodes evaluate their values and then combines the results appropriately, as shown in the code below.

Figure 2. Evaluate This: The EvaluateExpression program uses the ExpressionNode and ExpressionTree classes to evaluate expressions.
' Evaluate the parsed tree.Public Function Evaluate() As Double  ' See if we have an operand.  If Op = "" Then    ' No operand. This is a literal.    Return Double.Parse(Expression)  Else    ' See what the operand is.    Select Case Op.ToLower()      Case "^"        ' Power.        Return LeftChild.Evaluate() ^ RightChild.Evaluate()      Case "*"        ' Times.        Return LeftChild.Evaluate() * RightChild.Evaluate()      ... Code omitted ...      Case "sin"      ' Sine.        Return Sin(LeftChild.Evaluate())      ... More code omitted ...    End Select  End IfEnd Function

Figure 2 shows the EvaluateExpression program in action. The drawing code included in the ExpressionNode class displayed the parse tree (this code is similar to the code described in the first part of this series).

If you take out the drawing code, the ExpressionNode and ExpressionTree classes can evaluate arithmetic expressions pretty quickly. If you need to evaluate expressions on the fly, they work quite well.

With a little extra work, you can make a few useful variations on these classes. For example, you could allow a leaf node to hold a variable such as A or B. Then, when you evaluate the expression, you can look up variable values in a dictionary. For example, you could use a tree to solve A*Sin(X^2+Y^2) for specific values of A, X, and Y. If you need to perform this same calculation for a lot of different A, X, and Y values, you can save time by building the tree once and then evaluating it a bunch of times with a different variable dictionary.

Sorted Trees
A particularly interesting kind of tree keeps its nodes in sorted order. In a binary sorted tree, each node holds a value. To add a new value to the tree, you compare the new value with the root node?s value. If the new value is less than the root?s value, you recursively add the new value to the left child?s subtree. If the value is not less than the root node?s value, you recursively add the new value to the right child?s subtree.

Figure 3. Growing Trees: The new value 3 is added as the left child of node 4.

Eventually, you reach a node that doesn?t have the left or right child into which you would like to move. At that point, you add a new child node, into which you insert the new value.

For example, suppose you want to add the value 3 to the tree on the left in Figure 3. The program starts by asking the root node to compare its value to the new value 3. The value 3 is less than 5 so the program moves into the left child with value 2. The value 3 is greater than 2 so the program moves into node 2?s right child, which has value 4. The new value 4 is less than 4 so the program would move into the node?s left child but, because the node with value 4 has no left child, it creates a new left child and inserts the new value there instead. The result is shown on the right in Figure 3.

The following code shows how the SortedBinaryNode class adds a new value to the tree. The node compares the new value to its current value and then calls the AddValue method of its left or right child as appropriate. If the node doesn?t have the necessary child, it makes one.

' Add a new node to the sorted subtree.Public Sub AddValue(ByVal value As String)  ' See if we this value belongs to our right or left.  If value < Name Then    ' Go left.    If LeftChild Is Nothing Then      LeftChild = New SortedBinaryNode(value)    Else      LeftChild.AddValue(value)    End If  Else    ' Go right.    If RightChild Is Nothing Then      RightChild = New SortedBinaryNode(value)    Else      RightChild.AddValue(value)    End If  End IfEnd Sub

Figure 4 shows the program in action. As you add new nodes to the tree, the program displays the tree and shows the tree?s inorder traversal. Notice that the inorder traversal of a sorted tree lists the items it contains in sorted order.

Figure 4. All Sorts of Trees: SortedTree builds a sorted tree.

SortedTree uses the SortedBinaryNode class to create sorted trees. The code which accompanies this article contains SortedTree in both Visual Basic and C#.

You can use a similar method to locate items in the tree. To find an item from the root node, compare the node?s value with the target value and move either left or right into the children as appropriate. Eventually, you will either find the node containing the target value, or you?ll reach a point where the child node you need to follow is missing. In the latter case, the item is not in the tree.

An Animal Game
Sorted trees give you a new way to sort a list of items: simply add the items to a sorted tree and then read out the tree?s inorder traversal. While this method works, it?s generally slower than other algorithms designed specifically for sorting lists. However, this type of tree is useful if movement within the tree has some extra meaning.

For example, you can use a sorted tree to build an animal guessing game. Internal nodes store a Yes/No question. Leaf nodes store the name of an animal. When the program visits an internal node, it asks the user the node?s question. If the user answers Yes, the program, moves into the node?s left child. If the user answers No, the program, moves into the node?s right child.

When the program reaches a leaf node, it guesses the animal whose name is stored in that node. If the program is correct, it celebrates. If the guess is wrong, the program asks the user for a new question to add to the tree that will let it tell the difference between whatever the program just guessed and the correct answer. It adds that node to the tree so next time it can tell the animals apart.

Listing 2 shows how the AnimalNode class asks questions. Because this application doesn?t need to draw or traverse the tree, this is a very simple class so the whole thing is shown in Listing 2.AnimalTree uses the AnimalNode class shown in Listing 2. Click here to download both Visual Basic and C# versions and play the animal guessing game.

Complex Trees Equal Powerful Indexing
Parse trees allow you to evaluate mathematical expressions relatively efficiently at run time. You can easily modify the EvaluateExpressions program to support whatever other operators and functions you might require.

Sorted trees also allow you arrange items in sorted order fairly easily. Though the trees in the simple animal guessing game are in some sense toys, they demonstrate some of the principles used to build much more complex trees that can implement powerful indexing features for serious databases

The next tree article will explain B-trees, a type of tree similar to those used by many database management systems to provide indexes.

Share the Post:
Share on facebook
Share on twitter
Share on linkedin

Overview

Recent Articles: