etworks can be very interesting objects to study. They can represent obvious physical systems such as street, telephone, and electric networks. They can also represent less obvious arrangements of data that let you schedule complex tasks, assign jobs to employees, and find critical tasks that can delay a large project.
The first article in this series explained how to build networks and find the shortest paths between two points in the network.
This article describes four other network algorithms: traversal, minimal spanning trees, maximal flow, and minimal flow cut.
The goal of a traversal algorithm is to move through the network visiting each node that is reachable from a particular start node. When you visit a node, you might perform some task, such as listing the node, examining it for a particular piece of data, and so forth.
Figure 1 shows the NetAlgs example program, available with the downloadable code for this article in Visual Basic and C# versions. This program demonstrates all the algorithms described in this article. In Figure 1, the output shows the result of traversing a network starting from node B. Each time the program visited a node, it changed the node’s caption to lower case and changed the node’s background color to pink. (The numbers on the links between the nodes indicate their costs. Those aren’t needed for traversal.)
|Figure 1. Terrific Traversal: When it visits a node, program NetAlgs changes the node’s caption to lower case and gives it a pink background.|
The network shown in Figure 1 is directed (you can only cross a link in the direction of the arrows) so you cannot reach every node if you start at node B. In this example, there are no paths through the network from node B to nodes A, C, H, I, or J so they were not visited.
There are many reasons why you might want to traverse a network. For example, you might simply want to know if you can reach a node from the start node. Consider a complex campus phone network: You want to ensure that it can connect every building to every other building. Assuming all the links are non-directional (calls can travel both ways across the link), you can check by traversing the network, starting from any node and ensuring that you can reach every other node.
As another example, suppose you have two computer networks: a secure internal network and an external network that connects to the Internet. You want to be sure that the two networks are separate, so the internal network is safe from attack by cyber-banditos. In this scenario, by traversing the network starting from any internal node, you should not be able to reach any external node.
As a final example, you can also use a traversal to determine whether the maze of one-way streets in downtown Boston will let you get to your destination?or whether the old New England saying, “Ya can’t get theah from heah” is true.
Basic network traversal is relatively easy. The procedure is:
- Start at a selected root node.
- Visit the node.
- Follow each of the node’s links to recursively visit the node’s neighboring nodes.
The only trick to the algorithm is that you need to be careful not to follow the links in a closed loop; otherwise, you’ll enter an infinite loop that follows the same series of links repeatedly
To avoid infinite loops, you can simply mark each node that you visit. For example, you can give the node class an IsMarked property and set it to True when you visit a node. Now when you examine a node’s links in step 3, you don’t visit any neighboring node that is already marked.
Here’s the revised algorithm:
- Start at a selected root node.
- Visit the node and mark it.
- Follow each of the node’s links to recursively visit the node’s unmarked neighboring nodes.
When the algorithm completes, all the visited nodes are marked. To reset the network, the program should unmark those nodes. It can either repeat the algorithm again, this time visiting only those nodes for which IsMarked is True, setting it to False as it goes, or it can simply loop through a list of every node in the network unmarking them.
The NetAlgs example program uses the following code to traverse its network:
' Traverse executes this kind of routine on each node. Public Delegate Sub TraversalSub(ByVal node As PathNode) ' Traverse the network starting at this node. Public Sub Traverse(ByVal clicked_node As PathNode, _ ByVal traversal_sub As TraversalSub) ' Traverse the network. If clicked_node Is Nothing Then Exit Sub clicked_node.Traverse(traversal_sub) ' Unmark all nodes. For Each node As PathNode In AllNodes node.IsMarked = False Next node End Sub
The code first declares the TraversalSub delegate type, which defines the type of method that you must pass to the Traverse method. Each time it visits a node, the program calls the TraversalSub method for that node. That lets the main program do whatever it wants with the nodes without the Network class having to know anything about the details.
The Traverse subroutine performs the traversal. It takes as parameters the node where the traversal should start (node B in Figure 1), and the TraversalSub delegate it should call for each node visited.
The code does most of its work by calling the start node’s Traverse method. It finishes by looping through all the nodes in the AllNodes list and resetting their IsMarked flags. Here’s the PathNode class’s Traverse method:
' Traverse the network from this node. Public Sub Traverse(ByVal traversal_sub As Network.TraversalSub) ' Mark the node as visited. IsMarked = True ' Execute the traversal routine on this node. traversal_sub(Me) ' Visit neighbors. For Each link As PathLink In Links Dim nbr As PathNode = link.ToNode If Not nbr.IsMarked Then nbr.Traverse(traversal_sub) End If Next link End Sub
The method starts by marking the current node as visited. It then calls the traversal subroutine, passing the current node as a parameter. This is where the main program does whatever it wants to the node (NetAlgs changes the node’s caption and background color).
The Traverse method then loops through the node’s links and examines its neighboring nodes. If a neighbor node is not marked, the routine calls that node’s Traverse method.
The following code shows the ChangedNodeCaption subroutine defined by the NetAlgs main program. This routine takes a PathNode object as a parameter. It simply changes the node’s background color to pink and its label to lower case.
' This is the method performed on each node during traversal. Public Sub ChangeNodeCaption(ByVal node As PathNode) node.EllipseBrush = Brushes.Pink node.Label = node.Label.ToLower() End Sub
Finally, the following code shows how the program calls the Network object’s Traverse method. It passes the node that the user clicked and the address of the ChangeNodeCaption subroutine.
m_Network.Traverse(clicked_node, AddressOf ChangeNodeCaption)
Building a Minimal Spanning Tree
|Figure 2. Spanning Skills: The red links show a spanning tree rooted at node A.|
A spanning tree is a subset of links in a network that connect every node to an initial root node. For example, Figure 2 shows a spanning tree rooted at node A. By starting at node A and following the red links, you can reach every node in the main body of the network. As you can see, there are no red links leading to nodes H, I, and J so you can’t reach them from node A.
The tree shown in Figure 2 is actually a shortest path tree rooted at node A. See the previous article about shortest paths for extra details, but for now just note that a shortest path tree is also a spanning tree.
If you add up the costs of the red links in Figure 2, you’ll see that this spanning tree has a total cost of 57. A minimal spanning tree is a spanning tree that has the least possible total cost.
For example, in Figure 2 if you were to remove the B–D link from the tree and add the E–D link, you still have a spanning tree (there’s still a path to node D). However, that tree’s total cost is now only 55 because you removed a link with cost 7 and added one with cost only 5.
Figure 3 shows a minimal spanning tree for this network. Depending on a network’s shape and costs, it may have more than one minimal spanning tree. This network has only one?the tree shown in Figure 3.
|Figure 3. Superior Spanning: The red links show a minimal spanning tree rooted at node A.|
Spanning trees are useful for finding the least expensive way to connect the nodes in a network. For example, suppose you want to build a network connecting a group of buildings or company sites. You could make a network that includes every possible link between sites. The network would not include impossible links such as those that cross rivers, government land that you don’t have permission to cross, and so forth. Calculating the minimal spanning tree will show the cheapest way to build a network connecting your sites.
As another example, suppose you’re designing an electrical circuit and you need to route power to specific spots (outlets in a house, locations on a chip, appliances on a boat, and so forth). The minimal spanning tree for that network shows the best way to run the power lines.
You can find a minimal spanning tree for a network by using a simple greedy algorithm. In each step of a greedy algorithm, the program makes a decision that works as much as possible towards a good solution. In the minimal spanning tree algorithm, that means adding the least expensive link to the tree that connects a new node to the tree.
Here’s how the algorithm works:
- Add the root node to the spanning tree.
- While a node exists that’s not connected to the spanning tree:
- Consider the links connecting nodes in the tree to nodes not in the tree. Add the least costly of those links to the tree.
Figure 4 shows the steps the spanning tree algorithm follows for this network. It starts by adding the root node A to the tree.
|Figure 4. Spanning Step-by-Step: At each step, the program adds the least costly link that connects a new node to the tree.|
In the first step, the program considers the links leaving node A that connect to nodes that are not in the tree. (Initially A is the only node in the tree.) Those are the A–B and A–C links with costs 10 and 9. The A–C link has a smaller cost so the program adds that one to the tree.
In step 2, the program considers the links leaving nodes in the tree leading to nodes not in the tree. That includes links A–B and C–E. The C–E link has the smaller cost of 4 so the algorithm adds that link to the tree.
In step 3, the links connecting new nodes to the tree are A–B, E–D, and E–F. The E–D link has the smallest cost of 5 so the program adds that link to the tree.
In step 4, the links connecting new nodes to the tree are A–B and E–F. The A–B link has the smaller cost of 10 so the program adds that link to the tree.
In step 5, the links B–D and B–E are now considered but they connect to nodes that are already in the tree so the algorithm ignores them. The only link connecting a node in the tree to a node that is not is the E–F link so the program adds it to the tree.
In step 6, the only link connecting a node in the tree to a node that is not is the F–G link so the program adds it to the tree.
The algorithm makes one more loop to consider the G–D link, discovers that the link connects to a node already in the tree, and stops.
The code in Listing 1 shows how NetAlgs finds minimal spanning trees.
In Listing 1, the call to ResetAlgorithms resets node and link properties, such as those that indicate whether a node is in the spanning tree. After resetting the algorithm properties, the program makes an empty list named cl to hold candidate links that might be good to add to the spanning tree. Initially the list is empty.
The program then enters a loop that runs until the tree is complete. Inside the loop, the program examines the links emanating from the most recently added node, and adds them to the candidate list. Initially that node is the tree’s root node. (In Figure 4, the root node is A, and its links are A–B and A–C.)
Next the program loops through the candidate list removing any links that lead to nodes that are already in the spanning tree. Then it loops through the candidate list again, to find the remaining link with the smallest cost. If one exists, it adds that link and its destination node to the spanning tree. The program repeats this process until the candidate list is empty. At that point, every node reachable from the root node and the shortest link to that node is in the spanning tree.
Exploring the Maximal Flow Algorithm
In addition to costs, links in some networks also have capacities, which indicate how much of some quantity can flow through the link in a given period of time. For example, a street might be able to hold a maximum number of cars per minute, a wire might be able to hold a certain amount of current, and a storm drain might be able to swallow a certain number of gallons of water per minute.
In a maximal flow algorithm (also called “max flow”), the goal is to figure out how much flow the network can support between two points.
|Figure 5. Fantastic Flows: In the maximal flow problem, the goal is to find the largest possible flow from one node to another.|
For example, Figure 5 shows a grid-like storm drain network. Blue links show the maximal flow from node 0,0 to node 3,4. The numbers on a link indicate the link’s flow and capacity. For example, the 2/4 on the upper 0,0–1,0 link means that link is currently carrying 2 units of flow but could carry up to 4.
Interestingly, not every use for maximal flow calculations models a physical situation. A particularly elegant use of maximal flow calculations performs work assignment. Suppose you have a group of employees A, B, C, and so forth who have particular skills. You also have a set of jobs 1, 2, 3, and so forth that require particular skills. Without maximum flow calculations, matching employees to jobs so you can perform as many jobs as possible can be tricky, particularly if you have lots of employees, jobs, and skills.
To assign jobs with a maximal flow calculation, you can use a network. Create a node for each employee and a node for each job. Connect each employee to the jobs that he or she has the skills to perform. Create artificial start nodes and connect them to each employee. Finally, connect each job to an artificial end node. Give every link a capacity of 1 and calculate the maximal flow from the start node to the end node. The calculation will find an assignment of employees to jobs that lets you do as many jobs as possible. The flows even show which employee to assign to each job.
Figure 6 shows an assignment of four employees to four jobs. In this example, employee A is assigned to job 1, employee C is assigned to job 2, and employee D is assigned to job 3. (The caption on the C-2 link looks like it says 0/1 but that caption is actually for the D-1 link sitting on top of it. The C-2 link really has flow 1/1. You can tell because it’s blue.) You can rearrange the assignments to change which jobs get done (for example, employee C could perform job 4 instead of job 3), but there is no assignment that will get all four jobs done.
|Figure 6. Marvelous Matching: You can use a maximal flow calculation to assign employees to jobs.|
You could probably find a good match for the case shown in Figure 6 by hand, but imagine that you have to do the employee/job matching task with a few hundred employees and jobs instead. In that case, a maximal flow algorithm could save you a lot of time and trouble.
Finding maximal flows is a bit more complicated than traversing a network or finding minimal spanning trees. The keys to this algorithm are the ideas of residual capacity and augmenting paths.
A link’s residual capacity is the amount of flow that you could add to the link. You can think of “unused capacity” if you like (“residual” just sounds more official), although residual capacity has one odd feature that makes “unused” seem not quite right.
Every link in the network implicitly defines a reverse link that connects the same nodes?but in the opposite direction. This link’s capacity is equal to the original link’s flow at any moment in time. For example, in Figure 5 the 0,0–1,0 link has flow 2, so the reverse link has a residual capacity of 2. In some sense (which will hopefully become clear shortly), you could push 2 units of flow across this reverse link.
The links with residual capacities, together with the implicitly defined reverse links, form a residual network. An augmenting path is a path through the residual network that connects the start and end nodes.
An augmenting path may cross links that have unused capacity. Moving across such a link corresponds to adding some flow to that link. For example, if a link has flow 2 and capacity 5, its residual capacity is 3, so you can push 1, 2, or 3 additional units of flow across the link.
An augmenting path may also cross implicitly defined reverse links. Moving across such a link corresponds to removing flow from the original link that defined it. For example, suppose a link has flow 2?so its reverse link has capacity 2. Moving across the reverse link corresponds to removing 1 or 2 units of flow from the original link. Removing flow like this may allow you to redistribute flow to use links that were previously underused.
Here’s the algorithm:
Find an augmenting path in the residual network from the start node to the end node following links with residual capacities greater than zero.
|Figure 7. Awesome Augmentation: The maximal flow algorithm uses augmenting paths to improve flow during each step.|
Follow the augmenting path and find the smallest residual capacity of the links in the path. (Suppose the smallest residual capacity is 2.)
Follow the augmenting path again, adding flow to use up the smallest capacity of the links in the path. (Add 2 to each link’s flow.) When you follow a reverse link, update the corresponding forward link by removing the corresponding flow from it.
Repeat until there are no more augmenting paths.
Figure 7 shows an example.
When the network contains no flows, the network’s links have 0 flows so the implicitly defined reverse links have 0 capacities.
In the first step, because the reverse links have 0 capacities, finding a path is simply a matter of finding a path through the network’s original links (shown on the left in Figure 7). One such path visits the nodes 0,0–0,1–0,2–1,2–2,2–3,2.
The links with the smallest residual capacity along this path have residual capacity 2 so the algorithm adds a flow of 2 along each of the links. The image in the middle of Figure 7 shows the updated flows.
The second step is harder to visualize. The blue links in the middle of Figure 7 implicitly define reverse links. All the blue links carry a flow of 2 so their reverse links have a capacity of 2. Now the program must find a path through the residual network using the unused capacity of the regular links plus these reverse links with capacity 2.
A good beginning for a path from the start node towards the end node visits the nodes: 0,0–1,0–2,0–2,1–2,2. At this point, you might like to go to node 3,2 to reach the end node but the 2,2–3,2 link is already at full capacity with a flow of 2 so you cannot push any additional flow across that link.
Instead of following the 2,2–3,2 link, the augmenting path follows the reverse link 2,2–1,2, which has capacity 2. The path can then reach the end node by visiting the nodes 1,3–2,3–3,3–3,2.
The links with the smallest residual capacity along this new path have residual capacity 2 so the algorithm pushes an additional flow of 2 along each of the links. For the regular links, it adds a flow of 2. For the reverse link 2,2–1,2, the algorithm removes a flow of 2 from the corresponding forward link 2,2–1,2. (Do you see how pushing flow across the reverse links rearranged the flow?)
The image on the right of Figure 7 shows the result. At this point, the links leaving the start node are at full capacity so there can be no more augmenting paths and the algorithm is finished.
In practice, you don’t really need to build the reverse links. In fact, you don’t really need to build the residual network at all. Instead when you consider a node, you can examine the links leaving and entering the node. When you consider moving across a link in its normal direction, you calculate the link’s residual capacity by subtracting its flow from its capacity. When you consider moving across a link backwards, you simply use the link’s current flow as its residual capacity.
The code in Listing 2 shows how program NetAlgs calculates maximal flows.
The code first resets properties used by the algorithm. Next it enters a loop that continues until there are no more augmenting paths.
Inside the loop, the code sets each node’s IsInCl value to False, and sets its BestLink value to Nothing (null in C#). The program uses those values to build the augmenting path, so this removes any previously built augmenting paths. Similarly, the code resets each link’s IsInTree property to False to remove it from previous augmenting paths.
Next, the program creates a candidate list to hold nodes in a growing “augmenting tree.” It adds the start node to this list and then enters another loop that finds an augmenting path.
While the candidate list is not empty, the program removes the first node from the list and considers its links. If a link leads to a neighboring node that has not already been added to the candidate list, and if the link’s residual capacity is greater than 0, then the program adds the node to the candidate list. It also sets the node’s BestLink property so the program knows which link was used to add the node to the “augmenting tree.”
The algorithm then examines the node’s back links in a similar manner.
Before ending the inner loop, the program determines whether the end node is in the candidate list. If the end node is in the list, that means the program has found an augmenting path from the start node to the end node so it breaks out of the inner loop.
Outside the inner loop, the program determines whether it found an augmenting path. If the candidate list emptied before it found an augmenting path, then there are no more augmenting paths so the program breaks out of its outer loop.
Assuming it did find an augmenting path, the program follows it to find the smallest residual capacity along the path. It then follows the path again to update the flows on the links along the path.
The program repeats all these steps until it can find no more augmenting paths.
If you studied the code in the earlier article, “Network Know-How: Finding Shortest Paths,” you’ll find this code very similar to that used by the label-setting shortest path algorithm. Both algorithms examine the links leaving a node to build a tree rooted at the start node, but there are two main differences. First, the shortest path algorithm selected the next node to visit?so it followed the shortest possible link. The maximal flow algorithm doesn’t care what link it follows; it will accept any augmenting path, so it just visits the next node in the candidate list. Unlike a label-setting algorithm it cannot make a mistake because any path is fine. And because it can’t make a mistake, it doesn’t need to go back and fix mistakes like the label-setting algorithm does.
Second, the maximal flow calculation must consider back links. A shortest path algorithm can only follow real links, not implicitly defined back links.
Making Minimal Flow Cuts
The opposite type of network algorithm, a minimal flow algorithm, also has useful applications. In the minimal flow cut problem (also called “min flow cut”), you need to remove the set of links with the smallest total capacity from the network so that after their removal, no path exists between a given start and end node.
This has some direct physical applications. For example, suppose you need to separate two points on an electrical circuit by cutting the circuit in the fewest possible places. Or suppose you want to know how many bridges you need to cut to send Manhattan floating into the Atlantic.
The minimal flow cut problem is also useful for studying how robust a network is. Suppose you have a computer, phone, electric, or other network. The minimal flow cut between two points tells you how many places the network must fail before you lose connectivity between those points.
|Figure 8. Careful Cuts: The minimal flow cut algorithm separates two nodes by removing a set of links that have a minimal total flow.|
For an example, consider the network shown in Figure 8. Suppose you want to find a minimal flow cut between nodes 0,0 and 3,2. You could separate those nodes by removing the links 0,0–0,1 and 0,0–1,0. Those links have capacities 4 and 5 respectively, so the total removed capacity would be 4 + 5 = 9.
Alternatively, you could remove the links 3,1–3,2, 2,2–3,2, and 3,3–3,2. All those links have a capacity of 2, so the total capacity removed would be 2 + 2 + 2 = 6, which is an improvement over the previous solution.
So what’s the minimal capacity you can remove to separate the start and end nodes? You might take a few minutes to see if you can find an even better solution before you read further.
It turns out that the minimal flow that you need to remove to separate the start and end nodes is equal to the maximal flow between those nodes. The rightmost image in Figure 7 shows that the maximal flow between nodes 0,0 and 3,2 in this network is 4, so you know that the minimal cut to separate those nodes removes a total capacity of only 4 from the network. (Did you find such a cut?)
In fact, the maximal flow calculation even shows you one way to find the cuts that you need to make. Here’s the algorithm:
- Perform a maximal flow calculation.
- Beginning at the start node, traverse the final residual network using links that have residual capacities greater than 0.
- Place all the nodes reached in Step 2 in set A. Place all the other nodes in set B.
- Remove all the links connecting nodes in set A with nodes in set B.
Why does this work? The high-level explanation suitable for managers, history majors, and others who don’t want to know all the details is that the maximal flow must pass through a choke point somewhere. There’s some place in the network where the flows are restricted to the maximal flow?and that’s where you remove links to cut the network.
For a more detailed explanation, suppose you have a total flow of F moving through the network from the start node to the end node. Now suppose you make any cut that separates the network into two sets A and B so the start and end nodes are separated.
Now notice that the net flow across the cut must be equal to the flow F reaching the end node. Flows may move back and forth across the cut, but in the end, F units of flow must reach the end node, so the net flow across the cut must be F.
Now consider the cut produced by the algorithm. First notice that this cut does separate the start and end nodes. If it did not, then there would be an augmenting path through the network from the start node to the end node. Assuming you have completed the maximal flow calculation, there must not be such a path.
That means the net flow across the cut is equal to the maximal flow. But that’s only the net flow. What if flows move back and forth across the cut? In that case, this cut wouldn’t have the minimum possible capacity. (For a concrete example, suppose the maximal flow is 1 and the flow moves across the cut towards the end node, back across the cut towards the start node, and then back again towards the end node. In that case, the net flow is 1 but the capacity across the cut moving back and forth is 3.)
All that remains is to show that the maximal flow does not cross back and forth over the cut. Suppose a link crosses back toward the start node. In that case, the end node of the link is in the set A that is reachable from the start node in the residual network. But this link has flow across it so the implicitly defined reverse link has a residual capacity greater than 0. That means you can reach the link’s start node in the residual graph. But if both the link’s nodes are reachable from the start node, then they are both in set A and the link does not cross the cut. That contradicts our assumption that the link crosses the cut?so no such link can exist.
|Figure 9. Cut Complete: The links that must be cut are drawn in dashed red lines.|
When maximal flow is moving through the network, the net flow across the cut must equal the maximal flow. You just saw that there can be no links moving flow backwards across the cut, so the total flow across the cut must also equal the maximal flow.
If this seems confusing, that’s because it is. The technical proof is even more confusing.
Figure 9 shows program NetAlgs displaying a max flow cut for the network shown in Figure 8. The nodes in set A are yellow. The links that were cut are drawn with dashed red lines. I’ve added a green line by hand (it’s not drawn by the program) to separate the nodes in sets A and B.
Notice that if you add up the capacities of the cut links, you get 2 + 0 + 0 + 2 + 0 = 4, which is the network’s maximal flow. Also notice that the cut links move only across the cut towards the end node; they never move back towards the start node.
You could modify the program slightly so it doesn’t bother removing links with zero capacity, because flow cannot cross them anyway.
Figure 10 shows the same network but with a different destination node?3,3.
|Figure 10. Confusing Cut: The boundary between sets A and B isolates node 3,1 and crosses some links that are not removed by the cut.|
There are two odd things about Figure 10. First, the cut isolates the node 3,1 so it is not connected to the other nodes in set B. In this example, the links that were cut to isolate the node had 0 capacities anyway so the algorithm didn’t really need to remove them?but removing them doesn’t cost anything.
Second, the green line crosses several links that are not removed by the cut. For example, the 3,3–3,2 link is not removed by the cut but the green line crosses it. Although those links have residual capacities greater than zero, they are situated in positions that are not reachable from the start node in the residual network. In other words, they are directional links that point in the wrong directions, so you can’t use them to add extra flow to the network. And because you can’t use them to find a path from the start node to the end node, there’s no need to remove them.
Next Steps and Getting More Information
This article describes four useful network algorithms: traversal, minimal spanning trees, maximal flow, and minimal flow cut. Traversal and minimal spanning trees are relatively simple. Maximal flow and minimal flow cut are more complicated.
Network algorithms is a heavily studied field, and there are many other interesting algorithms that you can use to perform such tasks as:
- Determining whether two networks are equivalent (it’s not as easy as you might think)
- Determining whether you can draw a network without any links crossing
- Determining whether a network is completely connected (whether there is a path between any pair of nodes)
- Finding special structures within networks, such as sets of nodes that are completely connected
For more information on network algorithms, see these other DevX articles: