### WEBINAR:

On-Demand

Application Security Testing: An Integral Part of DevOps

**Performance Comparisons**
In my tests,

LabelCorrectingHuge was faster than

LabelSettingHuge for smaller networks and slower for really big networks. Table 1 shows some of the times in seconds needed by each program to build shortest-path trees.

**Table 1**. **Performance Comparison with Express Links**: The label-setting algorithm does worse in smaller networks with express links.

**# Nodes** |
**# Links** |
**Label-Setting** |
**Label-Correcting** |

2,500 |
9,900 |
0.0041 |
0.0025 |

10,000 |
39,800 |
0.0297 |
0.0188 |

40,000 |
159,600 |
0.2813 |
0.2672 |

90,000 |
359,400 |
1.1875 |
1.2656 |

250,000 |
999,000 |
5.1250 |
17.8438 |

For the smaller networks, the label-setting algorithm spends too much time trying to pick the best possible node to add to the tree in each round. In contrast, the label-correcting algorithm spends very little time making that decision, and needs to fix up only a few mistakes so it's faster overall.

For the bigger networks, the express links play a crucial role. In those networks, the express links (which are built randomly) are much longer and a bit faster than the normal grid links. You can think of them as highway links cutting across a city's surface streets. Usually it's faster to use an express link if it starts and ends close to where you need to go. In these networks, the program can follow many incorrect paths before it discovers that the express links can give big improvements for those nodes.
Table 2 shows the results of tests with no express links. This time the label-correcting algorithm does much better because it can't make such large mistakes before catching them.

**Table 2**. **Performance Comparison Without Express Links**: With the express links removed, the label-correcting algorithm's times improve dramatically.

**# Nodes** |
**# Links** |
**Label-Setting** |
**Label-Correcting** |

90,000 |
358,800 |
0.3438 |
0.625 |

250,000 |
998,000 |
1.8750 |
0.1875 |

490,000 |
1,957,200 |
5.7344 |
0.4219 |

Removing the express links makes both algorithms a lot faster, but it makes the label-correcting algorithm really fly! The performance you will get for any given problem will depend on your exact network topology.

**Making Improvements**
I suspect if you really pick these programs apart, you'll find that they spend a lot of time inside various Visual Studio classes, such as the generic

List(Of PathNode) that they use to store their candidate lists. Adding and removing items from these lists takes a fair amount time.

A List(Of PathNode) also doesn't provide any help for finding the object that currently has the smallest BestDist value, which would really help the label-setting algorithm. It would be nice to use the SortedList class to locate the best node but the BestDist values of the items in the list change while they are in the list and the SortedList class won't update itself automatically when that happens.
You can probably achieve better performance if you build your own data structures. A good priority queue would make it a lot easier to find the candidate with the smallest BestDist value and would be fairly easy to update when BestDist values changed. Building priority queues is a topic I'll save for another day.

Even without building your own data structures, there are a few tweaks you can make to these algorithms that may boost performance. For example, suppose you're using a label-correcting algorithm and you're considering the neighbors of a node that you just removed from the candidate list. Suppose further that a neighbor's best distance can be improved by using the new path via the node you just removed. In that case, there are three possibilities.
First, if the neighbor has never been in the candidate list before, you simply add it.

Second, suppose the neighbor has been in the candidate list before but is not now in the list. In that case, you have previously removed the node from the candidate list and considered its neighbors. But when you did so, you had the wrong value for this node's shortest distance from the root node. In other words, you made a mistake.
That's no big deal unless you used that incorrect distance while building other paths! Perhaps some neighbor of this node should have used the new shortcut but didn't because you had the wrong BestDist value. In that case, you might have made several possibly long paths that must now be corrected.

The improvement is to add this neighbor (which was on the candidate list before but is not now) at the beginning of the candidate list instead of at the end. The idea is that this lets the algorithm reconsider this node and its neighbors sooner, so it can fix existing mistakes before they have a chance to grow even worse.
The third case for a neighbor that should be updated is that it is currently in the candidate list. In this case, you could leave the node alone, or you could try bumping its priority up and moving it to the front of the list.

Another improvement you can sometimes make is to partition the network. For example, the street map for a large country typically contains crowded urban centers separated by large relatively empty areas crossed by major highways. To find the shortest path from one city to another, you can usually find the shortest-path trees for the start and destination cities and then find the shortest path between them through the highway network. This lets you ignore the smaller streets in the other cities.
One final improvement is that, if you initially know the destination node, you can sometimes stop before you finish building the whole shortest-path network. In a label-setting algorithm, for example, you're done as soon as you reach the destination node. Of course, checking whether you have reached the destination node every time you add a node to the tree takes some time, so the check will improve overall performance only if the start and destination nodes are relatively close together, so you get to stop fairly early.

In a label-correcting algorithm, you can keep track of the best distance to the destination node found so far (initially it will be infinite). After you find a non-infinite path to that node, you can compare other nodes' distances to this best distance. If a node has a greater best distance, then you don't need to consider further paths that travel through that node; you might consider the node again later if you find a better path to it but you can skip it for now. This sometimes lets you trim out large swaths of the tree. But again, the tests themselves take extra time, so they'll only give you a net benefit some of the time. Often it's better to just grit your teeth and build the whole shortest-path tree.
Shortest-path calculations have many obvious uses, such as finding routes through street networks, but they also have subtler uses, such as helping you determine how similar two pieces of text are.

Using the label-setting and label-correcting algorithms described in this article, you can find shortest paths through even huge networks. In one test (with no express links), a label-correcting program found a shortest-path tree through a network containing one million nodes and almost four million links in about one second. However, when I tried a network with ten million nodes, I ran out of memory. But for now, a million-node network is probably big enough for my day-to-day needs.