  Search    Advertiser Disclosure
 TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK Specialized Dev Zones Research Center eBook Library .NET Java C++ Web Dev Architecture Database Security Open Source Enterprise Mobile Special Reports 10-Minute Solutions DevXtra Blogs Slideshow      Author Feedback Print Article Comment on this Article # Getting Started with the .NET Task Parallel Library: Multi-Core Case Studies : Page 2

## There's more to any programming tool than just theory—and the TPL is no exception. Learn about the techniques Rod Stephens used and the problems he encountered in these parallelism case studies.

 by Rod Stephens
 Sep 29, 2008
 Page 2 of 3 Traveling Parallel Paths
The article Four Heuristic Solutions to the Traveling Salesperson Problem contains the original sequential code and a full explanation of the problem. Applying parallel techniques to this problem works quite well.

In the traveling salesperson problem (TSP), the goal is to find the shortest route that visits a set of points and then returns to its starting point. Figure 2 shows the sample program ParallelTSP displaying an optimal route found by a parallel exhaustive search. If you look at the numbers, you'll see that the exhaustive and parallel exhaustive searches both found an optimal solution with the same total route length, but the parallel search took a lot less time. (Notice also that the "Two-Opt" heuristic got lucky and found an optimal solution in a lot less time.) Figure 2. Remarkable Routing: Program ParallelTSP finds the shortest route that visits a set of points and returns to its starting point.
Like 2BP, the exhaustive search for TSP uses an array to keep track of test solutions. To build the parallel TSP algorithm, I used an approach similar to the one I used for 2BP, giving each thread its own copy of the array.

In TSP, you can pick any point as the route's starting point and the parallel search then considers every other point as the second point to visit. For each choice of second point, the program used a separate thread to look for good solutions assuming the assignments for the first and second points. If the points are numbered 1, 2, 3, and so forth, then the threads looked for solutions starting with 1-2, 1-3, 1-4, 1-5, and so forth.

Although the parallel version of this program is much faster than the sequential version, the time needed still grows exponentially as the problem size grows. If you add a few more points to the problem, you won't be able to perform a parallel exhaustive search even if you have a fast quad-core computer.

As is the case with 2BP, for larger problems you'll need to turn to heuristics. Fortunately the heuristics for TSP are reasonably good, so you can get reasonable solutions for fairly large problems.

The heuristics for TSP are random. They pick a random solution and then try to improve it by making small changes until they decide they have gotten a reasonably good solution. They then start over with a new random solution and improve that one. The process continues until you decide you're not going to get a better solution in a reasonable amount of time.

These sorts of random algorithms are ideal for parallelization. Each random solution is independent of the others so different threads can work on different possible solutions without relying on shared data structures. For example, the program could launch a series of calls to the Two-Opt or Mix heuristics on separate threads. For big problems, that would allow you to find better solutions in a reasonable amount of time.

Multi-Core Mandelbrot Sets
To draw a Mandelbrot set, you consider each pixel in the image separately. Each pixel corresponds to a point in the complex plane C = x + y * i. To find a pixel's color, the program repeatedly iterates this equation:

``````   Z(n) = Z(n-1) * Z(n-1) + C
``````
In that equation, Z(0) = 0 + 0 * i and C is the value x + y * i corresponding to the pixel at position (x, y). The program iterates this equation until it reaches some maximum number of iterations (the sample program uses 64 by default) or the magnitude |Z(n)| starts heading towards infinity.

At that point, the program uses the number of steps it took |Z(n)| to exceed 2 to determine the pixel's color. For example, if |Z(n)| exceeds 2 after 30 iterations and you are drawing the Mandelbrot set with 16 colors, then you would assign this pixel color number 30 mod 8, or 6.

Figure 3 shows example program ParallelMandelbrot in action. Figure 3. Marvelous Mandelbrot: Program ParallelMandelbrot generates the Mandelbrot set.
You can find some other Mandelbrot programs on my VB Helper Web site. Go to the search page and look for "Mandelbrot." With a little effort, you can also find other Mandelbrot programs scattered all over the Web.

Because the program calculates each pixel's color independently, generating a Mandelbrot set seems like an ideal candidate for parallelization. You can simply divide the image up into pieces and build each piece in a different thread. Each pixel's value is independent on the values of the others, so the threads should not interfere with each other.

Unfortunately there's a big problem: to draw the pieces, every thread must access the same bitmap. You might think they would not interfere, because the threads would be writing to separate pieces of the image, but sadly, the Bitmap class is not thread-safe, so accessing it with multiple threads does cause problems.

You could use a lock to limit access to the bitmap to just one thread at a time—but that would require a lot of locks. The 600x600 pixel image in Figure 3 contains 36,000 pixels, meaning the program would have to create and release 36,000 locks, and that would hurt performance.

Alternatively, you could try chopping the image into separate bitmaps and making each thread build one bitmap. But there's a lot of overhead in chopping the image apart and merging the results back together. I tried this approach and the simplest implementation was slower than the sequential program.

Rather than following either of these approaches, I built an array of integers to hold the pixels' iteration counts. Each thread fills in the values for its pixels without interfering with the other threads. When all the threads finish, the main program uses the iteration counts to set the pixels' colors in the bitmap.

This parallel version of the algorithm divides the image into one-pixel wide vertical stripes, generating each in a separate thread. The image in Figure 3 is 600 pixels wide, so the program builds it with 600 threads.

Note that it takes much longer to generate some parts of the Mandelbrot set than others. The dark green pixels on the left edge of the figure head towards infinity quickly, but the red pixels in the middle of the figure have still not headed towards infinity even after 512 iterations.

Because some strips take longer to generate than others, it's a good thing the program uses a lot of threads. If I had divided the image into big chunks, for example, it's possible that a thread might be assigned to a chunk that took a very long time to build. In the worst case, other threads might finish their chunks first and the program would be left running only the slow thread on one CPU, while other CPUs were idle. Chopping the image into lots of smaller pieces reduces the time to build any one piece, so that approach guarantees that all CPUs will be kept relatively busy.

However, it's also important not to break the work into too many threads. Generating each pixel's image in a separate thread would require a lot more overhead and would probably hurt performance. The lesson here is that dividing a task increasingly finely reaches a point of diminishing returns.

For any particular task, you may need to do some experimentation to find the right level of parallelism. Keep in mind that you may someday want to run the program on the latest 64-core system, so you may want to use some extra threads. Alternatively, you may want to make the number of threads easy to change so you can increase the number when you get hardware with more cores. (The ParallelBootyDivider program described later uses that approach.)

As the title bar in Figure 3 shows, it took the program 0.27 seconds to generate the image. The sequential version took 0.43 seconds—almost twice as long. You can press the P and S keys to switch between parallel and sequential calculations. Author Feedback Email Article Print Article Comment on this Article  