
Getting Started with the .NET Task Parallel Library : Page 5

If you have a multi-core computer, chances are your first CPU does all the work while the others spend most of their time twiddling their electronic thumbs. Learn to unlock the idle power of your underused CPUs to greatly improve the performance of your applications.


Betting on Futures
In TPL terms, a Future is basically a task that returns a value. It's like a deferred function. You start it running and then use its value later. If the Future hasn't finished calculating its value by the time you need it, it makes you wait while it finishes.
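The TPL Future is an instance of the general future/promise pattern found in many environments. As a conceptual sketch of the same idea (in Python's standard concurrent.futures module, purely for illustration; the article's own code is VB.NET): you submit work, keep going, and block only when you ask for the value.

```python
# Conceptual analogue of a TPL Future using Python's standard library.
from concurrent.futures import ThreadPoolExecutor

def slow_square(n):
    # Stands in for any long-running calculation.
    return n * n

with ThreadPoolExecutor() as pool:
    fut = pool.submit(slow_square, 7)  # starts running in the background
    # ... the caller is free to do other work here ...
    value = fut.result()               # blocks until the value is ready
```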

For example, consider the Fibonacci numbers. The first two values are defined as Fib(0) = 0 and Fib(1) = 1. For larger values of N, Fib(N) = Fib(N - 1) + Fib(N - 2).

The following code shows a function that calculates Fib(N):

   Private Function Fibo(ByVal N As Long) As Long
       If N <= 0 Then Return 0
       If N = 1 Then Return 1
       Return Fibo(N - 1) + Fibo(N - 2)
   End Function

There are much more efficient non-recursive implementations of this function, but this version is a good candidate for parallelism because in sequential mode, it takes a long time to calculate Fib(N) when N is larger than 20 or 30.
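For reference, one such non-recursive version looks like this (sketched in Python for illustration; the article's samples are VB.NET). It runs in O(N) time instead of exponential time:

```python
def fib_iterative(n):
    """Iterative Fibonacci: linear time instead of exponential."""
    a, b = 0, 1          # Fib(0) and Fib(1)
    for _ in range(n):
        a, b = b, a + b  # slide the window forward one step
    return a
```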

The following code performs a similar calculation but uses Futures to calculate the result for each recursive call:

   Private Function FiboFullParallel(ByVal N As Long) As Long
       If N <= 0 Then Return 0
       If N = 1 Then Return 1
       Dim t1 As Tasks.Future(Of Long) = _
           Tasks.Future(Of Long).Create( _
               Function() FiboFullParallel(N - 1))
       Dim t2 As Tasks.Future(Of Long) = _
           Tasks.Future(Of Long).Create( _
               Function() FiboFullParallel(N - 2))
       Return t1.Value + t2.Value
   End Function

This function checks the base cases N = 0 and N = 1 as before. It then creates two Future objects. The parameter to the Create method is a lambda (aka inline) function that returns the result of the FiboFullParallel function evaluated for N - 1 and N - 2. As each Future is created, it may begin executing on any of the system's CPUs.

The function finishes by adding the values of the two Futures and returning the result. If the Futures have finished calculating their values by the time the Return statement executes, all is well. If either Future is not finished, the program waits until the executing Future completes its calculation.

While this version works, it is extremely slow and inefficient. The problem is that calculating FiboFullParallel(N) requires an enormous number of recursive calls to FiboFullParallel.

For example, to calculate FiboFullParallel(10) the program must find FiboFullParallel(9) and FiboFullParallel(8). But to calculate FiboFullParallel(9), the program calculates FiboFullParallel(8) and FiboFullParallel(7); in other words, it has to calculate FiboFullParallel(8) twice.

If you follow the calculation further, you'll find that the program must calculate each of the intermediate values a huge number of times. For example, to calculate FiboFullParallel(10), the function calls itself 176 times. To calculate FiboFullParallel(20), the function calls itself 21,890 times. The number of calls grows very quickly as N grows.
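You can verify those call counts with a short sketch (Python, for illustration) that simply counts every recursive call the naive algorithm makes:

```python
def count_recursive_calls(n):
    """Count the recursive calls made while computing Fib(n) naively,
    excluding the initial top-level call."""
    calls = 0

    def fib(k):
        nonlocal calls
        if k <= 1:
            return k
        calls += 2  # this invocation spawns two more calls
        return fib(k - 1) + fib(k - 2)

    fib(n)
    return calls
```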

Here's an improved version:

   Private Function FiboSplit(ByVal N As Long) As Long
       If N <= 0 Then Return 0
       If N = 1 Then Return 1
       Dim t1 As Tasks.Future(Of Long) = _
           Tasks.Future(Of Long).Create(Function() Fibo(N - 1))
       Dim t2 As Tasks.Future(Of Long) = _
           Tasks.Future(Of Long).Create(Function() Fibo(N - 2))
       Return t1.Value + t2.Value
   End Function

This function builds two Future objects but instead of calling back to this function, these Futures invoke the original sequential function Fibo.

Figure 3. Fast Futures: Program UseFutures uses Future objects to calculate Fibonacci numbers in parallel.

For example, when the main program calls FiboSplit(10), the Futures calculate Fibo(9) and Fibo(8). After that, function Fibo does not use Futures, so it doesn't pay the overhead that they incur. In this version, the program splits the calculation into just two Futures, each of which then runs the sequential algorithm.
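The same split-then-go-sequential pattern can be sketched in Python (illustrative only; note that CPython's global interpreter lock prevents a real speed-up for CPU-bound work, unlike the .NET version):

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # The plain sequential version.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

def fib_split(n):
    # Split only the top-level call into two futures; everything
    # below runs sequentially, avoiding per-call future overhead.
    if n <= 1:
        return n
    with ThreadPoolExecutor(max_workers=2) as pool:
        t1 = pool.submit(fib, n - 1)
        t2 = pool.submit(fib, n - 2)
        return t1.result() + t2.result()
```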

The program UseFutures demonstrates these functions (see Figure 3). The Result label shows the results for the 37th Fibonacci number. The Sequential label shows that the original Fibo function took 1.88 seconds while the Split label shows that the FiboSplit function took only 1.20 seconds. The program won't calculate FiboFullParallel for N greater than 25 because it takes too long (about 2.6 seconds for N = 25; probably years for N = 37).

You can use the TPL classes and methods to split tasks into pieces that you can then run in parallel but, as the FiboFullParallel function demonstrates, don't split the problem too finely. If you break the problem into pieces that are each very easy to evaluate, the overhead of running them separately will outweigh the benefit of parallelism.

Earlier I briefly mentioned that you need to be careful to ensure that tasks running on different threads don't interfere with each other. There are many ways two threads can get in each other's way and I don't want to take the time to cover them all here, but I do want to describe a couple of techniques that you may find useful.

One of the easiest ways for two threads to lock horns is if they read and write the same variable. For example, consider the following trivial subroutine. It simply adds an integer value to a total:

   Private m_Total As Integer
   Private Sub AddNoLock(ByVal i As Integer)
       m_Total += i
   End Sub

This code performs its addition in a single step using the += operator, which is so quick and simple that you would think nothing could go wrong. Unfortunately, if two threads execute this same piece of code at the same time, they can conflict.

Suppose two threads (A and B) are running the AddNoLock subroutine with parameters 1 and 2 respectively. Now suppose thread A starts running its addition statement and reads the value of m_Total, which at the time is 10.

Now suppose thread B gets a turn. It reads m_Total, which is still 10, and adds 2 to it, saving the result back in m_Total, which is now 12. Thread B congratulates itself on a job well done and retires.

But thread A is still executing. It adds 1 to the value that it read previously (10) and gets 11, which it stores in m_Total.

The final result is that m_Total holds the value 11 rather than the expected value of 13. If the two threads had run sequentially, the result would be 13. But because they ran at the same time, the result is 11.
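That lost-update interleaving can be replayed deterministically in a few lines (a simulation of the scenario above, not a real scheduler race):

```python
m_total = 10

a_read = m_total       # thread A reads 10
b_read = m_total       # thread B also reads 10
m_total = b_read + 2   # thread B writes 12 and finishes
m_total = a_read + 1   # thread A writes 11, silently discarding B's update
```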

This is called a race condition. The two threads are racing to see which can read and write the variable first. Depending on the exact sequence of events, you can get one of several possible outcomes.

Race conditions can be extremely hard to debug because they don't happen every time. With the small window for problems in this example (a single line of very fast code), the odds of a problem are fairly small, so the problem won't appear every time you run the program. Even worse, stepping through the code in the debugger will probably prevent the threads from interrupting each other, so you won't see the problem in the debugger at all.

So what are the odds that two threads will interlace in just the right way to make this sort of problem occur? If you're an experienced programmer, then you know that this will happen with absolute certainty when you're demonstrating your program to your company president. But even when the big boss isn't there, the odds are still large enough to make race conditions a significant problem.

The sample program UseLocks uses the following code to demonstrate this problem:

   Dim num_trials As Integer = Val(txtNumTrials.Text)
   Dim start_time As Date = Now
   For i As Integer = 1 To num_trials
       m_Total = 0
       Parallel.For(1, 10001, AddressOf AddNoLock)
   Next i
   Dim stop_time As Date = Now
   Dim elapsed As TimeSpan = stop_time - start_time

The program accepts a user-entered number that controls the number of trials. For each trial, the code calls Parallel.For to invoke the AddNoLock subroutine shown earlier 10,000 times. In other words, if the user enters 100 for the number of trials, the program performs a million (100 x 10,000) parallel additions. That's plenty large enough to cause a race condition most of the time.

When I run the program for 100 trials, it takes only about 0.75 seconds to perform all of those additions, but the result is not always the same. In fact, even a single trial of 10,000 parallel additions produces inconsistent results.

There are several ways you can avoid this type of problem. One of the best is to make sure the threads never, ever, ever read and write the same variables. They may read from some shared variables but they should not write into shared variables. If you give the threads separate variables to write into, there should be no problem. The Task example described earlier demonstrated this approach by using separate m_Tasks and m_ElapsedTime array entries for each task.
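Here's a sketch of that give-each-thread-its-own-variable approach (Python threads, for illustration): each worker writes only to its own slot, and the main thread combines the results after everything has finished.

```python
import threading

# One result slot per thread: threads may share reads, never writes.
partials = [0] * 4
chunks = [range(1, 2501), range(2501, 5001),
          range(5001, 7501), range(7501, 10001)]

def worker(slot, values):
    partials[slot] = sum(values)  # writes only to its own slot

threads = [threading.Thread(target=worker, args=(i, c))
           for i, c in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(partials)  # combine the private results after all threads finish
```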

Another way to handle this problem is to use a lock. A lock reserves some resource for a particular thread and denies other threads access to the resource while it is locked.

There are a couple of easy ways to create locks in .NET. In Visual Basic, you can use the SyncLock statement. You pass this statement an object used as a token to prevent other threads from acquiring a similar lock at the same time.

The following code, which is used by sample program UseLocks, demonstrates this approach. The AddSyncLock function uses a SyncLock statement to reserve access to the "Me" object (the object running this code—in this case, the form). It then adds its value to the total and releases the lock. If another thread tries to execute the function, it will block at the SyncLock statement until the first thread releases its lock:

   Private Sub AddSyncLock(ByVal i As Integer)
       SyncLock Me
           m_Total += i
       End SyncLock
   End Sub

Here's the equivalent C# version, which uses the lock statement:

   private void AddSyncLock(int i)
   {
       lock (this)
       {
           m_Total += i;
       }
   }

Figure 4. Lots of Locks: The SyncLock statement and the Interlocked class allow threads to use the same variable without interference.

Operations such as this that add one value to another are so common in complex multi-threaded applications that the Threading namespace provides an Interlocked class to make them easier. This class's static Add method adds one number to another as a single atomic operation, so no other thread can see a partially completed update.

Example program UseLocks uses the following code to demonstrate Interlocked.Add:

   Private Sub AddInterlocked(ByVal i As Integer)
       Threading.Interlocked.Add(m_Total, i)
   End Sub
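The same lock-around-the-update pattern can be sketched in Python, using threading.Lock as a stand-in for SyncLock (illustrative only): with the lock held, only one thread at a time can read-modify-write the shared total, so no updates are lost.

```python
import threading

m_total = 0
total_lock = threading.Lock()

def add_with_lock(i):
    global m_total
    with total_lock:   # only one thread at a time may run this block
        m_total += i

threads = [threading.Thread(target=add_with_lock, args=(i,))
           for i in range(1, 101)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```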

Figure 4 shows program UseLocks in action. The first result, which doesn't use locks, is incorrect. Also notice that the other two results that do use locks take longer—but both return the correct result. There is some overhead in using locks but getting the correct result is worth it.

Author's Note: The Interlocked class also provides Increment and Decrement methods, as well as Exchange and CompareExchange methods that atomically swap two values.

In the coming years, computers will contain more and more CPUs. The operating system and some fancier compilers will probably take advantage of that additional power to give you better performance even if you do nothing, but to get the full benefit of this extra horsepower you'll need to change the way you program.

The Task Parallel Library gives you new tools that let you parallelize some parts of applications relatively easily. Methods such as Parallel.Invoke, Parallel.For, and Parallel.ForEach, and classes such as Task and Future let you launch swarms of threads on different CPUs relatively easily.

The library automatically balances the load across the CPUs available. If you have an old-fashioned single CPU system, you pay only a slight performance penalty. If you have a multi-core system, you'll get the benefits of multiple CPUs with very little extra effort.

Despite the advantages, the underlying problems of implementing multiple threads haven't gone away. You will need to consider race conditions, deadlocks, non-thread-safe classes, and other issues that can arise when multiple threads execute at the same time. If you keep tasks separate you shouldn't have too much trouble getting started with TPL.

More importantly, if you start thinking about and implementing parallelism in your code now, your programs will easily scale to quad-core, eight-core, and even systems with more CPUs. Some day soon, you'll plop a 32-core system on your desk and your programs will have every CPU blazing away at full capacity. But if you ignore parallelism, your customers will be sitting there with 31 idle CPUs (hopefully powered down) wondering why they're not getting better performance from the latest hot hardware.

Editor's Note: Be sure to check back soon to read the follow-up article, which shows the kinds of improvements you can expect after applying parallelism to existing code.

Rod Stephens is a consultant and author who has written more than a dozen books and two hundred magazine articles, mostly about Visual Basic. During his career he has worked on an eclectic assortment of applications for repair dispatch, fuel tax tracking, professional football training, wastewater treatment, geographic mapping, and ticket sales. His VB Helper web site receives more than 7 million hits per month and provides three newsletters and thousands of tips, tricks, and examples for Visual Basic programmers.