Getting Started with the .NET Task Parallel Library

Parallel programming has been around for decades, but unless you had access to special-purpose hardware, you’ve probably written mostly single-CPU applications. You could distribute truly intensive applications across a network, but doing so was a lot of work and involved a lot of overhead.

For several years, programming tools have allowed you to use multiple threads in the same program. Multiple threads may be able to run on multiple CPUs within your computer so multi-threading sometimes provides nice speed improvements. Unfortunately, while multi-threading requires less overhead than distributing applications across a network, it’s still fairly tricky.

Microsoft’s new Task Parallel Library (TPL) offers a fresh approach to using multiple threads. It provides a set of relatively simple method calls that let you launch a group of routines all at once.

This article provides an introduction to TPL. It explains the main pieces of TPL and provides simple examples. A follow-up article will describe performance effects resulting from converting some of the programs I’ve written for other articles to use TPL.

Before you learn more about TPL, however, it’s worth a few minutes to look at the future of computer hardware, so you’ll know why you should care.

Multi-Core Background
For more than 40 years, the number of transistors that could fit on a chip has roughly doubled every two years, a trend dubbed “Moore’s Law” after Intel cofounder Gordon Moore, who described it in his famous 1965 paper. Along with that doubling has come increased CPU speed.

Unfortunately, the techniques used to maintain this frenetic pace are starting to wear thin, quite literally. Many past density increases depended on lithographic techniques that produce smaller and smaller chip components. Lithography is much like using a slide projector to shine the image of the chip you want onto a silicon wafer. The ultimate limit to how small you can make a feature with those techniques depends on the wavelength of the light you use. Chip manufacturers are currently working with deep ultraviolet light, electron beams, and x-rays, which push lithography to its limits, but the era of easy speed gains is rapidly approaching its end.

As the big performance gains from lithographic techniques come to an end in the next decade or so, researchers are turning to other methods for squeezing extra performance out of a computer. Some of these, such as optical computing and quantum computing, use radically different approaches to building computers. These approaches show great promise but are unlikely to produce practical commercial results for many years.

A more immediately practical approach is to use more CPUs at the same time. If you can’t make a CPU twice as fast, perhaps you can divide your work across two CPUs. In fact, many new computers these days have multiple cores: multiple CPUs on a single chip. Dual- and even quad-core systems are common, chips with even more cores are in the works, and vendors are developing 8-, 16-, and even 32-core systems.

IBM’s Cell architecture allows for a whole host of relatively small CPUs scattered throughout your environment in electronic products such as computers, cell phones, televisions, game consoles, and just about anything else you can think of. In a few years, you may have dozens or even hundreds of small CPUs talking to each other through an ad hoc network in your living room.

Just having multiple CPUs sitting around will probably give you some performance improvements. Future operating systems will probably gain some benefit from multiple CPUs, and improved compilers may be able to find opportunities for parallelism automatically.

However, there’s a limit to how much these tools can do for you transparently. An eight-core system may potentially have eight times as much CPU power, but if you don’t pitch in and help you’re likely to see only a small improvement in your application’s performance.

To really get the most out of your hardware, you’re going to have to start incorporating parallelism into your applications. Toward that end, TPL gives you a new set of tools that will let you get started quickly and relatively easily.

Introducing the Task Parallel Library
Techniques for distributing tasks across multiple computers have been around for decades. Unfortunately those techniques have mostly been cumbersome, requiring lots of skill and patience to code correctly.

They also had a fair amount of overhead, so a net performance gain occurred only if the pieces of your problem were relatively large. If you divide an easy task into two trivial subtasks, the overhead of calling remote computers to calculate the subtasks and then reassembling the results often takes longer than just doing the work on a single CPU. (If you’re a parent, you probably know that it can take longer to make your kids clean up their rooms than it would to clean them yourself. In that case, however, there’s a principle involved.)

In contrast, the multi-core computers that are becoming ubiquitous require much less overhead to make and return calls between cores, which makes multi-core programming far more attractive. And the Task Parallel Library simplifies the process by providing routines that can automatically distribute tasks across the computer’s available CPUs quickly and painlessly.

The TPL’s methods have relatively little overhead, so you don’t pay a huge penalty for splitting up your tasks. If your computer has a single CPU, the library pays a small penalty for breaking the tasks up and running them one at a time. If you have a dual-core system (like I do), TPL spreads the tasks across the two CPUs. If you have a quad-core system, TPL spreads the tasks across the four CPUs. In the future, if you have a network of 19 Cell CPUs scattered around your family room and kitchen, TPL will spread the tasks across those 19 CPUs.

At least that’s the theory. This scenario is still a bit down the road, so don’t be surprised if the details change before it becomes a reality. Getting familiar with TPL now, however, will help prepare you for future developments in parallel programming.

One of TPL’s additional advantages is that it automatically balances the load across the available CPUs. For example, suppose you run four tasks without TPL, assigning two to run on one CPU and two to run on a second. If the first CPU’s tasks take longer than the second’s, the second CPU finishes its work early and then sits uselessly while your program waits for the first CPU to finish.

TPL automatically prevents that sort of unbalanced scheduling by running tasks on whatever CPU is available for work. In this example, your program uses TPL to start the four tasks but doesn’t decide where they execute. TPL runs one task on each CPU. When a CPU finishes its current task, TPL gives it another one. The process continues until all of the tasks complete.

If some task is particularly long, other tasks will run on other CPUs. If some tasks are short, one CPU may run many of them very quickly. In any case, the TPL balances the workload, helping to ensure that none of the CPUs sit around idly while there is work to do.

Using TPL
Enough background and theory; it’s time for some details.

TPL is part of the System.Threading namespace. To use it, first download and install the latest version of the Microsoft Parallel Extensions to .NET Framework 3.5 (the June 2008 Community Technology Preview as of this writing).

Author’s Note: To make sure you get the latest version of the library, check the Parallel Computing Developer Center before you download the library.

After you’ve installed the TPL, start a new Visual Basic or C# project and add a reference to System.Threading. Open the Project menu, select Add Reference, select the .NET tab, and double-click the System.Threading entry.

To make working with the namespace easier, you can add an Imports statement in Visual Basic or a using statement in C# to your code module.
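For example, the statements might look like the following (this sketch assumes the CTP places the Parallel and Task classes in the System.Threading and System.Threading.Tasks namespaces, as the code later in this article does):

   ' Visual Basic
   Imports System.Threading
   Imports System.Threading.Tasks

   // C#
   using System.Threading;
   using System.Threading.Tasks;

With those statements in place, you can write Parallel.Invoke instead of System.Threading.Parallel.Invoke.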

Now you’re ready to work with TPL. The following sections describe some of the most useful TPL methods: Parallel.Invoke, Parallel.For, and Parallel.ForEach. They also describe two useful parallel objects, Tasks and Futures, and some simple locking mechanisms provided by the System.Threading namespace.

The Parallel class provides static methods for performing parallel operations. The Parallel.Invoke method takes as parameters a set of System.Action objects that tell the method what tasks to perform. Each action is basically the address of a method to run. Parallel.Invoke launches the actions in parallel, distributing them across your CPUs.

The type System.Action is just a named delegate representing a subroutine that takes no parameters and doesn’t return anything. In other words, you can simply pass in the addresses of the subroutines that you want to run.

The Parallel.Invoke routine takes a parameter array (ParamArray in Visual Basic) so you can simply list as many subroutines as you like in the call.

The following code shows how you can use Parallel.Invoke in Visual Basic:

   Parallel.Invoke(AddressOf Task1, AddressOf Task2, AddressOf Task3)

What could be easier than that? Actually, the following C# version is syntactically a little shorter:

   Parallel.Invoke(Task1, Task2, Task3);

That’s all you need to do! And you thought using multiple CPUs was going to be a big chore!

Naturally there are a few details that you need to consider, such as:

  • If two threads try to access the same variables at the same time, they can interfere with each other.
  • If two threads try to lock several resources at the same time, they may form a deadlock where neither can make progress because each is waiting for a resource locked by the other.
  • Threads running in parallel cannot directly access the user interface thread, so (among other considerations) they cannot safely use the properties and methods provided by your program’s controls.

  • Many classes are not “thread-safe,” so you cannot safely use their properties and methods from multiple threads.

For now, ignore these issues and just admire the elegant simplicity of Parallel.Invoke. As long as the tasks mind their own business, it’s amazingly easy.

The Parallel.Invoke routine can also take an array of actions as a parameter. That’s useful if you don’t know at design time exactly which actions you might need to execute. You can fill an array with the tasks at run time and then execute them all. The following Visual Basic code demonstrates that approach:

   Dim actions() As Action = _
      {AddressOf Task1, AddressOf Task2, AddressOf Task3}
   Parallel.Invoke(actions)

Here’s the C# version:

   System.Action[] actions =
      new System.Action[] { Task1, Task2, Task3 };
   Parallel.Invoke(actions);

The sample program InvokeTasks, available with the downloadable code for this article in Visual Basic and C# versions, uses Parallel.Invoke to run three tasks in parallel. In that program, the three tasks simply use Thread.Sleep to sleep for 1, 2, and 3 seconds, respectively. As expected, when the program calls these three routines sequentially (one after another), it takes six seconds to execute. When it uses Parallel.Invoke instead, it takes only three seconds on my dual-core system, so this simple program provides an easy way to determine whether your code is using more than one core.
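The task routines themselves can be as simple as a call to Thread.Sleep. The following sketch shows what InvokeTasks’ routines might look like (hypothetical bodies; see the downloadable code for the real versions):

   ' Sleep for 1, 2, and 3 seconds, respectively.
   Private Sub Task1()
       Thread.Sleep(1000)
   End Sub

   Private Sub Task2()
       Thread.Sleep(2000)
   End Sub

   Private Sub Task3()
       Thread.Sleep(3000)
   End Sub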

While the application isn’t particularly impressive, the results are. The program can access both my CPUs with very simple code.

Parallel.For and Parallel.ForEach
The InvokeTasks program uses routines that just sleep partly to keep things simple and partly because I had trouble thinking of a good example that needed to execute a small fixed number of tasks in parallel. Usually, I want to perform either a single calculation or break a big problem into dozens of sub-problems and perform them all at once.

For example, suppose I want to generate Mandelbrot sets. It would be nice to break the image into a bunch of pieces and generate them all simultaneously. (I’ve done this as a distributed application before, farming out pieces to different computers on a network, and it’s a relatively large amount of work.)

You could build an array of System.Action objects representing a set of subroutine calls and then call Parallel.Invoke, but that would be awkward at best. One particularly tricky problem would be figuring out which task should generate which part of the image. The call to Parallel.Invoke does not allow you to pass parameters to the tasks, so there’s no easy way to tell them which part of the image to build. You could make two separate routines to generate the left and right halves of the image, but then you’d have to write a lot of duplicated code. That solution would also be hard to generalize to a four- or eight-core system.

A better solution would be to use a single routine and pass in a parameter telling it what part of the image to build. While Parallel.Invoke won’t let you do that, TPL has other methods that will. The Parallel.For method essentially performs a parallel For loop. It takes as parameters an inclusive lower and an exclusive upper bound for the loop, and the address of a subroutine to execute. It calls the subroutine on different threads, passing it the looping value.

For example, the following Visual Basic code calls the Delay subroutine in parallel. The loop passes values 1 through 3 (it stops before the upper bound of 4) and calls Delay, passing it the looping values 1, 2, and 3:

   Parallel.For(1, 4, AddressOf Delay)

The following code shows the C# version:

   Parallel.For(1, 4, Delay);

In other words, the Parallel.For routine lets you call a subroutine, passing it a series of integer values. That’s just the ticket for building a Mandelbrot set in parallel. The call to Parallel.For can loop through the image’s X coordinate values, calling a subroutine that generates the part of the image for that vertical stripe through the image. You’ll see more about this program in the follow-up case studies article.
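As a sketch (using a hypothetical GenerateColumn routine; the real program appears in the follow-up article), the call might look like this:

   ' Generate columns 0 through wid - 1 of the image in parallel.
   ' GenerateColumn(x) is a hypothetical subroutine that computes
   ' the vertical stripe of the Mandelbrot image at X coordinate x.
   Parallel.For(0, wid, AddressOf GenerateColumn)

Each call to GenerateColumn receives a different X coordinate, so the stripes can be computed independently on separate CPUs.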

You’ve seen one way to use the Parallel.For method: calling a subroutine while passing it a series of numbers. But suppose you don’t want to pass it a series of numbers? Suppose you want to pass it a group of numbers that are not in a simple sequence? Or suppose you want to pass it a series of strings, structures, or other non-numeric objects?

Parallel.ForEach handles those situations. It takes as parameters an IEnumerable collection of values and a subroutine. It then calls the subroutine in parallel, passing it the values in the collection.

As an example, the following VB code uses Parallel.ForEach to pass the Delay subroutine the values 2, 3, 1:

   Dim secs() As Integer = {2, 3, 1}
   Parallel.ForEach(secs, AddressOf Delay)

The following code shows the C# version:

   int[] secs = {2, 3, 1};
   Parallel.ForEach(secs, Delay);

You can just as easily pass objects instead of simple integers to a method such as DelayWithData. In the following example, the DelayData class has an integer property called NumSeconds, used to determine the delay duration. This code provides the same effect as the previous examples, but uses objects. Here’s the Visual Basic version:

   Dim delays() As DelayData = { _
       New DelayData With {.NumSeconds = 2}, _
       New DelayData With {.NumSeconds = 3}, _
       New DelayData With {.NumSeconds = 1} _
   }
   Parallel.ForEach(delays, AddressOf DelayWithData)

Here’s the C# equivalent:

   DelayData[] delays = {
       new DelayData{NumSeconds = 2},
       new DelayData{NumSeconds = 3},
       new DelayData{NumSeconds = 1}
   };
   Parallel.ForEach(delays, DelayWithData);

The sample program RepeatedTasks demonstrates the Parallel.For and Parallel.ForEach methods. Whether the program uses Parallel.For or Parallel.ForEach, it takes almost exactly half as long on my dual-core system as it does when it executes its tasks sequentially.

Here’s a more interesting demonstration of Parallel.ForEach. The example program ForEachObjects calls the FilterImage subroutine four times, passing it objects of the PicInfo class. That class has two properties that contain input and output bitmaps. The FilterImage subroutine applies an embossing filter to the input image and saves the result in the output image. After the call to Parallel.ForEach finishes, the main program displays the results (see Figure 1).

Figure 1. Fascinating Filters: Program ForEachObjects uses Parallel.ForEach to create four embossed images in parallel.

The following code shows how the program uses Parallel.ForEach.

   Dim pic_infos() As PicInfo = { _
       New PicInfo(picOld1.Image, picNew1.Image), _
       New PicInfo(picOld2.Image, picNew2.Image), _
       New PicInfo(picOld3.Image, picNew3.Image), _
       New PicInfo(picOld4.Image, picNew4.Image) _
   }
   Parallel.ForEach(pic_infos, AddressOf FilterImage)

If you look closely at Figure 1, you’ll see that applying the filter to the images sequentially took about 2.63 seconds, but only about 1.60 seconds in parallel. This program has more overhead than the previous, more trivial programs, so the parallel calculation takes more than half as long as the sequential version; but it still saves about 40 percent of the total time.

Controlling Tasks
The Parallel.Invoke method lets you call a series of possibly different subroutines in parallel. The Parallel.For and Parallel.ForEach methods let you call the same subroutine many times, passing it either a sequence of numbers or a series of values in an IEnumerable. After the calls begin, all these methods (Parallel.Invoke, Parallel.For, and Parallel.ForEach) block your program until all the calls complete; then they return control to your program.

Occasionally you may want greater control over how the tasks execute. In particular, you may want to be able to cancel tasks. To give you that control, TPL provides the System.Threading.Tasks.Task class.

A Task object represents something that you want to do. You create a Task by calling the Task class’s static Create method, passing it the address of the routine that you want it to execute.

After you create a Task, the system is free to execute it at any time. If there’s a spare CPU sitting around with nothing to do, the Task may start executing right away.

The following code shows how example program TimedTasks prepares three tasks for execution. It makes an array of three Task objects that represent calls to the Task0, Task1, and Task2 subroutines.

   Private m_Tasks(2) As Tasks.Task
   ...
   ' Prepare the tasks.
   m_Tasks(0) = Task.Create(AddressOf Task0)
   m_Tasks(1) = Task.Create(AddressOf Task1)
   m_Tasks(2) = Task.Create(AddressOf Task2)
Author’s Note: Technically, the subroutines Task0, Task1, and Task2 should take a single object as a parameter. While VB doesn’t mind if the routines take no parameters, C# insists that they have the proper signature.

If the routines you’re calling require a parameter, you can pass it as a second argument to Task.Create.

The following code creates an array of TaskData objects and then makes three tasks, passing each a different entry from the array.

   ' Prepare the task data.
   Dim task_data() As TaskData = { _
       New TaskData() With {.TaskNum = 0, .NumSeconds = 2}, _
       New TaskData() With {.TaskNum = 1, .NumSeconds = 1}, _
       New TaskData() With {.TaskNum = 2, .NumSeconds = 3} _
   }

   ' Prepare the tasks.
   m_Tasks(0) = Task.Create(AddressOf ParamTask, task_data(0))
   m_Tasks(1) = Task.Create(AddressOf ParamTask, task_data(1))
   m_Tasks(2) = Task.Create(AddressOf ParamTask, task_data(2))

The Task class provides several other methods to manage tasks in addition to Create:

  • The WaitAny method makes the program wait until any one of a set of tasks has finished.
  • The WaitAll method makes the program wait until all the tasks in a set are finished. WaitAll throws an error if any of the tasks is canceled.
  • The Task class’s Cancel method cancels the task. This method doesn’t actually stop the task; it simply sets its IsCanceled property to True. It is up to your task to check this value periodically to see if it should stop working.

Of course that raises the question, “How does the task know which Task is running it?” Example program TimedTasks solves this problem by making an array of Task objects and passing each task its index in the array.

Here’s a more complete picture of how TimedTasks works. The following code allocates the Task array. It also creates an array of TimeSpan objects where the tasks can record their elapsed times when they finish. Giving each task a separate entry in the array lets them work independently, so they don’t need to worry about interfering with each other.

   ' Run separate tasks.
   Private m_Tasks(2) As Task
   Private m_ElapsedTime(2) As TimeSpan

Next, the main program starts the tasks:

   ' Prepare the task data.
   Dim task_data() As TaskData = { _
       New TaskData() With {.TaskNum = 0, .NumSeconds = 2}, _
       New TaskData() With {.TaskNum = 1, .NumSeconds = 1}, _
       New TaskData() With {.TaskNum = 2, .NumSeconds = 3} _
   }

   ' Prepare the tasks.
   m_Tasks(0) = Task.Create(AddressOf ParamTask, task_data(0))
   m_Tasks(1) = Task.Create(AddressOf ParamTask, task_data(1))
   m_Tasks(2) = Task.Create(AddressOf ParamTask, task_data(2))

   ' Wait for any single task to finish (up to 10 seconds).
   Task.WaitAny(m_Tasks, 10000)

   ' Cancel all the remaining tasks.
   For i As Integer = 0 To m_Tasks.Length - 1
       If (Not m_Tasks(i).IsCanceled) AndAlso _
          (Not m_Tasks(i).IsCompleted) _
       Then
           m_Tasks(i).Cancel()
       End If
   Next i

   ' Wait for all tasks to finish.
   Try
       Task.WaitAll(m_Tasks)
   Catch ex As Exception
       lstResults.Items.Add("Exception: " & ex.Message)
       lstResults.Items.Add("    " & ex.InnerException.Message)
   End Try

   ' Display each task's results.
   For i As Integer = 0 To m_Tasks.Length - 1
       Dim txt As String = "Task " & i.ToString()
       If m_Tasks(i).IsCanceled Then
           txt &= " was canceled, "
       ElseIf m_Tasks(i).IsCompleted Then
           txt &= " completed, "
       Else
           txt &= " is still running, "
       End If
       txt &= task_data(i).ElapsedTime.TotalSeconds.ToString("0.00")
       lstResults.Items.Add(txt)
   Next i

This code creates an array of TaskData objects. An object’s TaskNum property indicates the task’s index in the m_Tasks array and the NumSeconds property tells how long the task should sleep.

The program then creates the Task objects, passing them their corresponding TaskData objects as parameters.

Next the program uses WaitAny to wait until one of the tasks finishes. The second parameter (10000) specifies the maximum number of milliseconds the program should wait.

In this example, where the tasks wait for 2, 1, and 3 seconds respectively, the second task finishes first after about 1 second.

Author’s Note: In general, you should not assume that tasks will run in any particular order. On a dual-core machine, the program could run the two- and three-second tasks first and run the one-second task only after one of the others finished. On a quad-core machine, all three tasks would run at the same time.

When the first task finishes, the program loops through the m_Tasks array and calls the Cancel method for any tasks that are still running.

The program then uses WaitAll to wait until all the tasks finish. It uses a Try-Catch block to protect itself against the exception that WaitAll throws because a task was canceled.

The code finishes by displaying each task’s status (canceled or completed; no tasks should still be running) and its elapsed time.

The following code shows how the tasks work:

   Private Sub ParamTask(ByVal task_data As TaskData)
       Dim start_time As Date = Now

       For i As Integer = 1 To task_data.NumSeconds * 10
           ' Do something time consuming.
           Thread.Sleep(100)

           ' See if we have been canceled.
           If m_Tasks(task_data.TaskNum).IsCanceled Then Exit For
       Next i

       Dim stop_time As Date = Now
       task_data.ElapsedTime = stop_time - start_time

       Debug.WriteLine("Task " & _
           task_data.TaskNum & " finished")
   End Sub

Figure 2. Terrific Tasks: The Task class lets you control, wait for, and cancel tasks.

The ParamTask method records its start time and then enters a loop that sleeps for the number of seconds indicated by its TaskData parameter. It sleeps in little 100-millisecond naps so it can respond promptly if it is canceled.

Within the loop, the task checks its m_Tasks entry to see if it has been canceled. It uses the index in the TaskData object’s TaskNum property to determine which m_Tasks entry to use. If the Task object’s IsCanceled property is True, the task breaks out of its For loop.

The task finishes by saving its elapsed time in its TaskData object.

Figure 2 shows the output of program TimedTasks. The output in the list box shows that Task 1 (the 1-second task) completed after 1 second. Tasks 0 and 2 (the 2- and 3-second tasks) were canceled after 1.1 seconds.

Author’s Note: All the sample programs are available in both VB.NET and C# versions.

Betting on Futures
In TPL terms, a Future is basically a task that returns a value. It’s like a deferred function. You start it running and then use its value later. If the Future hasn’t finished calculating its value by the time you need it, it makes you wait while it finishes.
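A minimal sketch of the pattern looks like this (SlowCalculation is a hypothetical long-running function; the Future(Of T) syntax matches the CTP API used in the Fibonacci examples that follow):

   ' Start calculating the value in the background.
   Dim answer As Tasks.Future(Of Long) = _
       Tasks.Future(Of Long).Create(Function() SlowCalculation())

   ' ... do other work here ...

   ' Read the result. If the Future hasn't finished
   ' its calculation yet, this call waits for it.
   Dim result As Long = answer.Value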

For example, consider the Fibonacci numbers. The first two values are defined as Fib(0) = 0 and Fib(1) = 1. For larger values of N, Fib(N) = Fib(N - 1) + Fib(N - 2).

The following code shows a function that calculates Fib(N):

   Private Function Fibo(ByVal N As Long) As Long
       If N <= 0 Then Return 0
       If N = 1 Then Return 1

       Return Fibo(N - 1) + Fibo(N - 2)
   End Function

There are much more efficient non-recursive implementations of this function, but this version is a good candidate for parallelism because in sequential mode, it takes a long time to calculate Fib(N) when N is larger than 20 or 30.
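For comparison, here’s a sketch of one such non-recursive version (not part of the sample programs). It runs in time proportional to N because it computes each intermediate value exactly once:

   Private Function FiboIterative(ByVal N As Long) As Long
       If N <= 0 Then Return 0

       ' Walk up from Fib(0) and Fib(1), keeping only
       ' the two most recent values.
       Dim prev As Long = 0
       Dim curr As Long = 1
       For i As Long = 2 To N
           Dim nxt As Long = prev + curr
           prev = curr
           curr = nxt
       Next i
       Return curr
   End Function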

The following code performs a similar calculation but uses Futures to calculate the result for each recursive call:

   Private Function FiboFullParallel(ByVal N As Long) As Long
       If N <= 0 Then Return 0
       If N = 1 Then Return 1

       Dim t1 As Tasks.Future(Of Long) = _
           Tasks.Future(Of Long).Create( _
               Function() FiboFullParallel(N - 1))
       Dim t2 As Tasks.Future(Of Long) = _
           Tasks.Future(Of Long).Create( _
               Function() FiboFullParallel(N - 2))

       Return t1.Value + t2.Value
   End Function

This function checks the base cases N = 0 and N = 1 as before. It then creates two Future objects. The parameter to the Create method is a lambda (aka inline) function that returns the result of the FiboFullParallel function evaluated for N - 1 and N - 2. As each Future is created, it may begin executing on any of the system's CPUs.

The function finishes by adding the values of the two Futures and returning the result. If the Futures have finished calculating their values by the time the Return statement executes, all is well. If either Future is not finished, the program waits until the executing Future completes its calculation.

While this version works, it is extremely slow and inefficient. The problem is that calculating FiboFullParallel(N) requires an enormous number of recursive calls to FiboFullParallel.

For example, to calculate FiboFullParallel(10) the program must find FiboFullParallel(9) and FiboFullParallel(8). But to calculate FiboFullParallel(9), the program calculates FiboFullParallel(8) and FiboFullParallel(7); in other words, it has to calculate FiboFullParallel(8) twice.

If you follow the calculation further, you'll find that the program must calculate each of the intermediate values a huge number of times. For example, to calculate FiboFullParallel(10), the function calls itself 176 times. To calculate FiboFullParallel(20), the function calls itself 21,890 times. The number of calls grows very quickly as N grows.

Here's an improved version:

   Private Function FiboSplit(ByVal N As Long) As Long
       If N <= 0 Then Return 0
       If N = 1 Then Return 1

       Dim t1 As Tasks.Future(Of Long) = _
           Tasks.Future(Of Long).Create(Function() Fibo(N - 1))
       Dim t2 As Tasks.Future(Of Long) = _
           Tasks.Future(Of Long).Create(Function() Fibo(N - 2))

       Return t1.Value + t2.Value
   End Function

This function builds two Future objects but instead of calling back to this function, these Futures invoke the original sequential function Fibo.

Figure 3. Fast Futures: Program UseFutures uses Future objects to calculate Fibonacci numbers in parallel.

For example, when the main program calls FiboSplit(10), the Futures calculate Fibo(9) and Fibo(8). After that, function Fibo does not use Futures so it doesn't pay the overhead that they incur. In this version, the program splits the calculation into two Futures and after that the Futures execute sequentially.

The program UseFutures demonstrates these functions (see Figure 3). The Result label shows the results for the 37th Fibonacci number. The Sequential label shows that the original Fibo function took 1.88 seconds while the Split label shows that the FiboSplit function took only 1.20 seconds. The program won't calculate FiboFullParallel for N greater than 25 because it takes too long (about 2.6 seconds for N = 25; probably years for N = 37).

You can use the TPL classes and methods to split tasks into pieces that you can then run in parallel but, as the FiboFullParallel function demonstrates, don't split the problem too finely. If you break the problem into pieces that are each very easy to evaluate, the overhead of running them separately will outweigh the benefit of parallelism.

Earlier I briefly mentioned that you need to be careful to ensure that tasks running on different threads don't interfere with each other. There are many ways two threads can get in each other's way and I don't want to take the time to cover them all here, but I do want to describe a couple of techniques that you may find useful.

One of the easiest ways for two threads to lock horns is if they read and write the same variable. For example, consider the following trivial subroutine. It simply adds an integer value to a total:

   Private m_Total As Integer

   Private Sub AddNoLock(ByVal i As Integer)
       m_Total += i
   End Sub

This code performs its addition in a single step using the += operator, which is so quick and simple that you would think nothing could go wrong. Unfortunately, if two threads execute this same piece of code at the same time, they can conflict.

Suppose two threads (A and B) are running the AddNoLock subroutine with parameters 1 and 2 respectively. Now suppose thread A starts running its addition statement and reads the value of m_Total, which at the time is 10.

Now suppose thread B gets a turn. It reads m_Total, which is still 10, and adds 2 to it, saving the result back in m_Total, which is now 12. Thread B congratulates itself on a job well done and retires.

But thread A is still executing. It adds 1 to the value that it read previously (10) and gets 11, which it stores in m_Total.

The final result is that m_Total holds the value 11 rather than the expected 13. If the two threads had run sequentially, the result would be 13; but because they ran at the same time, thread B’s addition was lost.

This is called a race condition. The two threads are racing to see which can read and write the variable first. Depending on the exact sequence of events, you can get one of several possible outcomes.

Race conditions can be extremely hard to debug because they don't happen every time. With the small window for problems in this example (a single line of very fast code), the odds of a problem are fairly small, so the problem won't appear every time you run the program. Even worse, stepping through the code in the debugger will probably prevent the threads from interrupting each other, so you won't see the problem in the debugger at all.

So what are the odds that two threads will interlace in just the right way to make this sort of problem occur? If you're an experienced programmer, then you know that this will happen with absolute certainty when you're demonstrating your program to your company president. But even when the big boss isn't there, the odds are still large enough to make race conditions a significant problem.
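To make the mechanics concrete, here's a small sketch of the same lost-update problem, written in Java for illustration (the thread count and iteration count are arbitrary choices, not taken from the UseLocks program). Several threads increment one shared counter with no protection:

```java
// Demonstrates the lost-update race: several threads perform an
// unsynchronized read-modify-write on one shared counter.
public class RaceDemo {
    static int total = 0;   // shared and unprotected

    static int run() throws InterruptedException {
        total = 0;
        Thread[] threads = new Thread[4];
        for (int w = 0; w < threads.length; w++) {
            threads[w] = new Thread(() -> {
                for (int i = 0; i < 100000; i++) {
                    total += 1;   // read, add, write: three steps, not atomic
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) t.join();
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // 400000 if no updates are lost; on a multi-core machine it is
        // usually less, because concurrent increments overwrite each other.
        System.out.println(run());
    }
}
```

On some runs the total may come out exactly right, which is precisely what makes race conditions so hard to catch.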

The sample program UseLocks uses the following code to demonstrate this problem:

   Dim num_trials As Integer = Val(txtNumTrials.Text)
   Dim start_time As Date = Now
   For i As Integer = 1 To num_trials
       m_Total = 0
       Parallel.For(1, 10001, AddressOf AddNoLock)
   Next i
   Dim stop_time As Date = Now
   Dim elapsed As TimeSpan = stop_time - start_time

The program accepts a user-entered number that controls the number of trials. For each trial, the code calls Parallel.For to invoke the AddNoLock subroutine shown earlier 10,000 times. In other words, if the user enters 100 for the number of trials, the program performs a million (100 x 10,000) parallel additions. (Parallel.For doesn't create a new thread for every call; it spreads the calls across a small pool of worker threads.) That's plenty to trigger a race condition most of the time.

When I run the program for 100 trials, it takes only about 0.75 seconds to perform all of those additions, but the result is not always the same. In fact, even a single trial of 10,000 parallel additions produces inconsistent results.

There are several ways you can avoid this type of problem. One of the best is to make sure the threads never, ever, ever read and write the same variables. They may read from some shared variables but they should not write into shared variables. If you give the threads separate variables to write into, there should be no problem. The Task example described earlier demonstrated this approach by using separate m_Tasks and m_ElapsedTime array entries for each task.
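Here's a sketch of that "separate variables" strategy, in Java purely for illustration (the worker count and the values summed are arbitrary). Each thread writes only to its own array slot, and the main thread combines the slots after all the workers have finished:

```java
// Each worker accumulates into its own array slot; no two threads
// ever write the same variable, so no lock is needed.
public class SeparateTotals {
    static long run() throws InterruptedException {
        final int workers = 4;
        final long[] partial = new long[workers];   // one private slot each
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                for (int i = 1; i <= 10000; i++) partial[id] += i;
            });
            threads[w].start();
        }
        for (Thread t : threads) t.join();          // wait, then combine
        long total = 0;
        for (long p : partial) total += p;
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // 4 workers x (1 + 2 + ... + 10000) = 4 x 50005000 = 200020000
        System.out.println(run());
    }
}
```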

Another way to handle this problem is to use a lock. A lock reserves some resource for a particular thread and denies other threads access to the resource while it is locked.

There are a couple of easy ways to create locks in .NET. In Visual Basic, you can use the SyncLock statement. You pass SyncLock an object that serves as a token; while one thread holds the lock on that object, no other thread can acquire a lock on the same object.

The following code, which is used by sample program UseLocks, demonstrates this approach. The AddSyncLock function uses a SyncLock statement to reserve access to the "Me" object (the object running this code; in this case, the form). It then adds its value to the total and releases the lock. If another thread tries to execute the function, it will block at the SyncLock statement until the first thread releases its lock:

   Private Sub AddSyncLock(ByVal i As Integer)
       SyncLock Me
           m_Total += i
       End SyncLock
   End Sub

Here's the equivalent C# version:

   private void AddSyncLock(int i)
   {
       lock(this)
       {
           m_Total += i;
       }
   }
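The same pattern exists outside .NET as well. As a comparison sketch (Java, not the article's code, with arbitrary counts), a synchronized block on a shared lock object plays the role of SyncLock:

```java
// synchronized plays the role of SyncLock: only one thread at a time
// may execute a block guarded by the same lock object.
public class SyncAddDemo {
    static final Object lockToken = new Object();
    static int total = 0;

    static void addSync(int i) {
        synchronized (lockToken) {   // other threads block here until released
            total += i;
        }
    }

    static int run() throws InterruptedException {
        total = 0;
        Thread[] threads = new Thread[4];
        for (int w = 0; w < threads.length; w++) {
            threads[w] = new Thread(() -> {
                for (int i = 1; i <= 10000; i++) addSync(i);
            });
            threads[w].start();
        }
        for (Thread t : threads) t.join();
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());   // 200020000, every time
    }
}
```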
Figure 4. Lots of Locks: The SyncLock statement and the Interlocked class allow threads to use the same variable without interference.

Operations such as this that add one value to another are so common in complex multi-threaded applications that the Threading namespace provides an Interlocked class to make them easier. This class's static Add method performs the read, the addition, and the write as a single atomic operation, so no other thread can slip in between the steps.

Example program UseLocks uses the following code to demonstrate Interlocked.Add:

   Private Sub AddInterlocked(ByVal i As Integer)
       Threading.Interlocked.Add(m_Total, i)
   End Sub
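Other platforms provide the same atomic-add primitive. Again purely for comparison (a Java sketch with arbitrary counts, not part of the TPL), java.util.concurrent's AtomicLong plays the role of Interlocked; its addAndGet method performs the read-add-write as one indivisible step:

```java
import java.util.concurrent.atomic.AtomicLong;

// AtomicLong.addAndGet performs the read-modify-write as a single
// atomic operation, so concurrent adders cannot lose updates.
public class AtomicAddDemo {
    static final AtomicLong total = new AtomicLong();

    static long run() throws InterruptedException {
        total.set(0);
        Thread[] threads = new Thread[4];
        for (int w = 0; w < threads.length; w++) {
            threads[w] = new Thread(() -> {
                for (int i = 1; i <= 10000; i++) total.addAndGet(i); // atomic
            });
            threads[w].start();
        }
        for (Thread t : threads) t.join();
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());   // 200020000, every time
    }
}
```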

Figure 4 shows program UseLocks in action. The first result, which doesn't use locks, is incorrect. Also notice that the other two results that do use locks take longer, but both return the correct result. There is some overhead in using locks, but getting the correct result is worth it.

Author's Note: The Interlocked class also provides Increment and Decrement methods, as well as Exchange and CompareExchange methods that swap values.

In the coming years, computers will contain more and more CPUs. The operating system and some fancier compilers will probably take advantage of that additional power to give you better performance even if you do nothing, but to get the full benefit of this extra horsepower you'll need to change the way you program.

The Task Parallel Library gives you new tools that let you parallelize some parts of applications relatively easily. Methods such as Parallel.Invoke, Parallel.For, and Parallel.ForEach, and classes such as Task and Future let you launch swarms of threads on different CPUs relatively easily.

The library automatically balances the load across the CPUs available. If you have an old-fashioned single CPU system, you pay only a slight performance penalty. If you have a multi-core system, you'll get the benefits of multiple CPUs with very little extra effort.

Despite the advantages, the underlying problems of implementing multiple threads haven't gone away. You will need to consider race conditions, deadlocks, non-thread-safe classes, and other issues that can arise when multiple threads execute at the same time. If you keep tasks separate you shouldn't have too much trouble getting started with TPL.

More importantly, if you start thinking about and implementing parallelism in your code now, your programs will easily scale to quad-core, eight-core, and even systems with more CPUs. Some day soon, you'll plop a 32-core system on your desk and your programs will have every CPU blazing away at full capacity. But if you ignore parallelism, your customers will be sitting there with 31 idle CPUs (hopefully powered down) wondering why they're not getting better performance from the latest hot hardware.

Editor's Note: Be sure to check back soon to read the follow-up article, which shows the kinds of improvements you can expect after applying parallelism to existing code.

