Parallel and Concurrency Futures for Microsoft Developers

Parallel computing and concurrent programming are rapidly becoming mainstream topics for discussion in the corporate world. These are not new ideas; in fact, they've been around for more than 30 years. However, like many long-running computer science concepts, they're only now becoming relevant to mainstream business developers due to changes in both hardware and the overall computing environment.

The truth is, the raw speed of a single CPU, or core, has remained flat for several years. Instead of making cores faster, the hardware manufacturers are increasing overall computing power by adding more cores to each computer. Therefore, the only way your application can improve performance is by exploiting the benefits of multi-core computers—through parallelism.

Like nearly all software vendors, Microsoft has relied on the rapid increase in processing power to enable new features in its products. Over the years, Windows, Office and Microsoft's other applications, tools and technologies have grown more powerful, but have also evolved to consume more computing resources. Microsoft is now in a position where it must enable parallel computing to leverage future increases in processing power. As the large software vendors enable parallelism in their own products, development tools, and platforms, developers will benefit from that work, because the new features for leveraging parallelism will become available to them as well.

Of course, the sudden end of ever-increasing single-core horsepower could simply plateau the development of new features—but that doesn’t seem to be happening. Instead, there is evidence that the user experience will continue to evolve in ways that require more computing power. Technologies such as WPF and Silverlight highlight the desire for a more polished graphical experience, often including rich animations. Add in the increased use of voice recognition, video, audio, location features, motion sensors, light sensors, and ever-increasing storage density, and it becomes clear that the future still holds rapidly increasing requirements for computing power.

Users’ expectations of application functionality also continue to increase. More and more, users expect applications to not only allow editing of data, but also to show related data, results of data analysis, and other information, all updated in real time as the user interacts with the application. So, while basic data entry operations require no more processing than they did 20 years ago, user demand for peripheral data processing features continues to increase at a staggering rate.

With CPU speeds remaining constant, developers need access to parallel computing to meet these increasing requirements and still provide acceptable performance.

Basic Parallelism

The idea of parallel or concurrent programming is pretty simple: a program should be able to do more than one thing at a time. Most programs work sequentially, doing one thing at a time as the computer follows the instructions in your code. Today’s computers are almost all dual core, and over the next few years multi-core computers will become common. A sequential program can really only use one CPU, or one core. To effectively use multiple cores, a program needs to run multiple tasks in parallel. These parallel tasks will run concurrently, on different cores in the computer.

As a developer, you need to focus on this truism: Sequential applications gain no benefit from multiple cores. In other words, until you begin programming parallelism into your applications, their performance will remain flat. Figure 1 illustrates how a sequential application might be scheduled on a single-core and then a dual-core computer.

To run multiple tasks in parallel, a program must be multi-threaded. The Windows operating system, like most modern operating systems, executes code on a thread. The OS schedules each thread to execute on a computer core. If your application runs code on one thread, it can run on only one core at a time. However, if your application uses multiple threads, the code on those threads can often be scheduled to run on different cores, all at the same time, as shown in Figure 2.

Figure 1. Sequential Application: In the left-hand picture, the sequential program runs linearly, while on the right, even though it's running on a multi-core machine, it runs on only one core at a time, so its performance doesn't increase.

Figure 2. Multicore-Capable Application: On a single core (left image), a multicore program runs no faster than a sequential application; however, when more cores are available, the application can perform its work in less time (right image).

In Figure 2, notice how the application accomplishes the same amount of work in much less time on the dual-core machine, because some of the work occurs in parallel. The application splits its work into tasks, and executes those tasks on different threads, so they can exploit both CPU cores.

Problems with Parallelism

Unfortunately, it turns out that writing multi-threaded applications is incredibly difficult. Even seemingly simple programming problems can become very complex when you add multithreading to the mix. Problems such as race conditions, deadlocks, and corrupted data in memory are very common, and are often difficult to identify, debug, and solve. Even worse, other problems such as memory contention, effective use of CPU memory caching, operating system context switching, and other low-level computing issues can make parallel applications run slower than sequential applications!

Consider this simple bit of code that increments a value, but never allows it to grow beyond 3:

   if (_myField < 3) _myField++;

The preceding code is perfectly fine in a sequential application—but it won't work reliably in a parallel environment where two or more threads might execute the same line. That's because it is quite possible (or even inevitable if you run this enough times) that _myField could be 2 when the threads perform the if test, so the simultaneously executing threads all see that the value is less than 3; therefore, all the threads increment the value, resulting in _myField being 3, 4, or more. In fact, the end result is nondeterministic, and varies based on how the operating system schedules the threads. Such code often appears to work perfectly during development and testing, and may fail only sporadically after it's deployed in production.

There are numerous solutions to the problem, including the use of locking strategies, avoiding the use of shared data, and data structures designed for safe multi-threading. The real challenge is in retraining business developers to recognize all the possible ways they can get into trouble with multi-threaded code, to know all the possible solutions, and to recognize and choose the solution that offers both safety and performance. And let's face it, even people who specialize in multi-threaded coding have a hard time identifying issues in any sizable application.
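To see what a locking strategy looks like in practice, here's a minimal sketch of the capped-increment snippet made thread-safe with C#'s lock statement. The class and member names here are illustrative, not from any real API; the point is that the test and the increment must execute as one atomic unit.

```csharp
using System;
using System.Threading;

class BoundedCounter
{
    private int _myField;                         // shared state
    private readonly object _sync = new object();

    // The lock ensures only one thread at a time can run the
    // test-and-increment pair, so _myField can never exceed 3.
    public void Increment()
    {
        lock (_sync)
        {
            if (_myField < 3) _myField++;
        }
    }

    public int Value
    {
        get { lock (_sync) { return _myField; } }
    }
}

class Program
{
    static void Main()
    {
        var counter = new BoundedCounter();
        var threads = new Thread[8];

        // Eight threads race to increment the counter 1,000 times each.
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < 1000; j++) counter.Increment();
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();

        Console.WriteLine(counter.Value); // always 3, never 4 or more
    }
}
```

Note the trade-off: the lock restores correctness, but it also serializes the threads at that point, which is exactly the kind of safety-versus-performance judgment call described above.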

In other words, parallel programming is very hard to get right; there are no silver bullets that just "make it work." Experts in computer science have been working on these issues for decades. While the problem domain is well understood, the solutions are not easy. The challenge is to develop tools, components, and frameworks that wrap the complexity as much as possible, to make parallel computing reasonably accessible to mainstream business developers.

Microsoft's Strategy

Microsoft and other vendors have been working on making parallel computing more accessible over the past several years. As far back as 1996, with the introduction of Microsoft Transaction Server (MTS) and later COM+, Microsoft has provided developers with multi-threaded environments that abstract the complexity. In both MTS and COM+, user code runs in a single-threaded environment (called an apartment), but the computer can run many of these apartments concurrently. ASP.NET uses a similar model when hosting web pages, web services, and WCF (Windows Communication Foundation) services.

Even on a single-core machine, developers could often improve the perception of performance by using separate threads to perform different tasks. For example, you could prevent an application's user interface from becoming non-responsive by performing long, processor-intensive tasks on a separate thread, leaving the original thread to handle the UI. Developers have had the ability to write threaded code for quite some time. The .NET Framework 1.0 included a thread pool (System.Threading.ThreadPool), which provides basic task scheduling capabilities for any .NET application.
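Queuing work to that pool is a one-line call to ThreadPool.QueueUserWorkItem, which has been available since .NET 1.0 (the lambda syntax in this sketch requires a later C# compiler; early code used an explicit WaitCallback delegate instead):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        using (var done = new ManualResetEvent(false))
        {
            // The pool runs this item on one of its idle worker threads;
            // the calling thread is free to continue (e.g., servicing the UI).
            ThreadPool.QueueUserWorkItem(state =>
            {
                Console.WriteLine("Working on a pool thread");
                done.Set();
            });

            done.WaitOne(); // block until the background work signals completion
        }
    }
}
```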

In .NET 2.0, Microsoft introduced the BackgroundWorker component (enhanced in .NET 3.0 and 3.5), which abstracts some of the complexity of running tasks on a background thread, while still allowing safe interaction with the Windows Forms or WPF UI thread. More recently, WPF provides the IsAsynchronous property on many data provider controls, which allows even non-programmers (such as XAML graphic designers) to perform data retrieval and processing on a background thread.
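A minimal BackgroundWorker sketch follows. In a console program like this one, the completion event simply fires on a pool thread; in a Windows Forms or WPF application, RunWorkerCompleted is marshaled back to the UI thread, which is what makes updating controls from that handler safe. The 6 * 7 "computation" is a stand-in for real work.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

class Program
{
    static void Main()
    {
        using (var done = new ManualResetEvent(false))
        {
            var worker = new BackgroundWorker();

            // DoWork runs on a thread-pool thread, so a long computation
            // here does not freeze the thread that started the worker.
            worker.DoWork += (s, e) => { e.Result = 6 * 7; };

            // In a Windows Forms or WPF application this event is raised
            // back on the UI thread, making control updates safe here.
            worker.RunWorkerCompleted += (s, e) =>
            {
                Console.WriteLine("Result: " + e.Result);
                done.Set();
            };

            worker.RunWorkerAsync();
            done.WaitOne();
        }
    }
}
```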

And of course, Windows itself is multi-threaded. Native Windows applications in C++ have always had access to low-level threading and locking constructs. The .NET Framework now exposes many of these same low-level constructs as well. While interesting to hard-core multi-threading developers, these low-level constructs are not really designed for business developers, nor are they optimized for many of the application scenarios business developers face today.

Looking to the future, Microsoft is investing in a number of important initiatives. Table 1 lists the features coming in Visual Studio 2010 and the .NET Framework 4.0.

Table 1. Concurrent Programming Initiatives: The table lists upcoming features for VS 2010 and the .NET framework 4.0.
Parallel Extensions to the .NET Framework: Additions to the .NET base class library that provide support for high-level parallel concepts in any .NET application, including other items in this table such as the TPL, Parallel LINQ (PLINQ), and the Coordination Data Structures.
Task Parallel Library (TPL): A library that makes it easier for a managed .NET application to define tasks and run those tasks in parallel. While parallelism is possible today, the features in this library make it much more approachable.
Parallel LINQ: Typically called PLINQ, an enhancement to LINQ to Objects that runs LINQ queries as a set of parallel tasks.
Coordination Data Structures: A set of types designed to enable efficient concurrency patterns, including specialized locks, coordination objects, and collections.
.NET Thread Pool: A significant update to the pre-existing .NET thread pool means .NET 4.0 provides more efficient execution of parallel workloads on multi-core machines. These are behind-the-scenes enhancements, so existing code using the thread pool gets these benefits automatically.
Parallel Pattern Library (PPL): A library that makes it easier for a native Windows application to define tasks and run those tasks in parallel. The library includes high-level constructs, such as a parallel for loop, along with numerous low-level task, threading, and synchronization constructs.
Concurrency Runtime: A layer of native Windows services that supports task scheduling and execution for an application.
Resource Management: A low-level native Windows service layer that manages task scheduling at a per-process level.
Parallel Debugging: New tool windows in Visual Studio 2010 that help developers debug parallel applications.
Parallel Application Profiling: New views in Visual Studio 2010 that help developers profile and analyze how a parallel application executes.

All these new parallelism features raise the question of how developers might typically use them, which is the subject of the next section.

Typical Uses

A typical .NET business developer will most likely use Parallel LINQ, and might use some of the Coordination Data Structures. And of course, any code that uses the .NET thread pool will automatically gain the benefit of the major .NET 4.0 thread pool enhancements (you'll see more about that later in this article).
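Turning an ordinary LINQ to Objects query into a PLINQ query is typically just a call to AsParallel(); the query below (summing the squares of the even numbers from 1 to 1,000) is otherwise unchanged from its sequential form:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000).ToArray();

        // AsParallel() is the only change from a sequential LINQ query;
        // PLINQ partitions the work across the available cores.
        long sumOfSquaresOfEvens = numbers
            .AsParallel()
            .Where(n => n % 2 == 0)
            .Select(n => (long)n * n)
            .Sum();

        Console.WriteLine(sumOfSquaresOfEvens);
    }
}
```

Because the query is associative, the answer is the same regardless of how PLINQ partitions the input or in what order the partitions complete.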

A typical native Windows business developer might also use various features of the Parallel Pattern Library (PPL), including new algorithms, primitives and types provided by that library. Native developers might also directly interact with the task scheduler in the Concurrency Runtime as they schedule and execute tasks within their application.

However, the majority of the new features in Visual Studio 2010 are designed to support framework designers, enabling them to create components that support parallel computing. Only a couple of the new features are intended for direct use by developers building business applications. In other words, the features in Visual Studio 2010 are mostly about laying the low-level groundwork needed to build higher level features in the future.

The long-term goal is to allow the creation of frameworks and components that provide parallelism and concurrent behaviors automatically, so that typical business developers don't have to deal with the complexities and potential pitfalls inherent in parallel computing. To accomplish this goal, it is important to have a solid base on which to build these components and frameworks.

At the lowest level, this means efficiently managing how work gets scheduled across cores, which implies a component that can coordinate work at the Windows process level, or perhaps even across all processes on a computer. The resource management service layer provides this capability at the Windows process level, and Microsoft may broaden the scope in future versions of the Windows operating system.

The purpose of the resource manager is to map work to threads, ensuring that there are an appropriate number of threads for the number of cores available to the application, and to assign tasks to those threads efficiently. The resource management service layer exists for both managed and unmanaged code, so there may be two managers for applications that use a mix of native and .NET components.

The resource manager API is pretty low-level, so the Concurrency Runtime includes a task scheduler that provides a higher level of abstraction. Native Windows developers will create instances of the task scheduler, and use them to queue up tasks to be scheduled across multiple threads and cores.

On the managed side, the .NET 4.0 thread pool has been reimplemented to leverage something similar to the Concurrency Runtime task scheduler. This means that the thread pool now relies on a lower-level resource manager to coordinate the use of threads across the entire Windows process, not just a single .NET AppDomain within that process. It also means that the thread pool now understands how to work with tasks as defined by the TPL, as well as traditional queued work item delegates.

While most .NET developers will use PLINQ and some of the high level concepts provided by the TPL, others may use the thread pool directly, as they do today. Either way, their code will gain the benefit of these thread pool enhancements, because the Task Parallel Library (TPL) uses the thread pool, and higher-level features such as Parallel LINQ use the TPL to do their work. The TPL, Parallel LINQ, and other managed features use the Coordination Data Structures, so they all have a consistent set of thread-safe data structures.
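To make that layering concrete, here is a small TPL sketch: tasks started through Task.Factory.StartNew are queued to the .NET 4.0 thread pool, and reading a task's Result property blocks until that task completes. The naively recursive Fibonacci function is just a stand-in for CPU-bound work.

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Each task is queued to the .NET 4.0 thread pool, which maps
        // tasks onto roughly one busy worker thread per core.
        Task<int> a = Task.Factory.StartNew(() => Fib(30));
        Task<int> b = Task.Factory.StartNew(() => Fib(31));

        // Result blocks until the corresponding task has completed,
        // so the two computations can overlap on a multi-core machine.
        Console.WriteLine(a.Result + b.Result);
    }

    static int Fib(int n)
    {
        return n < 2 ? n : Fib(n - 1) + Fib(n - 2);
    }
}
```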

One challenge to overcome is that authors of frameworks and components often invent their own thread-safe data structures—and perhaps even their own thread or task management. The result is that different components and frameworks can't work together, or at least can't make efficient use of available thread scheduling or data structures.

While splitting work into parallel tasks is a powerful technique, it can be counterproductive to create more busy worker threads than the computer has CPU cores, because the operating system then wastes time context-switching the cores from one thread to another so that each thread gets at least a small slice of time. It is far more efficient to have the same number of busy worker threads as the computer has cores, because each thread can then keep running without being switched out for another. In scenarios where many threads block on I/O or locks, it is ideal to have more total threads, so that there are still roughly as many busy threads as the machine has cores.
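The TPL's Parallel.For embodies this guidance: it does not create one thread per iteration. Instead, it partitions the iteration range across the thread pool's workers, which the scheduler keeps at roughly one busy thread per core:

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        double[] results = new double[10000];

        // 10,000 iterations, but NOT 10,000 threads: the scheduler
        // partitions the range across roughly one worker per core.
        Parallel.For(0, results.Length, i =>
        {
            results[i] = Math.Sqrt(i);
        });

        Console.WriteLine(results[100]); // the square root of 100
    }
}
```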

If all frameworks and components share a common thread pool or task scheduler and resource manager, then those low-level service layers can coordinate the use of threads across the entire application. That helps avoid the issue of over-saturating the operating system or hardware.

Similarly, sharing thread-safe data structures enables better interoperability. If all frameworks and components use the same safe collections, stacks, queues, and other data structures, then an application can safely pass references to those structures from component to component to share data in memory, regardless of the components or frameworks in use.

As you can see, most of the features that will be introduced in Visual Studio 2010 focus on enabling component and framework developers in both native and managed code to leverage parallelism in a way that is more abstract, performs better, and is more interoperable than in the past. Future components and frameworks from Microsoft and many other vendors, building on this base, will provide more accessible and productive parallelism to business developers. And this is just the beginning.
