Problems with Parallelism
Unfortunately, it turns out that writing multi-threaded applications is incredibly difficult. Even seemingly simple programming problems can become very complex when you add multithreading to the mix. Problems such as race conditions, deadlocks, and corrupted data in memory are very common, and are often difficult to identify, debug, and solve. Even worse, other problems such as memory contention, effective use of CPU memory caching, operating system context switching, and other low-level computing issues can make parallel applications run slower than sequential applications!
Consider this simple bit of code that increments a value, but never allows it to grow beyond 3:
if (_myField < 3) _myField++;
The preceding code is perfectly fine in a sequential application, but it won't work reliably in a parallel environment where two or more threads might execute the same line. That's because it is quite possible (even inevitable, if you run the code enough times) that _myField could be 2 when two or more threads perform the if test. Each of those simultaneously-executing threads sees that the value is less than 3, so each increments the value, leaving _myField at 4 or more. In fact, the end result is nondeterministic, and varies based on how the operating system schedules the threads. Often, code like this appears to work perfectly during development and testing, and fails only sporadically after it's deployed in production.
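One way to close that window between the test and the increment is to make them a single atomic step. The sketch below uses Interlocked.CompareExchange in a retry loop; the class and method names are illustrative, not from any particular library:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Lock-free capped counter: the read, the test, and the write are
// combined into one atomic compare-exchange, retried on contention.
public static class CappedCounter
{
    private static int _myField;

    public static int Value
    {
        get { return Volatile.Read(ref _myField); }
    }

    public static void IncrementUpTo(int cap)
    {
        while (true)
        {
            int current = Volatile.Read(ref _myField);
            if (current >= cap) return;   // already at the cap
            // Write current + 1 only if no other thread changed the
            // field between our read and this call; otherwise retry.
            if (Interlocked.CompareExchange(ref _myField, current + 1, current) == current)
                return;
        }
    }
}

public class Program
{
    public static void Main()
    {
        // 100 concurrent attempts; the counter still stops at 3.
        Parallel.For(0, 100, _ => CappedCounter.IncrementUpTo(3));
        Console.WriteLine(CappedCounter.Value);   // prints 3
    }
}
```

No matter how the threads interleave, each successful compare-exchange moves the value up by exactly one, and no write can ever take it past the cap.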
There are numerous solutions to the problem, including the use of locking strategies, avoiding shared data, and using data structures designed for safe multi-threading. The real challenge is in retraining business developers to recognize all the ways they can get into trouble with multi-threaded code, to know the possible solutions, and to choose the solution that offers both safety and performance. And let's face it: even people who specialize in multi-threaded coding have a hard time identifying issues in any sizable application.
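The first of those solutions, a locking strategy, can be as simple as serializing the check and the increment behind a private lock object. A minimal sketch, with illustrative names:

```csharp
using System;
using System.Threading.Tasks;

// The guarded version of "if (_myField < 3) _myField++;": the lock
// ensures only one thread at a time runs the test and the increment.
public class GuardedCounter
{
    private readonly object _sync = new object();
    private int _myField;

    public int Value
    {
        get { lock (_sync) { return _myField; } }
    }

    public void IncrementUpTo(int cap)
    {
        lock (_sync)
        {
            // No other thread can observe or change _myField between
            // the test and the increment.
            if (_myField < cap) _myField++;
        }
    }
}

public class Program
{
    public static void Main()
    {
        var counter = new GuardedCounter();
        Parallel.For(0, 100, _ => counter.IncrementUpTo(3));
        Console.WriteLine(counter.Value);   // prints 3
    }
}
```

The trade-off is the one this section describes: the lock is easy to reason about, but under heavy contention every thread queues up behind it, which is one way parallel code ends up slower than sequential code.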
In other words, parallel programming is very hard to get right; there are no silver bullets that just "make it work." Experts in computer science have been working on these issues for decades. While the problem domain is well understood, the solutions are not easy. The challenge is to develop tools, components, and frameworks that wrap the complexity as much as possible, to make parallel computing reasonably accessible to mainstream business developers.
Microsoft and other vendors have been working on making parallel computing more accessible over the past several years. As far back as 1996, with the introduction of Microsoft Transaction Server (MTS) and later COM+, Microsoft has provided developers with multi-threaded environments that abstract the complexity. In both MTS and COM+, user code runs in a single-threaded environment (called an apartment), but the computer can run many of these apartments concurrently. ASP.NET uses a similar model when hosting web pages, web services, and WCF (Windows Communication Foundation) services.
Even on a single-core machine, developers could often improve perceived performance by using separate threads to perform different tasks. For example, you could prevent an application's user interface from becoming non-responsive by performing long, processor-intensive tasks on a separate thread, leaving the original thread free to handle the UI. Developers have had the ability to write threaded code for quite some time; the Microsoft .NET 1.0 framework included a thread pool (System.Threading.ThreadPool), which provides basic task scheduling capabilities for any .NET application.
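That thread pool is usable from any .NET application; a minimal console sketch (the wait handle is only there so the process doesn't exit before the queued work finishes):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        using (var done = new ManualResetEvent(false))
        {
            // Queue a unit of work to the .NET thread pool; the pool
            // picks a worker thread and runs the callback on it.
            ThreadPool.QueueUserWorkItem(state =>
            {
                // Long-running or processor-intensive work goes here,
                // off the caller's thread.
                Console.WriteLine("Running on a pool thread: " +
                    Thread.CurrentThread.IsThreadPoolThread);
                done.Set();
            });
            done.WaitOne();   // block until the background work signals
        }
    }
}
```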
In .NET 2.0, Microsoft introduced the BackgroundWorker component (enhanced in .NET 3.0 and 3.5), which abstracts some of the complexity of running tasks on a background thread while still allowing safe interaction with the Windows Forms or WPF UI thread. More recently, WPF provides the IsAsynchronous property on many data provider controls, which allows even non-programmers (such as XAML graphic designers) to perform data retrieval and processing on a background thread.
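In a Windows Forms or WPF application, BackgroundWorker raises RunWorkerCompleted back on the UI thread, which is what makes updating controls from that handler safe. The console sketch below shows just the event wiring; the wait handle stands in for the message loop that a real UI application would have:

```csharp
using System;
using System.ComponentModel;
using System.Threading;

class Program
{
    static void Main()
    {
        using (var done = new ManualResetEvent(false))
        {
            var worker = new BackgroundWorker();

            // DoWork runs on a background (thread-pool) thread,
            // keeping the calling thread free.
            worker.DoWork += (sender, e) =>
            {
                e.Result = 21 * 2;   // simulated long-running computation
            };

            // In a UI application this handler is raised on the UI
            // thread, so it can safely update controls with the result.
            worker.RunWorkerCompleted += (sender, e) =>
            {
                Console.WriteLine("Result: " + e.Result);
                done.Set();
            };

            worker.RunWorkerAsync();
            done.WaitOne();
        }
    }
}
```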
And of course, Windows itself is multi-threaded. Native Windows applications in C++ have always had access to low-level threading and locking constructs. The .NET Framework now exposes many of these same low-level constructs as well. While interesting to hard-core multi-threading developers, these low-level constructs are not really designed for business developers, nor are they optimized for many of the application scenarios business developers face today.
Looking to the future, Microsoft is investing in a number of important initiatives. Table 1 lists the features coming in Visual Studio 2010 and the Microsoft .NET 4.0 framework.
Table 1. Concurrent Programming Initiatives: The table lists upcoming features for VS 2010 and the .NET framework 4.0.
|Parallel Extensions to the .NET Framework
||Additions to the base class library in .NET to provide support for high-level parallel concepts to any .NET application, including other items in this table such as the TPL, Parallel LINQ (PLINQ), and Coordination Data Structures.
|Task Parallel Library (TPL)
||A library that makes it easier for a managed .NET application to define tasks, and run those tasks in parallel. While parallelism is possible today, the features in this library make it much more approachable.
|Parallel LINQ (PLINQ)
||Typically called PLINQ, this is an enhancement to LINQ to Objects so that LINQ queries run as a set of parallel tasks.
|Coordination Data Structures
||A set of types designed to enable efficient concurrency patterns, including specialized locks, coordination objects, and collections.
|.NET Thread Pool
||A significant update to the pre-existing .NET Thread Pool means .NET 4.0 provides more efficient execution of parallel workloads on multi-core machines. These are behind-the-scenes enhancements, so existing code using the thread pool gets these benefits automatically.
|Parallel Pattern Library (PPL)
||A library that makes it easier for a native Windows application to define tasks, and run those tasks in parallel. The library includes a set of high-level constructs, such as a parallel for loop, and numerous low-level task, threading, and synchronization constructs.
|Concurrency Runtime
||A layer of native Windows services that support task scheduling and execution for an application.
|Resource Manager
||A low-level native Windows service layer that manages task scheduling at a per-process level.
|Parallel Debugging Windows
||New tool windows in Visual Studio 2010 to help developers debug parallel applications.
|Parallel Application Profiling
||New views in Visual Studio 2010 that help developers profile and analyze how a parallel application executes.
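To give a flavor of the first entries in the table, here is how the TPL's Parallel.For and PLINQ's AsParallel() look in application code; a sketch against the .NET 4.0 APIs:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // TPL: the loop body runs as parallel tasks across cores.
        // Interlocked.Add keeps the shared sum safe, echoing the
        // earlier discussion of shared data.
        long sum = 0;
        Parallel.For(0, 1000, i => Interlocked.Add(ref sum, i));
        Console.WriteLine(sum);      // prints 499500 on any schedule

        // PLINQ: the same shape as LINQ to Objects, with AsParallel()
        // turning the query into a set of parallel tasks.
        int evens = Enumerable.Range(0, 1000)
                              .AsParallel()
                              .Count(n => n % 2 == 0);
        Console.WriteLine(evens);    // prints 500
    }
}
```

Note how little of the threading machinery shows through: the partitioning, scheduling, and thread management are handled by the library, which is exactly the kind of wrapped complexity this article argues business developers need.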
All these new parallelism features raise the question of how developers might typically use them, which is the subject of the next section.