To get the most out of new multi-core processing power, you must embrace parallel programming. But parallel programming comes with its share of challenges. For example, you usually "re-invent the wheel" each time you need widely-used parallel programming utilities. These utilities are difficult to write correctly and efficiently. Data structures are tedious to write, even for experienced developers.
This is why Intel's newest software product is generating significant interest among developers. Intel® Threading Building Blocks (Intel TBB) is a C++ runtime library that supports scalable, data-parallel programming. To use the library, you specify tasks, not threads, and let the library map tasks efficiently onto threads. This typically means writing less code than with other threading models. Intel TBB is also a cross-platform framework that does not require special languages or compilers.
Intel TBB differs from other threading models (like POSIX and Windows threads) in a number of ways:
- Intel TBB enables you to specify tasks instead of threads.
Most threading models make you specify threads. Programming directly in terms of threads is tedious and can lead to inefficient programs, because threads are low-level, heavy constructs that are close to the hardware. In contrast, the Intel TBB runtime library automatically schedules tasks onto threads in a way that makes efficient use of processor resources.
Figure 1. Side-by-side comparison of equivalent thread functionality shows that less code is needed to achieve parallelism with Intel Threading Building Blocks on a 2D ray tracing program. (Source: Intel)
- Intel TBB targets threading for performance.
The library focuses on the goal of parallelizing computationally intensive work, delivering higher-level, simpler solutions.
- Intel TBB is fully compatible with other threading packages.
You can mix existing threaded code with Intel TBB. The library offers platform portability on Windows, Linux, and Mac OS, through its cross-platform API. It supports 32-bit and 64-bit applications using Intel, Microsoft, and GNU compilers.
- Intel TBB emphasizes scalable, data-parallel programming.
Breaking a program up into separate functional blocks and assigning a separate thread to each block often doesn't scale well because the number of functional blocks is typically fixed. In contrast, Intel TBB emphasizes data-parallel programming and enables multiple threads to work on different parts of a collection. Data-parallel programming scales well to larger numbers of processors because it divides the collection into smaller pieces. This scalability protects the developer from having to re-write an application every time a new chip with more processor cores ships.
Figure 2. Better scalability and improved performance for Intel Threading Building Blocks versus Windows threads on a 2D ray tracing program. (Source: Intel)
- Intel TBB relies on generic programming.
Traditionally, libraries specify interfaces in terms of specific types or base classes. Intel TBB uses a generic programming model instead. The essence of generic programming is writing the best possible algorithms with the fewest constraints; the C++ Standard Template Library (STL) is a good example of generic programming. Generic programming enables Intel TBB to deliver high-performance algorithms with broad applicability.
Task Scheduler
Tasks are logical units of computation. Programming with tasks instead of threads lets you think at a higher level. With task-based programming, you can concentrate on the logical dependences between tasks and leave the scheduling to the task scheduler. This is advantageous for the following reasons:
Getting the number of threads right is difficult. The threads you create with a general-purpose threading package are logical threads that you must map onto the physical hardware threads. If there are not enough running logical threads to keep the physical threads busy, the result is under-subscription, which leaves hardware idle. If there are more running logical threads than physical threads, the result is over-subscription, which adds overhead. The Intel TBB Task Scheduler avoids both problems by selecting the number of logical threads that will likely make the most efficient use of the underlying hardware. It maps tasks to logical threads in a way that tolerates interference by other threads from the same or other processes.
Tasks in Intel TBB are more efficient because the scheduler is unfair. Thread schedulers typically distribute time slices in a round-robin fashion. The distribution is called "fair" because each logical thread gets its fair share of time. In task-based programming, the task scheduler has higher-level information about the work, so it can sacrifice fairness for efficiency.
The task scheduler does load balancing. With thread-based programming, you are often stuck dealing with load balancing yourself, which can be tricky to get right. When you break your program into many small tasks, the Intel TBB scheduler can assign those tasks to threads in a way that spreads the work out evenly.
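The ideas above can be illustrated with a deliberately simplified sketch. It is not Intel TBB code and does not reflect how the TBB scheduler is implemented; it uses only the C++ standard library. The names `run_tasks` and `sum_with_tasks` are hypothetical. The sketch sizes its worker pool from the hardware thread count to avoid over-subscription, and lets each worker claim the next small task as soon as it is free, which balances the load without any manual partitioning.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Intel TBB): runs every task in `tasks`
// on a small, fixed set of worker threads.
void run_tasks(const std::vector<std::function<void()>>& tasks) {
    // Match logical threads to hardware threads to avoid over-subscription.
    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 2;  // the call may return 0 when unknown

    std::atomic<std::size_t> next{0};  // index of the next unclaimed task

    auto worker = [&tasks, &next] {
        // Each worker keeps claiming tasks until none are left, so a thread
        // that finishes early automatically picks up more work.
        for (;;) {
            std::size_t i = next.fetch_add(1);
            if (i >= tasks.size()) break;
            tasks[i]();
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < workers; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}

// Example: 1000 small tasks, each adding its own number to a shared total.
long sum_with_tasks() {
    std::atomic<long> total{0};
    std::vector<std::function<void()>> tasks;
    for (long i = 1; i <= 1000; ++i)
        tasks.push_back([&total, i] { total += i; });
    run_tasks(tasks);
    return total.load();
}
```

Calling `sum_with_tasks()` returns 500500 regardless of how many worker threads the machine provides, because the caller describes only the tasks and never the threads.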
Generic Algorithms
The simplest form of scalable parallelism is a loop of iterations that can each run simultaneously, without interfering with each other. The high-level loop templates in Intel TBB give you efficient, scalable ways to exploit the power of multi-core chips without having to start from scratch. They let you design your software at a high task-pattern level and not worry about low-level manipulation of threads. Because they are generic, you can customize them to your specific needs.
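To make the idea of a generic loop template concrete, here is a minimal sketch written with the C++ standard library only. It is not the Intel TBB API (TBB's own loop templates, such as `tbb::parallel_for`, are intentionally not used so that the sketch compiles without the library), and the names `parallel_for_index` and `squares` are hypothetical. It assumes that every iteration is independent of the others.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical loop template (not Intel TBB's API): calls body(i) for every
// i in [first, last), splitting the index range into one chunk per thread.
// Iterations must be independent of each other.
template <typename Body>
void parallel_for_index(std::size_t first, std::size_t last, const Body& body) {
    if (first >= last) return;
    std::size_t n = last - first;
    std::size_t threads =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    threads = std::min(threads, n);
    std::size_t chunk = (n + threads - 1) / threads;  // ceiling division

    std::vector<std::thread> pool;
    for (std::size_t begin = first; begin < last; begin += chunk) {
        std::size_t end = std::min(begin + chunk, last);
        pool.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& th : pool) th.join();
}

// Example: fill a vector with squares; each iteration touches only a[i].
std::vector<int> squares(std::size_t n) {
    std::vector<int> a(n);
    parallel_for_index(0, n, [&a](std::size_t i) {
        a[i] = static_cast<int>(i * i);
    });
    return a;
}
```

Because the template is generic over `Body`, the same function works with any callable that accepts an index. That is the kind of customization the loop templates are meant to allow.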
Highly Concurrent Containers
Intel TBB provides highly concurrent, thread-safe container classes. These containers can be used with raw Windows or POSIX threads, or in conjunction with task-based programming.
A thread-safe container allows multiple threads to concurrently access and update items in the container. Typical C++ STL containers are not thread-safe. Attempts to modify them concurrently often result in corrupting the container. STL containers can be wrapped in a mutex to make them thread-safe, by letting only one thread operate on the container at a time. The drawback of this approach is that it eliminates concurrency, thus restricting parallel speedup.
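The mutex-wrapping approach described above might look like the following sketch. `LockedVector` and `fill_from_four_threads` are hypothetical names used for illustration; neither is part of Intel TBB or the STL.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper (not an Intel TBB class): one mutex guards the whole
// std::vector, so only one thread can operate on it at a time.
template <typename T>
class LockedVector {
public:
    void push_back(const T& value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push_back(value);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return items_.size();
    }
private:
    mutable std::mutex mutex_;
    std::vector<T> items_;
};

// Four threads append 1000 items each. The result is correct, but the
// appends run one at a time rather than concurrently.
std::size_t fill_from_four_threads() {
    LockedVector<int> v;
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t)
        pool.emplace_back([&v, t] {
            for (int i = 0; i < 1000; ++i) v.push_back(t * 1000 + i);
        });
    for (auto& th : pool) th.join();
    return v.size();
}
```

The wrapper is safe, and `fill_from_four_threads()` returns 4000, but every `push_back` waits on the same lock. Adding more threads therefore adds contention rather than throughput.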
Containers provided by Intel TBB offer a much higher level of concurrency, via fine-grained locking and lock-free algorithms. With fine-grained locking, multiple threads operate on the container by locking only those portions that are necessary. As long as different threads access different portions, they can proceed concurrently. With lock-free algorithms, different threads account and correct for the effects of other interfering threads.
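Fine-grained locking can be sketched in standard C++ as a map split into independently locked buckets. This is only an illustration of the idea, not the implementation of any Intel TBB container, and it does not cover the lock-free case. `StripedCounter`, its bucket count of 16, and `count_key_after_parallel_updates` are assumptions chosen for the example.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical striped counter map (not an Intel TBB container): keys are
// spread over several buckets, each protected by its own mutex.
class StripedCounter {
public:
    void increment(int key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> guard(b.mutex);  // locks one bucket only
        ++b.counts[key];
    }
    int get(int key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> guard(b.mutex);
        auto it = b.counts.find(key);
        return it == b.counts.end() ? 0 : it->second;
    }
private:
    struct Bucket {
        std::mutex mutex;
        std::unordered_map<int, int> counts;
    };
    Bucket& bucket_for(int key) {
        return buckets_[std::hash<int>{}(key) % buckets_.size()];
    }
    std::array<Bucket, 16> buckets_;
};

// Four threads each increment keys 0..9 one hundred times per key.
int count_key_after_parallel_updates() {
    StripedCounter c;
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t)
        pool.emplace_back([&c] {
            for (int i = 0; i < 1000; ++i) c.increment(i % 10);
        });
    for (auto& th : pool) th.join();
    return c.get(3);
}
```

Threads that update keys in different buckets never wait on each other; only updates that land in the same bucket are serialized. Here `count_key_after_parallel_updates()` returns 400, since each of the four threads increments key 3 one hundred times.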
With multi-core processors quickly becoming pervasive, developers should take a look at the convenience and scalable performance Intel TBB has to offer.