Multithreaded applications are notoriously difficult to write, test, and debug. However, to take full advantage of the added performance potential of multicore desktop and laptop systems, developers now face the challenging task of threading their applications. While there is no panacea for the difficulties that arise in multithreaded application development, leveraging existing libraries and tools can dramatically ease the burden of this transition.
In this article, I'll examine one of the leading sources of correctness and performance issues encountered when threading C++ applications: the use of thread-unsafe container classes. After providing some examples of why this issue arises, I'll describe the concurrent container classes provided by the Intel Threading Building Blocks (Intel TBB) library, a C++ template library specifically designed to aid in developing multithreaded applications. The concurrent container classes in TBB can be leveraged to safely add scalable parallelism to applications.
Are Your Containers Thread-Safe?
Many developers rely on hand-written container classes or on those provided by implementations of the C++ Standard Template Library (STL). Unfortunately, these libraries are often not thread-safe. In particular, the STL specification (as standardized through C++03) makes no mention of threads or of the behavior required of container classes when used in multithreaded code. Implementations of the STL container classes are therefore commonly not thread-safe.
For example, consider an STL map<string, MyClass> named values, in which two threads each assign to the value stored under a different key.
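A minimal sketch of this pattern follows. The definition of MyClass, the second key "Key2", and the use of C++11 std::thread are assumptions made for illustration. The two thread bodies are deliberately run one after the other so that the sketch itself is well-defined; starting both threads before joining either one is the unsynchronized case the text warns about.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <thread>

// Hypothetical value type; the article does not show MyClass.
struct MyClass {
    int payload = 0;
};

using ValueMap = std::map<std::string, MyClass>;

// Work intended for Thread 0: touches only the entry for "Key1".
void thread0_body(ValueMap& values) { values["Key1"] = MyClass(); }

// Work intended for Thread 1: touches only the entry for "Key2" (assumed key).
void thread1_body(ValueMap& values) { values["Key2"] = MyClass(); }

// Runs both bodies on real threads, but joins the first before starting the
// second, so the map is never accessed by two threads at once.
// Launching t1 before t0.join() would be an unsynchronized concurrent
// modification of one map, which std::map does not support.
std::size_t run_one_after_the_other(ValueMap& values) {
    std::thread t0(thread0_body, std::ref(values));
    t0.join();
    std::thread t1(thread1_body, std::ref(values));
    t1.join();
    return values.size();
}
```

Both assignments insert a new node into the same tree, which is why distinct keys alone do not make the concurrent version safe.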
Editor's Note: Michael Voss is a Senior Staff Software Engineer at Intel Corporation, which is the owner-developer of the TBB technology discussed herein. This article has been selected for publication because we believe it to have objective technical merit. No specific endorsement of Intel technologies by the editors of DevX is implied.
Even though two distinct values associated with two distinct keys are being modified in this scenario, most STL implementations provide no guarantee of correct behavior. Performing these operations concurrently without synchronization may corrupt the map. With no requirements specified for thread safety, it's even possible that accessing two distinct maps may lead to data corruption.
Of course, it's possible to implement the STL template class map in such a way as to make these assignments thread-safe. Unfortunately, some common map operation sequences cannot be implemented in a thread-friendly way. While each operation alone may be made thread-safe, sequences commonly used in serial code can lead to unexpected results. For example, suppose two threads operate on the same element of the map: Thread 0 assigns a new MyClass() to the element stored under "Key1" while Thread 1 erases that key.
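The two thread bodies might look like the sketch below. MyClass, the helper name key1_present_after, and the use of C++11 std::thread are assumptions for illustration. The helper runs the bodies strictly one after the other, in either order, which yields the only two outcomes a programmer would normally expect.

```cpp
#include <functional>
#include <map>
#include <string>
#include <thread>

// Hypothetical value type; the article does not show MyClass.
struct MyClass {
    int payload = 0;
};

using ValueMap = std::map<std::string, MyClass>;

// Thread 0: the single statement values["Key1"] = MyClass(); is really two
// operations, written out separately here.
void thread0_body(ValueMap& values) {
    MyClass& slot = values["Key1"];  // operator[]: find or create the entry
    slot = MyClass();                // operator=: copy into that entry
}

// Thread 1: removes the same entry.
void thread1_body(ValueMap& values) { values.erase("Key1"); }

// Runs the two bodies in the requested order, joining the first thread before
// starting the second, and reports whether "Key1" is left in the map.
bool key1_present_after(bool thread0_first) {
    ValueMap values;
    auto first = thread0_first ? thread0_body : thread1_body;
    auto second = thread0_first ? thread1_body : thread0_body;
    std::thread a(first, std::ref(values));
    a.join();
    std::thread b(second, std::ref(values));
    b.join();
    return values.count("Key1") == 1;
}
```

With Thread 0 first, the key ends up erased; with Thread 1 first, the key ends up paired with a fresh MyClass(). Any other result can only come from the two bodies overlapping in time.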
The code executed by Thread 0 performs two operations. First it invokes operator[] to retrieve a reference to the object associated with "Key1". If this key is not in the map, operator[] allocates space to hold an object of type MyClass to associate with this key. Next, operator= is invoked to copy the temporary instance of MyClass to the object referred to by the retrieved reference.
The desired outcome is that either "Key1" does not appear in the map, or it is paired with an instance of MyClass(). But without user-inserted synchronization, other outcomes are possible, even when each operator itself is thread-safe. The call to erase by Thread 1 might occur between the call to operator[] and the call to operator= by Thread 0. In that case, Thread 0 will attempt to invoke operator= on a deleted object, resulting in undefined behavior. This common type of multithreading bug is known as a race condition; the behavior (unintentionally) depends on which thread performs its operation first.
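The harmful ordering can be replayed deterministically on a single thread, which makes the problem easier to see. The sketch below is illustrative only: MyClass, ReplayResult, and replay_harmful_interleaving are hypothetical names, and the final assignment is left out on purpose because executing it would write to a destroyed object.

```cpp
#include <map>
#include <string>

// Hypothetical value type; the article does not show MyClass.
struct MyClass {
    int payload = 0;
};

struct ReplayResult {
    bool slot_valid_after_fetch;    // entry exists right after operator[]
    bool slot_valid_before_assign;  // entry still exists when operator= would run
};

// Replays, in program order, the interleaving described in the text:
// Thread 0 fetches, Thread 1 erases, Thread 0 is about to assign.
ReplayResult replay_harmful_interleaving() {
    std::map<std::string, MyClass> values;
    ReplayResult result;

    // Thread 0, step 1: operator[] creates the entry and yields its location.
    MyClass* slot = &values["Key1"];
    result.slot_valid_after_fetch = (&values.at("Key1") == slot);

    // Thread 1: erase destroys the entry, so slot now dangles.
    values.erase("Key1");
    result.slot_valid_before_assign = (values.count("Key1") == 1);

    // Thread 0, step 2 would be `*slot = MyClass();`, a write to a destroyed
    // object. It is deliberately not executed here.
    return result;
}
```

Making operator[] and erase individually thread-safe would not change this outcome, because the gap lies between the two operations rather than inside either one.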
One particularly difficult aspect of a race, as shown by this example, is that the behavior is non-deterministic. It is entirely possible that, in every test run, the call to erase from Thread 1 never falls between the fetch and the update on Thread 0. Such a bug can therefore evade testing and lie dormant in your validated and shipped code, potentially failing at any time on a customer's system.
To avoid these bugs and to ensure correctness when using thread-unfriendly container classes, developers are forced to wrap locks around every use of each container, allowing only a single thread to access the container at a time. This coarse-grained approach to synchronization limits the concurrency available in the application and adds code at each access point, increasing complexity. However, it's a price that must be paid to make use of these existing libraries.
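A sketch of this coarse-grained locking is shown below. It assumes C++11's std::mutex, std::lock_guard, and std::thread, since the article does not name a particular lock type; MyClass, LockedMap, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical value type; the article does not show MyClass.
struct MyClass {
    int payload = 0;
};

// A map bundled with the one lock that guards every access to it.
struct LockedMap {
    std::mutex guard;
    std::map<std::string, MyClass> values;
};

// Thread 0: the fetch (operator[]) and the update (operator=) both happen
// inside one critical section, so an erase cannot fall between them.
void assign_key1(LockedMap& m) {
    std::lock_guard<std::mutex> lock(m.guard);
    m.values["Key1"] = MyClass();
}

// Thread 1: must take the same lock before touching the map.
void erase_key1(LockedMap& m) {
    std::lock_guard<std::mutex> lock(m.guard);
    m.values.erase("Key1");
}

// Runs both threads concurrently and returns how many entries exist for
// "Key1" afterwards: 0 if the erase ran last, 1 if the assignment ran last.
std::size_t run_concurrently(LockedMap& m) {
    std::thread t0(assign_key1, std::ref(m));
    std::thread t1(erase_key1, std::ref(m));
    t0.join();
    t1.join();
    return m.values.count("Key1");  // safe: both threads have been joined
}
```

Either result is one of the two desired outcomes. The cost is the one described above: only one thread at a time can use the map, and every access point has to remember to take the lock.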