RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Enable Safe, Scalable Parallelism with Intel Threading Building Block's Concurrent Containers : Page 2

If you're among the many developers who are tackling the multicore future by writing multithreaded applications today, you'll soon learn that the container classes provided by the C++ STL are not thread friendly. Intel provides a C++ template library with thread-safe concurrent containers. Get a walk-through of what you can expect.

An Alternative: Intel TBB Container Classes
There is an alternative to using thread-unsafe containers wrapped by coarse-grain synchronization. The Intel TBB library is a runtime-based parallel programming model for C++ that uses threads. In addition to the scalable parallel algorithms provided by the library, Intel TBB provides safe, scalable implementations of map, queue, and vector container classes. These templates can be used directly in user-threaded code or in conjunction with the parallel algorithms included with the library.

As discussed above, the standard interfaces for some container classes are inherently thread-unfriendly. Therefore the Intel TBB concurrent containers cannot be simple drop-in replacements for their STL counterparts. Instead, Intel TBB follows the spirit of STL but provides modified interfaces where necessary to ensure thread-safety.

All of the concurrent containers in the Intel TBB library are implemented using fine-grain locking. Whenever a method is invoked on a container, only the portion of the data structure touched by the method is locked, allowing multiple threads to concurrently access other parts of the structure.

In the next few sections, I'll introduce the template classes concurrent_hash_map, concurrent_queue, and concurrent_vector, and then provide a case study that demonstrates how to detect and replace uses of unsafe containers.

Class: concurrent_hash_map< Key, T, HashCompare >
Intel TBB's template class concurrent_hash_map is similar to the associative containers of STL, but permits concurrent accesses to its elements. It's a hash table that maps from keys of type Key to values of type T. The traits type HashCompare defines the hash function to use in mapping, as well as a function that evaluates the equality of two keys. The functions travel together because they must agree that any two equal keys hash to the same hash code.

An example of a traits class is given below for keys of type string:

struct my_hash_compare {
    static size_t hash( const string& x ) {
        size_t h = 0;
        for( const char* s = x.c_str(); *s; ++s )
            h = (h*17)^*s;
        return h;
    //! True if strings are equal
    static bool equal( const string& x, const string& y ) {
        return x==y;
A concurrent_hash_map acts as a container of elements of type std::pair<const Key,T>. Typically, when accessing a container element, you are interested in either updating it or reading it. Template class concurrent_hash_map supports these two purposes respectively with the classes accessor and const_accessor. These act as smart pointers and enable the atomic access to elements that was lacking in my previous STL map example.

An accessor represents update (write) access. As long as it points to an element, all other attempts to look up that key in the table will block until the accessor is done. A const_accessor is similar, except that is represents read-only access. Multiple const_accessors are allowed to point to the same element at the same time, which can greatly improve concurrency in situations where elements are frequently read and infrequently updated.

You primarily access concurrent_hash_map elements through its methods insert, find, and erase. Methods find and insert take an accessor or const_accessor as an argument. Which one you choose tells concurrent_hash_map whether you are asking for update or read-only access. Once the method returns, the access lasts until the accessor or const_acessor is destroyed. The method remove implicitly requests write access; hence before removing the key, it will wait for any other extant accesses to finish.

The code below shows an example of inserting a new element into a concurrent_hash_map:

concurrent_hash_map<string, MyClass, my_hash_compare> string_table;

void insert_into_string_table ( string &key, MyClass &m ) {
    // create an accessor that will act as a smart pointer 
    // for write access
    concurrent_hash_map<string, MyClass, my_hash_compare>::accessor a;

    // call insert to create a new element, or return an existing 
    // element if one exists.
    // accessor a locks this element for exclusive use by this thread
    string_table.insert( a, key );

    // modify the value held by the pair
    a->second = m;

    // the accessor "a" releases the lock on the element when it is 
    // destroyed at the end of the scope
concurrent_queue< Key, T, HashCompare >
Intel TBB's template class concurrent_queue<T> implements a concurrent queue with values of type T. Multiple threads may simultaneously push and pop elements from the queue. By default, the queue is unbounded, but can be bounded by setting the maximum capacity.

In general, queues are first-in first-out data structures, and in a single-threaded program a queue can be designed to support this strict ordering. However, if multiple threads are pushing and popping concurrently, the definition of "first" is not so well defined. The Intel TBB template class concurrent_queue instead guarantees that if a thread pushes two values, and another thread pops those two values, they will be popped in the same order that they were pushed. The interleaving of values pushed by different threads is not restricted.

A concurrent queue is often used in producer-consumer applications, where data produced by one thread is consumed by another. To flexibly support these types of applications, Intel TBB offers both a blocking and non-blocking version of pop. Method pop_if_present is non-blocking. It attempts to pop a value, but if the queue is empty, it immediately returns. Method pop on the other hand, blocks until an item is available and then pops that item from the queue.

The example below uses the blocking method pop for communication between two threads using a concurrent_queue<int>. Thread 1 prints each value pushed into the queue by Thread 0 as it becomes available:


Unlike most STL containers, concurrent_queue::size_type is a signed integral type, not unsigned. This is because concurrent_queue::size() is defined as the number of push operations started minus the number of pop operations started. If pops outnumber pushes, size() becomes negative. For instance, if a concurrent_queue is empty, and there are n pending pop operations, size() returns n. This provides an easy way for producers to know how many consumers are waiting on the queue.

concurrent_vector< Key, T, HashCompare >
The final concurrent container defined by TBB is the template class concurrent_vector. A concurrent_vector is a dynamically growable array that allows access to elements in the vector while it is concurrently being grown.

Class concurrent_vector defines two methods for safe concurrent growth: grow_by and grow_to_at_least. Method grow_by(n) lets you safely append n consecutive elements to a vector and returns the index of the first appended element. Each element is initialized with T(). Method grow_to_at_least(n)grows a vector to size n if it is currently smaller than n elements.

The following routine safely appends a C string to a shared vector:

void Append ( concurrent_vector<char>& vector, const char* string) {
    size_t n = strlen(string) + 1;
    memcpy( &vector[vector_grow_by(n)], string, n+1 );
Method size() returns the number of elements in the vector, which may include elements that are still undergoing concurrent construction by methods grow_by and grow_to_at_least. Consequently, it is safe to use the iterators while the concurrent_vector is being grown, as long as the iterators never go past the current value of end(). However, the iterator may reference an element undergoing concurrent construction. When using an Intel TBB concurrent_vector, synchronizing construction and access is the responsibility of the developer.

Close Icon
Thanks for your registration, follow us on our social networks to keep up-to-date