Simpler Multithreading in C++0x

One major new feature in the C++0x standard is multi-threading support. Prior to C++0x, any multi-threading support in your C++ compiler has been provided as an extension to the C++ standard, which has meant that the details of that support vary between compilers and platforms. However, with the new standard, all compilers will have to conform to the same memory model and provide the same facilities for multi-threading (though implementors are still free to provide additional extensions). What does this mean for you? It means you’ll be able to port multi-threaded code between compilers and platforms at much reduced cost. It will also reduce the number of different APIs and syntaxes you’ll have to know when writing for multiple platforms.

The core of the new thread library is the std::thread class, which manages a thread of execution, so let’s start by looking at that.

Launching Threads
You start a new thread by constructing an instance of std::thread with a function. This function is then used as the entry point for the new thread, and once that function returns, the thread is finished:

    void do_work();
    std::thread t(do_work);

This is just like the thread-creation APIs we’re all used to, but there’s a crucial difference: This is C++, so we’re not restricted to functions. Just like many of the algorithms in the Standard C++ Library, std::thread will accept an object of a type that implements the function call operator (operator()), as well as ordinary functions:

    class do_work
    {
    public:
        void operator()();
    };

    do_work dw;
    std::thread t(dw);

It’s important to note that this actually copies the supplied object into the thread. If you really want to use the object you supplied (in which case, you’d better make sure that it doesn’t get destroyed before the thread finishes), you can do so by wrapping it in std::ref:

    do_work dw;
    std::thread t(std::ref(dw));
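To see the difference between the two forms, here is a minimal sketch. The counting_task type and the two helper functions are hypothetical names invented for this illustration; the only library facilities used are std::thread and std::ref.

```cpp
#include <functional>
#include <thread>

// Hypothetical function object that records how often it has been invoked.
struct counting_task
{
    int calls = 0;
    void operator()() { ++calls; }
};

// Runs the task by copy and returns the call count seen by the caller.
int run_by_copy()
{
    counting_task task;
    std::thread t(task);            // the thread invokes its own copy
    t.join();
    return task.calls;              // the original object was never invoked
}

// Runs the task by reference and returns the call count seen by the caller.
int run_by_reference()
{
    counting_task task;
    std::thread t(std::ref(task));  // the thread invokes the original object
    t.join();                       // task must outlive the thread
    return task.calls;
}
```

In this sketch run_by_copy() returns 0, because only the thread’s private copy was incremented, whereas run_by_reference() returns 1.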

Most thread creation APIs allow you to pass a single parameter to your newly created thread, typically a long or a void*. std::thread allows arguments too, but you can pass any number, of (almost) any type. Yes, you read that right: any number of arguments. The constructor uses C++0x’s new variadic template facility to allow a variable number of arguments like the old … varargs syntax, but in a type-safe manner.

You can now pass objects of any copyable type as arguments to the thread function:

    void do_more_work(int i,std::string s,std::vector<double> v);
    std::thread
        t(do_more_work,42,"hello",std::vector<double>(23,3.141));

Just as with the function object itself, the arguments are copied into the thread before the function is invoked, so if you want to pass a reference you need to wrap the argument in std::ref:

    void foo(std::string&);
    std::string s;
    std::thread t(foo,std::ref(s));
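Putting the two argument-passing rules together, a small sketch might look like the following. The functions append_suffix and build_greeting are made up for the example; the first argument is passed by reference with std::ref, while the second is copied into the new thread.

```cpp
#include <functional>
#include <string>
#include <thread>

// Hypothetical thread function: modifies the caller's string in place.
void append_suffix(std::string& target, std::string suffix)
{
    target += suffix;
}

std::string build_greeting()
{
    std::string s = "hello";
    // std::ref(s) hands the thread a reference to s; the string literal is
    // copied as a pointer and converted to std::string in the new thread.
    std::thread t(append_suffix, std::ref(s), " world");
    t.join();   // after join() the modification made by the thread is visible
    return s;
}
```

Calling build_greeting() yields "hello world". Leaving out the std::ref wrapper here would not silently copy s: the code would fail to compile, because the copied argument cannot bind to a non-const reference parameter.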

OK, that’s enough about launching threads. What about waiting for the thread to finish? The C++ Standard calls that “joining” with the thread (after the POSIX terminology), and you do that with the join() member function:

    void do_work();
    std::thread t(do_work);
    t.join();

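If you’re not planning on joining with your thread, call detach() before the thread object is destroyed. In the final C++11 standard, destroying a std::thread object that is still joinable calls std::terminate, so you must always do one or the other: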

    void do_work();
    std::thread t(do_work);
    t.detach();

Now, it’s all very well launching all these threads, but if you’re going to share data you’d better protect it. The new C++ Standard Library provides facilities for that, too.

Protecting Data
In the C++0x thread library, as with most thread APIs, the basic facility for protecting shared data is the mutex. In C++0x, there are four varieties of mutexes:

  • non-recursive (std::mutex)
  • recursive (std::recursive_mutex)
  • non-recursive that allows timeouts on the lock functions (std::timed_mutex)
  • recursive mutex that allows timeouts on the lock functions (std::recursive_timed_mutex)

All of them provide exclusive ownership for one thread. If you try to lock a non-recursive mutex twice from the same thread without unlocking in between, you get undefined behavior. A recursive mutex simply increases the lock count; you must unlock the mutex the same number of times that you locked it in order for other threads to be allowed to lock the mutex.

Though these mutex types all have member functions for locking and unlocking, in most scenarios the best way to do it is with the lock class templates std::unique_lock<> and std::lock_guard<>. These classes lock the mutex in the constructor and release it in the destructor. Thus, if you use them as local variables, your mutex is automatically unlocked when you exit the scope:

    std::mutex m;
    my_class data;

    void foo()
    {
        std::lock_guard<std::mutex> lk(m);
        process(data);
    }   // mutex unlocked here
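As a complete illustration of the same pattern, the sketch below has several threads increment one shared counter. The names counter, counter_mutex, add_many and count_with_threads are invented for this example.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex counter_mutex;   // protects counter
long counter = 0;

// Increments the shared counter n times, locking around each update.
void add_many(int n)
{
    for (int i = 0; i < n; ++i)
    {
        std::lock_guard<std::mutex> lk(counter_mutex);
        ++counter;
    }   // lk is destroyed here, releasing the mutex on every iteration
}

// Launches thread_count threads and returns the final counter value.
long count_with_threads(int thread_count, int per_thread)
{
    counter = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back(add_many, per_thread);
    for (auto& t : threads)
        t.join();
    return counter;   // safe to read: all writers have been joined
}
```

With the lock in place, count_with_threads(4, 10000) returns 40000. Without it, the unsynchronized increments would be a data race and the result would be unpredictable.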

std::lock_guard is deliberately basic and can only be used as shown. On the other hand, std::unique_lock allows for deferred locking, trying to lock, trying to lock with a timeout, and unlocking before the object is destroyed. If you’ve chosen to use std::timed_mutex because you want the timeout on the locks, you probably need to use std::unique_lock:

    std::timed_mutex m;
    my_class data;

    void foo()
    {
        std::unique_lock<std::timed_mutex>
            lk(m,std::chrono::milliseconds(3)); // wait up to 3ms
        if(lk) // if we got the lock, access the data
            process(data);
    }   // mutex unlocked here
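To watch the timeout actually expire, you need a second thread that tries to lock while the mutex is already held. The sketch below does just that; tm, try_for_3ms and contended_attempt are hypothetical names for this illustration.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

std::timed_mutex tm;

// Tries to acquire tm for up to 3ms and reports whether it succeeded.
bool try_for_3ms()
{
    std::unique_lock<std::timed_mutex> lk(tm, std::chrono::milliseconds(3));
    return lk.owns_lock();   // same test as if(lk)
}

// Holds tm on the calling thread while another thread attempts the timed lock.
bool contended_attempt()
{
    bool acquired = true;
    std::lock_guard<std::timed_mutex> hold(tm);   // keep the mutex busy
    std::thread t([&]{ acquired = try_for_3ms(); });
    t.join();
    return acquired;
}
```

Because the calling thread holds the mutex for the whole attempt, contended_attempt() returns false: the second thread gives up after its timeout instead of blocking forever.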

These lock classes are templates, so they can be used with all the standard mutex types, plus any additional types that supply lock() and unlock() functions.
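For instance, a user-defined type only needs lock() and unlock() to work with std::lock_guard. The counting_mutex class below is a hypothetical wrapper written for this sketch, not a standard facility; it forwards to a std::mutex and counts acquisitions.

```cpp
#include <mutex>

// Hypothetical lockable type: supplies lock() and unlock(), which is all
// std::lock_guard requires.
class counting_mutex
{
    std::mutex inner;
    int acquisitions = 0;   // only modified while inner is held
public:
    void lock()   { inner.lock(); ++acquisitions; }
    void unlock() { inner.unlock(); }
    int times_locked() const { return acquisitions; }
};

// Locks and unlocks the mutex n times through std::lock_guard.
int lock_repeatedly(counting_mutex& cm, int n)
{
    for (int i = 0; i < n; ++i)
    {
        std::lock_guard<counting_mutex> lk(cm);   // calls cm.lock()
    }                                             // calls cm.unlock()
    return cm.times_locked();
}
```

The timed and try-lock features of std::unique_lock need the corresponding member functions (try_lock(), try_lock_for() and so on), so a type like this one can only be used with the plain locking operations.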

Protecting Against Deadlock When Locking Multiple Mutexes
Occasionally, an operation requires you to lock more than one mutex. Done wrong, this is a nasty source of deadlocks: Two threads can try to lock the same mutexes in the opposite order, and each ends up holding one mutex while waiting for the other thread to release the other. The C++0x thread library alleviates this problem, in those cases where you wish to acquire the locks together, by providing a generic std::lock function that can lock multiple mutexes at once. Rather than calling the lock() member function on each mutex in turn, you pass them to std::lock(), which locks them all without risking deadlock. You can even pass in currently unlocked instances of std::unique_lock<>:

    struct X
    {
        std::mutex m;
        int a;
        std::string b;
    };

    void foo(X& a,X& b)
    {
        std::unique_lock<std::mutex> lock_a(a.m,std::defer_lock);
        std::unique_lock<std::mutex> lock_b(b.m,std::defer_lock);
        std::lock(lock_a,lock_b);
        // do something with the internals of a and b
    }

In the above example, suppose you didn’t use std::lock. This could possibly result in a deadlock if one thread did foo(x,y) and another did foo(y,x) for two X objects x and y. With std::lock, this is safe.
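Here is a sketch of that opposite-order scenario made concrete. The account type and the transfer functions are hypothetical; two threads move values between the same pair of objects in opposite directions, and std::lock keeps them from deadlocking.

```cpp
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical shared object with its own mutex.
struct account
{
    std::mutex m;
    int balance;
    explicit account(int b) : balance(b) {}
};

// Locks both accounts together, then moves amount from one to the other.
void transfer(account& from, account& to, int amount)
{
    std::unique_lock<std::mutex> lock_from(from.m, std::defer_lock);
    std::unique_lock<std::mutex> lock_to(to.m, std::defer_lock);
    std::lock(lock_from, lock_to);   // acquires both without deadlock
    from.balance -= amount;
    to.balance += amount;
}

void repeat_transfer(account& from, account& to, int times)
{
    for (int i = 0; i < times; ++i)
        transfer(from, to, 1);
}

// Runs x->y and y->x transfers concurrently and returns the combined balance.
int total_after_opposing_transfers()
{
    account x(1000), y(1000);
    std::thread t1(repeat_transfer, std::ref(x), std::ref(y), 500);
    std::thread t2(repeat_transfer, std::ref(y), std::ref(x), 300);
    t1.join();
    t2.join();
    return x.balance + y.balance;
}
```

The combined balance stays at 2000 however the two threads interleave. Note that transfer() must not be called with the same account for both arguments, since that would try to lock one non-recursive mutex twice.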

Protecting Data During Initialization
If your data only needs protecting during its initialization, using a mutex is not the answer. Doing so only leads to unnecessary synchronization after initialization is complete. The C++0x standard provides several ways of dealing with this.

First, suppose your constructor is declared with the new constexpr keyword and satisfies the requirements for constant initialization. In this case, an object of static storage duration, initialized with that constructor, is guaranteed to be initialized before any code is run as part of the static initialization phase. This is the option chosen for std::mutex, because it eliminates the possibility of race conditions with initialization of mutexes at a global scope:

    class my_class
    {
        int i;
    public:
        constexpr my_class():i(0){}
        my_class(int i_):i(i_){}
        void do_stuff();
    };

    my_class x; // static initialization with constexpr constructor

    int foo();
    my_class y(42+foo()); // dynamic initialization

    void f()
    {
        y.do_stuff(); // is y initialized?
    }

Your second option is to use a static variable at block scope. In C++0x, initialization of block scope static variables happens the first time the function is called. If a second thread should call the function before the initialization is complete, then that second thread has to wait:

    void bar()
    {
        static my_class z(42+foo()); // initialization is thread-safe
        z.do_stuff();
    }
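One way to convince yourself of this guarantee is to count constructor calls while several threads race into the function. In the sketch below, tracked, constructions and constructions_after are hypothetical names, and the counter is a std::atomic so that it can be read safely.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> constructions(0);

// Hypothetical class that counts how many times it is constructed.
struct tracked
{
    tracked() { ++constructions; }
    void do_stuff() {}
};

void bar()
{
    static tracked z;   // initialized by exactly one of the calling threads
    z.do_stuff();
}

// Calls bar() from thread_count threads and returns the construction count.
int constructions_after(int thread_count)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back(bar);
    for (auto& t : threads)
        t.join();
    return constructions.load();
}
```

However many threads call bar(), the constructor runs once, so constructions_after(8) returns 1.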

If neither option applies (perhaps because the object is dynamically allocated), then it’s best to use std::call_once and std::once_flag. As the name suggests, when std::call_once is used in conjunction with a specific instance of type std::once_flag, the specified function is called exactly once:

    my_class* p=0;
    std::once_flag p_flag;

    void create_instance()
    {
        p=new my_class(42+foo());
    }

    void baz()
    {
        std::call_once(p_flag,create_instance);
        p->do_stuff();
    }

Just as with the std::thread constructor, std::call_once can take function objects instead of functions, and can pass arguments to the function. Again, copying is the default, and you have to use std::ref if you want a reference.
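A short sketch of std::call_once with an argument, raced from several threads, might look like this. The names init_flag, init_calls, shared_value, initialise, worker and count_initialisations are all invented for the example.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::once_flag init_flag;
int init_calls = 0;     // written only inside the once-only function
int shared_value = 0;

// Hypothetical initialization function taking an argument.
void initialise(int value)
{
    ++init_calls;
    shared_value = value;
}

void worker()
{
    // The extra argument is forwarded to initialise(); only one thread
    // runs it, and the others wait until it has finished.
    std::call_once(init_flag, initialise, 42);
}

// Runs worker() on thread_count threads and returns how often
// initialise() was actually executed.
int count_initialisations(int thread_count)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back(worker);
    for (auto& t : threads)
        t.join();
    return init_calls;
}
```

Even with eight threads competing, initialise() runs a single time, leaving shared_value at 42.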

Waiting for Events
If you’re sharing data between threads, you often need one thread to wait for another to perform some action, and you want to do this without consuming any CPU time. If a thread is simply waiting for its turn to access some shared data, then a mutex lock can be sufficient. However, generally doing so won’t have the desired semantics.

The simplest way to wait is to put the thread to sleep for a short period of time. Then check to see if the desired action has occurred when the thread wakes up. It’s important to ensure that the mutex you use to protect the data indicating that the event has occurred is unlocked whilst the thread is sleeping:

    std::mutex m;
    bool data_ready;
    void process_data();

    void foo()
    {
        std::unique_lock<std::mutex> lk(m);
        while(!data_ready)
        {
            lk.unlock();
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
            lk.lock();
        }
        process_data();
    }

This method may be simplest, but it’s less than ideal for two reasons. Firstly, on average, the thread will wait five ms (half of ten ms) after the data is ready before it will wake in order to check. This may cause a noticeable lag in some cases. Though this can be improved by reducing the wait time, it exacerbates the second problem: the thread has to wake up, acquire the mutex, and check the flag every ten ms, even if nothing has happened. This consumes CPU time and increases contention on the mutex, and thus potentially slows down the thread performing the task for which it’s waiting!

If you find yourself writing code like that, don’t: Use condition variables instead. Rather than sleeping for a fixed period, you can let the thread sleep until it has been notified by another thread. This ensures that the latency between being notified and the thread waking is as small as the OS will allow, and effectively reduces the CPU consumption of the waiting thread to zero for the entire time. You can rewrite foo to use a condition variable like this:

    std::mutex m;
    std::condition_variable cond;
    bool data_ready;
    void process_data();

    void foo()
    {
        std::unique_lock<std::mutex> lk(m);
        while(!data_ready)
        {
            cond.wait(lk);
        }
        process_data();
    }

Note that the above code passes in the lock object lk as a parameter to wait(). The condition variable implementation then unlocks the mutex on entry to wait(), and locks it again on exit. This ensures that the protected data can be modified by other threads whilst this thread is waiting. The code that sets the data_ready flag then looks like this:

    void set_data_ready()
    {
        std::lock_guard<std::mutex> lk(m);
        data_ready=true;
        cond.notify_one();
    }

You still need to check that the data is ready though, since condition variables can suffer from what are called spurious wakes: The call to wait() may return even though it wasn’t notified by another thread. If you’re worried about getting this wrong, you can pass that responsibility off to the standard library too, if you tell it what you’re waiting for with a predicate. The new C++0x lambda facility makes this really easy:

    void foo()
    {
        std::unique_lock<std::mutex> lk(m);
        cond.wait(lk,[]{return data_ready;});
        process_data();
    }
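Here is how the waiting and notifying sides might fit together in one piece. The payload and result variables and the consumer, producer and run_handoff functions are hypothetical additions for this sketch; the mutex, condition variable and flag play the same roles as in the snippets above.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cond;
bool data_ready = false;   // protected by m
int payload = 0;           // protected by m
int result = 0;

// Waits for the flag, then processes the data while holding the lock.
void consumer()
{
    std::unique_lock<std::mutex> lk(m);
    cond.wait(lk, []{ return data_ready; });   // handles spurious wakes
    result = payload * 2;
}

// Publishes the data under the lock, then wakes the waiting thread.
void producer(int value)
{
    {
        std::lock_guard<std::mutex> lk(m);
        payload = value;
        data_ready = true;
    }
    cond.notify_one();
}

// Runs one consumer and one producer and returns the processed value.
int run_handoff(int value)
{
    data_ready = false;
    std::thread waiter(consumer);
    std::thread setter(producer, value);
    waiter.join();
    setter.join();
    return result;
}
```

It doesn’t matter which thread gets going first. If the producer runs before the consumer reaches wait(), the predicate is already true and the consumer never blocks.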

What if you don’t want to share your data? What if you want exactly the opposite: For each thread to have its own copy? This is the scenario addressed by the new thread_local storage duration keyword.

Thread Local Data
The thread_local keyword can be used with any object declaration at namespace scope or at local scope, and specifies that such a variable is thread local. Each thread thus has its own copy of that variable, and that copy exists for the entire duration of that thread. It is essentially a per-thread static variable, so each thread’s copy of a variable declared at local scope is initialized the first time that particular thread passes through the declaration, and they retain their values until that thread exits:

    std::string foo(std::string const& s2)
    {
        thread_local std::string s="hello";
        s+=s2;
        return s;
    }

In this function, each thread’s copy of s starts life with the contents “hello.” Every time the function is called, the supplied string is appended to that thread’s copy of s. As you can see from this example, this even works with class types that have constructors and destructors (such as std::string), which is an improvement over the pre-C++0x compiler extensions.
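The per-thread behavior is easy to observe by calling the function twice from a freshly started thread. The sketch below reuses foo from above; call_twice_in_new_thread is a hypothetical helper added for the demonstration.

```cpp
#include <string>
#include <thread>

std::string foo(std::string const& s2)
{
    thread_local std::string s = "hello";   // one copy per thread
    s += s2;
    return s;
}

// Starts a new thread, calls foo twice there, and returns the second result.
std::string call_twice_in_new_thread(std::string const& suffix)
{
    std::string result;
    std::thread t([&]
    {
        foo(suffix);            // first call initializes this thread's copy
        result = foo(suffix);   // second call appends to the same copy
    });
    t.join();
    return result;
}
```

Each new thread starts again from "hello", so call_twice_in_new_thread("a") gives "helloaa" and a later call_twice_in_new_thread("b") gives "hellobb", unaffected by what the earlier thread appended.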

Thread-local storage isn’t the only change to the concurrency support in the core language: There’s also a brand new multi-threading aware memory model, with support for atomic operations.

The New Memory Model and Atomic Operations
If you stick to using locks and condition variables to protect your data, you won’t need to worry about the memory model. The memory model guarantees to protect your data from race conditions, provided you use locks correctly. You’ll get undefined behavior if you don’t.

If you’re working at a really low level and providing high-performance library facilities, then it’s important to know the details, which are too complicated to go into here. For now, it’s enough to know that C++0x has a set of atomic types corresponding to the built-in integer types and void pointers, and a template std::atomic<>, which can be used to create an atomic version of a simple user-defined type. You can look up the relevant documentation for the details.
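As a small taste of the atomic types, the following sketch counts events from several threads without any mutex. The names hits, record_hits and count_hits are invented for the example, and it relies on the default (sequentially consistent) ordering rather than any of the finer-grained options.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<long> hits(0);

// Increments the shared counter n times; each increment is atomic.
void record_hits(int n)
{
    for (int i = 0; i < n; ++i)
        ++hits;   // atomic read-modify-write, no lock needed
}

// Launches thread_count threads and returns the final count.
long count_hits(int thread_count, int per_thread)
{
    hits = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back(record_hits, per_thread);
    for (auto& t : threads)
        t.join();
    return hits.load();
}
```

This works for a single counter because each increment is one indivisible operation. As soon as several variables have to change together, you are back to needing a mutex.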

That’s All, Folks!
And that’s your whistle-stop tour of the new C++0x threading facilities, which has barely scratched the surface. There’s much more to the library, with features such as thread IDs and asynchronous future values.
