Avoiding the Perils of C++0x Data Races

ace conditions are an inherent part of parallel programming. A race condition exists any time a program’s behavior may depend on the relative ordering of events on separate threads. In the vast majority of cases, race conditions are harmless?the program works regardless of which thread gets a lock first, or which thread processes a chunk of data. In some cases, however, race conditions can cause problems.

The main danger with race conditions is that by their very nature they are timing dependent. This becomes problematic when one thread executes a particular piece of code while another thread is executing a different piece of code. If the pieces of code in question are very small (only one or two CPU instructions, for example) and occur very rarely, then the race condition might not show up very often, and you may miss it entirely during testing. In fact, the conditions necessary for the problem to occur may not manifest at all during testing.

For example, if your test system has only one CPU, then threads cannot really execute in parallel. You must interleave them. This lack of true concurrency means that some potential race condition problems just cannot occur. The same problem exists to a lesser extent if you test code on a system with a small number of CPUs (for example, on a dual-core desktop machine) when the problematic conditions can happen only with a higher level of parallelism (for example, on a 64-CPU server machine).

This article demonstrates how race conditions in general and C++0x data races in particular can cause real problems in parallel code and offers some tips for preventing them.

Locks Cannot Prevent Race Conditions

Protecting your data with a mutex lock does not guarantee that your code will be free from problematic race conditions, even if you obsessively ensure that the data is accessed only while the lock is held. If the lock is at the wrong level of granularity, or the scope of the lock is wrong, then problematic race conditions can still occur.

For example, consider a simple data structure that contains a list of items and a count of the items in the list. If you protect each part of the data structure with its own mutex, you can still get race conditions even though everything is nominally synchronized. Because the parts are protected with individual mutexes, you must update them separately. This means that at certain points you will have updated one and not the other (for example, you have added a new item to the list but have not yet updated the count). Thus, when another thread accesses the data structure, it will see the two parts of the data structure as out of sync with each other.

The solution in this case is obvious: use a single mutex to protect the entire data structure. In more complex cases, it can be much harder to identify scenarios where there may be a race condition, and eliminating the race condition may require more extensive changes, such as changes to the interface.

Data Races Are Always Dangerous

In C++0x, a “Data Race” is a particular kind of race condition where two threads both access a non-atomic variable without synchronization, and at least one of those accesses is a write. A data race results in undefined behavior, so if you have a data race in C++0x then your program really could do anything at all.

It is a disturbingly common misconception that such data races are not problematic in practice. They are. In the absence of synchronization such as a mutex lock or atomic operations, compilers are free to optimize code such that variable accesses occur in a different order than the order written in your code. Not only that, but even if the instructions are generated in the sequence you expect, the actual memory accesses performed by the CPU may occur out of order. This is a particular issue with modern CPUs that have long instruction pipelines, branch prediction, and prefetching. For instance, the actual memory access for a load may occur several instructions prior to the load.

In order to demonstrate some of the problems with data races in C++0x, I wrote the following simple program:

#include #include #include unsigned const increment_count=2000000;unsigned const thread_count=2;unsigned i=0;void func(){    for(unsigned c=0;c threads;    for(unsigned c=0;cfunc is essentially a single INC instruction on an x86 CPU. You might therefore expect that the final value of your global variable i is simply the number of increments performed by all threads (thread_count * increment_count), which is not the case. The INC instruction is not atomic, so if you run this code on a multicore or multiprocessor system, then the final value of i will often be much less than the number of increments.

To demonstrate this point, here is the output of five consecutive runs of this code on my dual-core x86 laptop:

2 threads, Final i=2976075, increments=40000002 threads, Final i=3097899, increments=40000002 threads, Final i=4000000, increments=40000002 threads, Final i=3441342, increments=40000002 threads, Final i=2942251, increments=4000000

Because the code increments i 4,000,000 times (2,000,000 times on each thread), and it starts at zero, you might naively expect to see a final value of 4000000 (which one of the runs does produce). However, this is not the case; most of the time, you get far less. This is because the non-atomic increments on the different threads interfere with each other.

On x86 architectures, non-atomic increment operations are just a simple memory read, followed by a simple memory write of the new value. If another thread updates the value between the read and the write, then that value will be overwritten. The consequences might be different on other architectures. For instance, you might get values that are some combination of the values written, or you might get a processor exception.

Now imagine that this global counter is a reference count for some resource, and that each thread decrements the counter the same number of times as it increments it. The intention for the counter is that when it reaches zero then the object is freed. If some of the increments or decrements do not behave as expected, then this could easily lead to a memory leak or to the resource being freed too early. To demonstrate this behavior, I modified the program code:

#include #include #include unsigned const increment_count=2000000;unsigned const thread_count=2;unsigned i=0;void func(){    for(unsigned c=0;c threads;    for(unsigned c=0;c
				
				
Share the Post:
Share on facebook
Share on twitter
Share on linkedin

Overview

Recent Articles:

©2023 Copyright DevX - All Rights Reserved. Registration or use of this site constitutes acceptance of our Terms of Service and Privacy Policy.

Sitemap