A Smorgasbord of Lock Flavors
Intel® Threading Building Blocks offers a gamut of mutex locks with different traits, because critical sections with different access patterns call for mutex locks with different trade-offs. Other libraries may offer similar choices. You need to know your application to select the most appropriate lock flavor for each critical section.
Spin Mutex vs. Queuing Mutex
The most prominent distinguishing property for locks is fairness: whether a lock allows "fair" access to the critical section or not. This is an important consideration when choosing a lock, but its importance varies with circumstances. For example, an operating system should guarantee that no process gets unfairly delayed when multiple processes contend for a critical section. By contrast, unfairness among threads in a user process may be tolerable to some degree if it helps boost throughput.
TBB's spin_mutex is an unfair lock. Threads entering a critical section with a spin_mutex repeatedly attempt to acquire the lock (they spin-wait until they get into the critical section, thus the name). In theory, the waiting time for a spin_mutex is unbounded. The TBB queuing_mutex, on the other hand, is a fair lock, because a thread arriving earlier at a critical section will get into it earlier than one arriving later. Waiting threads form a queue. A newly arriving thread puts itself at the end of the queue using a non-blocking atomic operation and spin-waits until its flag is raised. A thread leaving the critical section hands the lock over to the next in line by raising the latter's flag.
Unfortunately, there are no cast-in-stone guidelines or criteria that dictate when to use an unfair spin_mutex and when to use a fair queuing_mutex. In general, though, guaranteeing fairness costs more. When a critical section is brief and contention is light, the chance of a thread being starved is slim, and the extra overhead of unneeded fairness may not be warranted. In those cases, use a spin_mutex.
The TBB queuing_mutex spin-waits on a local cache line and does not interfere with other threads' memory access. Consider using a queuing mutex for modestly sized critical sections and/or when you expect a fairly high degree of contention.
One report claims that, using a test program, "with spin locks, a difference of up to 2x runtime per thread was observed and some threads were unfairly granted the lock up to 1 million times on an 8-core Opteron machine." If you suspect your application suffers from unfairness due to a spin_mutex, switching to a fair mutex such as queuing_mutex may be the answer. Before switching, though, back up the decision with concrete measurement data.
Not all concurrent accesses need to be mutually exclusive. Indeed, accesses to many concurrent data structures are mostly read-accesses, and only occasionally need write-accesses. For these structures, keeping one reader spin-waiting while another reader is in the critical section is not necessary.
TBB reader/writer mutexes allow multiple readers to be in a critical section while giving writers exclusive access to it. The unfair version is called spin_rw_mutex, while the fair version is queuing_rw_mutex. These mutexes also allow readers to upgrade to writers and writers to downgrade to readers.
Under some circumstances, you can replace reader-side locks with less expensive operations (although potentially at the expense of writers). One such example is the sequence lock (seqlock); another is read-copy-update (RCU). These are less general than reader/writer locks, so using them properly in applications requires more stringent scrutiny.
Mutex and Recursive Mutex
TBB provides a mutex that wraps the underlying OS locks but, compared to the native version, adds portability across all supported operating systems. In addition, the TBB mutex releases the lock even when an exception is thrown from the critical section.
A sibling, recursive_mutex, permits a thread to acquire multiple locks on the same mutex. The thread must release all locks on a recursive_mutex before any other thread can acquire a lock on it.
Avoiding Lock Pitfalls
There is no shortage of references warning about the dangers of using locks, such as deadlock and livelock. However, you can considerably reduce the chances of getting ensnared by these problems by instituting a few simple rules.
Avoid explicit use of locks. Instead, use the concurrent containers and concurrent algorithms provided in well-supported concurrency libraries such as Intel® Threading Building Blocks. If you think your application requires explicit use of locks, avoid implementing your own and use well-tested, well-tuned locks such as the TBB locks.
Avoid making calls to functions (particularly unknown ones) while holding a lock. In general, calling a function while holding a lock is not good practice. For one thing, it increases the size of the critical section, thus increasing the wait-times of other threads. More seriously, you may not know whether the function contains lock acquisition code. Even if it does not now, it may in the future. Such changes potentially lead to a deadlock situation, and when that happens, it's very difficult to locate and fix. If possible, re-factor the critical section so that it computes the function arguments in the critical section but invokes the function outside the critical section.
Avoid holding multiple locks. Circular lock acquisition is a leading cause of deadlock. If you must hold multiple locks, always acquire the locks in the same order in every thread, and then release them in the same order that they were acquired.
Avoid using recursive locks. You may be able to find some isolated cases where recursive locks make great sense. However, locks don't compose well. Even a completely unrelated change to a part of your application may lead to a deadlock, and the problem will be very difficult to locate (see this newsgroup thread, and this entry in Joe Duffy's blog). If, after reading those, you still believe you need recursive locks, use them at your own peril.
Even if you do everything you possibly can to avoid deadlocks and livelocks, problems may still occur. If you suspect your application has a deadlock or race condition and you cannot locate it quickly, don't burn time trying to resolve it by hand. Use tools such as Intel® Parallel Inspector.