Lock-Free and Non-Blocking Algorithms
As promised earlier, one strategy that avoids locks and their associated problems advocated by some researchers
is to use non-blocking synchronization methods such as lock-free/wait-free programming techniques
and software transactional memory
. These techniques aim to provide "wait-freedom", thereby addressing issues stemming from the blocking nature of locks without compromising performance.
Unfortunately, our experience with non-blocking algorithms has been (so far) disappointing, and many other developers and researchers agree. Almost all non-blocking algorithms invariably use one or more hardware-supported atomic operations (such as compare-and-swap (CAS) and load-link/store-conditional (LL/SC)). Some even use double-word CAS (DCAS).
Dependence on these atomic primitives makes them difficult to write (see Doherty, Simon, et al, "DCAS is not a silver bullet for nonblocking algorithm design." Proceedings of the 16th Annual ACM Symposium on Parallelism in Algorithms and Architectures. 2004, and Herb Sutter's article "Lock-Free Code: A False Sense of Security"), difficult to validate for correctness (see Gotsman, Alexey, et al, "Proving That Non-Blocking Algorithms Don't Block," Symposium on Principles of Programming Languages to appear in 2009), and difficult to port to other platforms. This is probably one reason why non-blocking algorithms have been limited to simple data structures. Furthermore, improved performance over lock-based implementations seems hard to get.
Arguments for the other benefits are not compelling enough to warrant the pain of switching to non-blocking algorithms. Fairness is contingent upon the underlying atomic operations; in some cases, livelock is still possible. For many user applications, benefits such as real-time support and fault tolerance are a good-to-have, not a must-have. In other cases, solutions provided by operating systems are sufficient (e.g., priority inheritance for priority inversion).
Software Transactional Memory (STM) is another alternative to lock-based synchronization. It abstracts away the use of low-level atomic primitives using the notion of transactions, and simplifies synchronizing access to shared variables through optimistic execution and a roll-back mechanism. Like non-blocking algorithms, STM promises performance gains over lock-based synchronization, and also promises to avoid many common locking pitfalls. The results so far are not so favorable. One publication "observes that the overall performance of TM is significantly worse at low levels of parallelism" (see Cascaval, Calin, et al, "Software Transactional Memory: Why is it only a research toy?" ACM Queue. 2008, Vol. 6, 5.) However, STM is a relatively young research area, so the jury is still out.
Lock It Up
Locks have been unfairly vilified as a hindrance to the development of efficient concurrent applications on burgeoning multi-core platforms. However, our experiences suggest that rather than discouraging the use of mutex locks, one should instead promote their well-disciplined use. More often than not, implementations with such locks outperform those with non-blocking algorithms or STM.
The most important consideration for making the best use of mutex locks is understanding the application well, using tools to aid that understanding where necessary, and selecting the best-fitting synchronization method for each critical section. When you do choose a mutex, use it with the recommended guidelines, but keep flexibility in mind. Doing so will prevent most common mutex-related pitfalls without incurring unwarranted performance penalties. Finally, shun the do-it-yourself temptation, and delegate work to well-supported concurrency libraries.