A C++ Mutex Operation for Intel-compatible Processors

Interprocess synchronization is usually regarded as a “slow” operation, largely because the Windows implementation must switch to kernel mode (an expensive operation) every time you attempt to acquire a mutex. A much faster, but less general, mutex can be built on Intel-compatible processors. The implementation presented here needs only a single integer to store the mutex state:

#define ASM __asm
#define asmcall __declspec(naked)
#define AsmPrepare(x) (AsmPrepareImpl(&(x)))
#define AsmAcquire(x) (AsmAcquireImpl(&(x)))
#define AsmRelease(x) (AsmReleaseImpl(&(x)))

/***
 * A value of -1 means the mutex is free to be acquired
 * Any other value means it is taken
 ***/
void AsmPrepareImpl(void * data)
{
  *((uint *)data) = -1;
}

asmcall inline void AsmAcquireImpl(void * data)
{
  ASM {
  top:
    MOV EAX, [ESP + 4]       ; // load the address of 'data' into EAX
    LOCK INC DWORD PTR [EAX] ; // atomically increment the DWORD at 'data'
    JNZ zop                  ; // ZF is set only if the result is 0 (it was -1 before)
    RET 4                    ; // reached only if ZF was set: we now own the mutex
  zop:
    LOCK DEC DWORD PTR [EAX] ; // undo our increment of the DWORD at 'data'
    PUSH 0                   ; // the argument to Sleep
    CALL Sleep               ; // take a very short nap before trying again
    JMP top                  ; // if at first we don't succeed ... try and try again
  }
}

asmcall inline void AsmReleaseImpl(void * data)
{
  ASM {
    MOV EAX, [ESP + 4]       ; // load the address of 'data' into EAX
    LOCK DEC DWORD PTR [EAX] ; // atomically decrement the DWORD at 'data'
    RET 4                    ; // return, popping the argument
  }
}
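
For readers without a 32-bit MSVC toolchain, the same increment/decrement protocol can be sketched in portable C++11 with std::atomic. This is only an illustration of the idea, not the author's implementation: the names FastMutex, fast_acquire, fast_release and contended_count are hypothetical, and std::this_thread::yield() is assumed to be a reasonable stand-in for Sleep(0).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical portable analogue of the assembly routines.
// Same convention as the article: -1 means free, any other value means taken.
struct FastMutex {
    std::atomic<int> state{-1};
};

inline void fast_acquire(FastMutex& m) {
    for (;;) {
        // fetch_add returns the previous value; -1 means we moved it to 0 and own the lock.
        if (m.state.fetch_add(1) == -1) return;
        m.state.fetch_sub(1);       // undo our failed attempt
        std::this_thread::yield();  // assumed stand-in for Sleep(0)
    }
}

inline void fast_release(FastMutex& m) {
    int prev = m.state.fetch_sub(1);
    assert(prev >= 0);  // releasing a mutex that is not held is a caller bug
    (void)prev;
}

// Demo helper: 'threads' workers each add 1 to a plain int 'iters' times under the lock.
inline int contended_count(int threads, int iters) {
    FastMutex m;
    int total = 0;  // deliberately non-atomic; protected only by m
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&m, &total, iters] {
            for (int i = 0; i < iters; ++i) {
                fast_acquire(m);
                ++total;
                fast_release(m);
            }
        });
    for (auto& th : pool) th.join();
    return total;
}
```

If the lock works, contended_count(4, 5000) returns 20000, because no increment of the unprotected counter is lost.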

The inline-assembly routines assume the __stdcall calling convention, since each one returns with RET 4 and so pops its own argument: compile with /Gz or add __stdcall to the function declarations. While they do implement a stripped-down mutex, the design has some shortcomings:

  1. The mutex does not allow the owning thread to acquire it multiple times; it simply blocks if this is tried. In other words, it does not “count.”
  2. The data variable must reside in shared memory to synchronize between processes; otherwise it can only synchronize threads of the same process.
  3. If acquiring the mutex fails, the thread does not actually sleep; it simply loops, yielding its time slice to other threads on each pass. This can waste CPU time if many threads are left waiting for the same mutex.
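
To make the first shortcoming concrete, here is a minimal sketch of how counting could be layered on top of the same -1/0 state word in portable C++11. It is an assumption-laden illustration rather than part of the original tip: CountingMutex, self_tag, counting_acquire and counting_release are hypothetical names, and the address of a thread_local object is used as a simple thread identity.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical counting wrapper; none of these names come from the article.
struct CountingMutex {
    std::atomic<int>         state{-1};       // -1 = free, as in the article
    std::atomic<const void*> owner{nullptr};  // identity of the owning thread
    int                      depth = 0;       // touched only by the owning thread
};

// The address of a thread_local object is unique among live threads.
inline const void* self_tag() {
    static thread_local char tag;
    return &tag;
}

inline void counting_acquire(CountingMutex& m) {
    const void* me = self_tag();
    if (m.owner.load() == me) {  // re-entry by the current owner: just count
        ++m.depth;
        return;
    }
    while (m.state.fetch_add(1) != -1) {  // same protocol as the plain mutex
        m.state.fetch_sub(1);
        std::this_thread::yield();
    }
    m.owner.store(me);
    m.depth = 1;
}

inline void counting_release(CountingMutex& m) {
    assert(m.owner.load() == self_tag() && m.depth > 0);
    if (--m.depth == 0) {         // last release actually frees the mutex
        m.owner.store(nullptr);
        m.state.fetch_sub(1);
    }
}
```

The price is an extra pointer and counter per mutex, which gives up part of the single-integer footprint.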

Due to these shortcomings, the fast mutex implementation should only be used in cases where:

  1. Contention for a mutex is rare, as in a lock on a database row.
  2. Synchronization speed is really an issue, and you can afford the time to write code that never needs to acquire the same mutex multiple times. For many projects, converting existing code into this form is not a trivial task.

It would not be fair to list the shortcomings of this mutex without also listing the advantages, so here they are:

  1. Speed: Over 11 times faster than the Platform SDK mutex on my machine.
  2. Low memory overhead: Just a single 32-bit integer per mutex.
  3. The single DWORD (integer) of memory used for a mutex can be allocated practically anywhere, including right in the middle of existing data structures, which further improves locality of reference and speed.
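
The third advantage can be illustrated by placing the lock word directly inside the record it protects. The sketch below uses portable C++11 rather than the assembly routines; Row, row_lock, row_unlock and deposit are hypothetical names, and the static_assert reflects what mainstream x86 compilers do, not a guarantee of the language.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical record type; the field names are illustrative only.
struct Row {
    int              id;
    std::atomic<int> lock{-1};  // -1 = free, anything else = taken
    long             balance;

    explicit Row(int i) : id(i), balance(0) {}
};

// Assumption: holds on mainstream x86 compilers; remove it on targets where it does not.
static_assert(sizeof(std::atomic<int>) == sizeof(int),
              "lock word is expected to be one plain integer");

inline void row_lock(Row& r) {
    while (r.lock.fetch_add(1) != -1) {  // previous value -1 means we now own it
        r.lock.fetch_sub(1);             // undo the failed attempt
        std::this_thread::yield();
    }
}

inline void row_unlock(Row& r) {
    int prev = r.lock.fetch_sub(1);
    assert(prev >= 0);  // must have been held
    (void)prev;
}

// The lock sits next to the data it guards, so both are likely to share a cache line.
inline void deposit(Row& r, long amount) {
    row_lock(r);
    r.balance += amount;
    row_unlock(r);
}
```

Each Row carries its own lock, so updates to different rows never contend with one another.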

