Interprocess synchronization has always been classified as a "slow" operation, but that is largely because the Windows implementation requires a switch to kernel mode (an expensive operation) each time you attempt to acquire a mutex. A much faster, but less general, mutex can be built on Intel-compatible processors. The implementation presented here requires only a single integer to store the mutex state:
    #define ASM __asm
    #define asmcall __declspec(naked)
    #define AsmPrepare(x) (AsmPrepareImpl(&(x)))
    #define AsmAcquire(x) (AsmAcquireImpl(&(x)))
    #define AsmRelease(x) (AsmReleaseImpl(&(x)))

    /*
     * A value of -1 means the mutex is free to be acquired
     * Any other value means it is taken
     */
    void AsmPrepareImpl(void * data)
    {
        *((uint *)data) = -1;
    }

    asmcall inline void AsmAcquireImpl(void * data)
    {
        ASM
        {
        @top:
            MOV EAX, [ESP + 4]        ; // load the address of 'data' into EAX
            LOCK INC DWORD PTR [EAX]  ; // atomically increment the DWORD at 'data'
            JNZ @zop                  ; // ZF is set only if the resulting value is 0 (it was -1 before)
            RET 4                     ; // reached if ZF was set: we now own the mutex
        @zop:
            LOCK DEC DWORD PTR [EAX]  ; // undo our increment of the DWORD at 'data'
            PUSH 0                    ; // the argument to Sleep
            CALL Sleep                ; // take a very short nap before trying again
            JMP @top                  ; // if at first we don't succeed ... try and try again
        }
    }

    asmcall inline void AsmReleaseImpl(void * data)
    {
        ASM
        {
            MOV EAX, [ESP + 4]        ; // load the address of 'data' into EAX
            LOCK DEC DWORD PTR [EAX]  ; // atomically decrement the DWORD at 'data'
            RET 4                     ; // return
        }
    }
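This code assumes the stdcall calling convention, in which the callee pops its own arguments (hence the RET 4): compile with the /Gz option or add STDCALL (__stdcall) to the declarations of the functions. While you can easily see that the above code does implement a stripped-down mutex, it has some shortcomings: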
- The mutex does not allow the owning thread to acquire it multiple times; it simply blocks if this is tried. In other words, it does not "count."
- The data variable must reside in shared memory for synchronization between processes; otherwise it can only be used to synchronize threads of the same process.
- If acquiring the mutex fails, the thread does not actually sleep; it simply enters a loop in which it yields quickly to other threads. This can waste CPU time if many threads are left waiting for the same mutex.
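The algorithm and the first and third shortcomings can be illustrated with a minimal portable sketch. It uses std::atomic instead of the inline assembly, and the names FastMutex, TryAcquire, Acquire and Release are hypothetical, not part of the original code; std::this_thread::yield() stands in for Sleep(0).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the article's mutex word:
// -1 means free, any other value means taken.
struct FastMutex {
    std::atomic<int> state{-1};
};

// One acquisition attempt; returns true if the caller now owns the mutex.
inline bool TryAcquire(FastMutex& m) {
    // fetch_add returns the previous value. The incremented value is 0
    // exactly when the previous value was -1, i.e. the mutex was free.
    if (m.state.fetch_add(1) == -1)
        return true;
    m.state.fetch_sub(1);            // undo our increment, like the LOCK DEC
    return false;
}

// Retry loop: the thread never really sleeps, it only yields and retries.
inline void Acquire(FastMutex& m) {
    while (!TryAcquire(m))
        std::this_thread::yield();   // assumption: plays the role of Sleep(0)
}

inline void Release(FastMutex& m) {
    int prev = m.state.fetch_sub(1);
    assert(prev >= 0);               // releasing a free mutex is a bug
    (void)prev;
}
```

A second TryAcquire by the thread that already owns the mutex returns false, so calling Acquire twice from the same thread would spin forever. That is the "does not count" behaviour described above.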
Due to these shortcomings, the fast mutex implementation should only be used in cases where:
- Contention for a mutex is rare, as in a lock on a database row.
- Synchronization speed is really an issue and you can afford the time spent writing code that does not depend on being able to acquire the same mutex multiple times. For many projects it is not a trivial task to convert existing code into this form.
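Where existing code does rely on re-acquisition, one possible workaround is to layer an owner id and a depth counter on top of the same -1/0 state word. The sketch below is only an illustration under that assumption: CountingMutex and its members are hypothetical names, it uses std::atomic rather than the inline assembly, and it gives up the single-integer footprint.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical counting wrapper; not part of the original implementation.
class CountingMutex {
    std::atomic<int> state_{-1};                           // -1 = free, as in the article
    std::atomic<std::thread::id> owner_{std::thread::id()}; // default id = no owner
    int depth_ = 0;                                        // modified only by the owner

public:
    void Acquire() {
        const std::thread::id self = std::this_thread::get_id();
        if (owner_.load() == self) {   // re-entry by the current owner
            ++depth_;
            return;
        }
        while (state_.fetch_add(1) != -1) {
            state_.fetch_sub(1);       // undo the failed attempt
            std::this_thread::yield(); // stands in for Sleep(0)
        }
        owner_.store(self);
        depth_ = 1;
    }

    void Release() {
        assert(owner_.load() == std::this_thread::get_id());
        if (--depth_ > 0)
            return;                    // still held by an outer Acquire
        owner_.store(std::thread::id());
        state_.fetch_sub(1);           // back to -1: free again
    }

    // Meaningful only when called by the owning thread.
    int Depth() const { return depth_; }
};
```

The extra owner check on every call is part of the price the plain version avoids, which is why restructuring code to never re-acquire is usually the faster option when it is feasible.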
It would not be fair to list the shortcomings of this mutex without also listing the advantages, so here they are:
- Speed: Over 11 times faster than the Platform SDK mutex on my machine.
- Low memory overhead: Just a single 32-bit integer per mutex.
- The single DWORD (integer) of memory used for a mutex can be allocated practically anywhere, including right in the middle of existing data structures, which further improves locality of reference and speed.
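As a rough illustration of the last point, the lock word can sit directly inside the record it protects. The sketch below assumes a hypothetical Row layout and RowGuard helper (neither is from the original code) and again uses std::atomic in place of the inline assembly.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical row layout; the field names are illustrative only.
struct Row {
    int              id = 0;
    std::atomic<int> lock{-1};   // -1 = free, any other value = taken
    long             balance = 0;
};

// On mainstream compilers the lock costs exactly one int inside the row.
static_assert(sizeof(std::atomic<int>) == sizeof(int),
              "lock word is expected to be a single int");

// Scope guard: acquires the row's lock on construction, releases on exit.
class RowGuard {
    Row& row_;

public:
    explicit RowGuard(Row& row) : row_(row) {
        while (row_.lock.fetch_add(1) != -1) {
            row_.lock.fetch_sub(1);        // undo the failed attempt
            std::this_thread::yield();     // stands in for Sleep(0)
        }
    }
    ~RowGuard() {
        int prev = row_.lock.fetch_sub(1); // back to -1 when uncontended
        assert(prev >= 0);
        (void)prev;
    }
    RowGuard(const RowGuard&) = delete;
    RowGuard& operator=(const RowGuard&) = delete;
};

// Example update that touches the lock and the data it guards together.
inline void Deposit(Row& row, long amount) {
    RowGuard guard(row);
    row.balance += amount;
}
```

Because the lock and the guarded fields share the same structure, acquiring the lock tends to pull the data into cache as well, which is the locality benefit mentioned above.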