Sometimes you need to know the number of bits set in a word. I often see code to count the number set bits written as a loop, processing one bit at a time. There is a much faster method that uses a few statements of straight-line code.
You can think of a 32-bit number as a vector of 32 1-bit numbers side-by side. Replace each of the 16 pairs of 1-bit numbers with a 2-bit number that is the sum of the 1-bit numbers. Next replace each of the 8 pairs of 2-bit numbers with a 4-bit number that is the sum of the 2-bit numbers. Repeat until you have one 32-bit number. You can use standard integer addition to add all the numbers at once, provided you make sure that none of the fields overflow into an adjacent number.
*/typedef unsigned int UINT32;UINT32 count_bits_first_version(UINT32 bits){ bits = ((bits >> 1) & 0x55555555) + _(bits & 0x55555555); bits = ((bits >> 2) & 0x33333333) + _(bits & 0x33333333); bits = ((bits >> 4) & 0x0F0F0F0F) + _(bits & 0x0F0F0F0F); bits = ((bits >> 8) & 0x00FF00FF) + _(bits & 0x00FF00FF); bits = ((bits >>16) & 0x0000FFFF) + _(bits & 0x0000FFFF); return bits;}/*
This code can be made faster by removing unnecessary operations. Here is the fully optimized version.
*/UINT32 count_bits(UINT32 bits){ bits = bits - ((bits >> 1) & 0x55555555); bits = ((bits >> 2) & 0x33333333) + _(bits & 0x33333333); bits = ((bits >> 4) + bits) & 0x0F0F0F0F; return (bits * 0x01010101) >> 24;}/*
The first line saves an AND instruction by exploiting this clever relationship for each pair of bits. The two bits ‘ab’ represent the number 2a+b. To count the bits we subtract ‘a’ (i.e. 2a+b – a = a+b).
The last line uses a multiplication to do several additions at once.