Sometimes you need to know the number of bits set in a word. I often see code to count the number set bits written as a loop, processing one bit at a time. There is a much faster method that uses a few statements of straight-line code.

You can think of a 32-bit number as a vector of 32 1-bit numbers side-by side. Replace each of the 16 pairs of 1-bit numbers with a 2-bit number that is the sum of the 1-bit numbers. Next replace each of the 8 pairs of 2-bit numbers with a 4-bit number that is the sum of the 2-bit numbers. Repeat until you have one 32-bit number. You can use standard integer addition to add all the numbers at once, provided you make sure that none of the fields overflow into an adjacent number.

```
*/
typedef unsigned int UINT32;
UINT32 count_bits_first_version(UINT32 bits)
{
bits = ((bits >> 1) & 0x55555555) + _
(bits & 0x55555555);
bits = ((bits >> 2) & 0x33333333) + _
(bits & 0x33333333);
bits = ((bits >> 4) & 0x0F0F0F0F) + _
(bits & 0x0F0F0F0F);
bits = ((bits >> 8) & 0x00FF00FF) + _
(bits & 0x00FF00FF);
bits = ((bits >>16) & 0x0000FFFF) + _
(bits & 0x0000FFFF);
return bits;
}
/*
```

This code can be made faster by removing unnecessary operations. Here is the fully optimized version.

```
*/
UINT32 count_bits(UINT32 bits)
{
bits = bits - ((bits >> 1) & 0x55555555);
bits = ((bits >> 2) & 0x33333333) + _
(bits & 0x33333333);
bits = ((bits >> 4) + bits) & 0x0F0F0F0F;
return (bits * 0x01010101) >> 24;
}
/*
```

The first line saves an AND instruction by exploiting this clever relationship for each pair of bits. The two bits 'ab' represent the number 2a+b. To count the bits we subtract 'a' (i.e. 2a+b - a = a+b).

The last line uses a multiplication to do several additions at once.