A CPU cycle is roughly equivalent to the time needed to perform one instruction. On modern CPUs, a cycle takes about a nanosecond. In other words, a Pentium III 900 can perform nearly one billion instructions per second. By contrast, a RAM access takes about 10 nanoseconds. Therefore, whenever the CPU has to fetch a program's instructions from RAM, it sits idle for the time it could have spent executing about 10 instructions. To minimize this bottleneck, computers have a special type of memory: cache memory. Cache memory is about 10 times faster than ordinary RAM. The problem, however, is that its size is limited, typically to 256KB or 512KB at most. An average application's size well exceeds this limit. As a result, the system can hold only a small portion of the executable image in the cache at any one time, and it must constantly swap other portions in as they are needed. This swapping is expensive in terms of runtime speed because it involves RAM access or, even worse, disk access. What can you do to minimize it? First of all, keep your application's size as small as possible. A slimmer executable can give you a dramatic performance boost because the system can load a larger portion of it into cache memory.
How can you reduce an application's size? Avoid injudicious use of inline functions and templates, both of which can replicate the same code in many places. Remove any unreferenced functions (i.e., functions that are never called) and unused data objects from the executable. Finally, switch on the compiler and linker flags that trigger size optimization.