Most C++ developers will tell you that one reason they choose C++ is its ability to deliver raw performance. Much of that performance comes from compilers that are very good at turning C++ source into highly optimized binary code, and a very good optimizer has always been one of Visual C++'s strengths.
Note that the core optimizations are available only in the Visual C++ packaged with Visual Studio .NET Professional and higher editions. Visual C++ .NET Standard and the .NET Framework SDK continue to ship with an optimizer-crippled compiler.
Visual C++ .NET delivered a core new optimization called Whole Program Optimization (WPO). Invoked with a compiler switch (/GL), WPO gives the compiler a second chance to optimize after the linker has done an initial pass on all the object files in the project. Normally, a compiler can optimize only within the module that it is presently compiling. WPO lets the compiler see the entire program at once and apply optimizations on a global scale, resulting in real performance gains (up to 10% in real-world code). Be warned: WPO eats memory, and in some cases it may slow the compile/link process.
Microsoft enhanced some of the existing optimization heuristics for Everett, including Whole Program Optimization. For example, during WPO the compiler can remove dead parameters from function calls. If a function never uses one of its parameters, that parameter is optimized away entirely.
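To make the dead-parameter case concrete, here is a minimal sketch. The function names, the two-file layout, and the build lines in the comments are illustrative assumptions rather than anything taken from a real project; /GL is paired with the linker's /LTCG switch so that code generation is deferred to link time.

```cpp
// Hypothetical two-module layout, shown in one listing for brevity.
// Assumed build with Visual C++:
//   cl /O2 /GL /c scale.cpp main.cpp
//   link /LTCG scale.obj main.obj
#include <cassert>
#include <cstddef>

// --- scale.cpp (hypothetical) ---
// 'verbose' is never read in the body, so it is a dead parameter.
// Compiling one module at a time, the compiler has to keep it, because
// callers in other modules still pass a value for it.
int scale_sum(const int* values, std::size_t count, int factor, bool verbose)
{
    assert(values != 0 || count == 0);  // a null array is only valid when empty
    int total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += values[i] * factor;
    return total;
}

// --- main.cpp (hypothetical) ---
// With the whole program visible, the optimizer can see that no caller's
// 'verbose' argument affects the result and may stop passing it.
int sum_of_doubled(const int* values, std::size_t count)
{
    return scale_sum(values, count, 2, false);
}
```

The source is the same with or without /GL. Whether the parameter is actually dropped is up to the optimizer, and the only reliable way to confirm it is to inspect the generated code.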
But the real optimization excitement in Everett comes from two new switches: /G7 and /arch (in its two forms, /arch:SSE and /arch:SSE2).
The compiler switch /G7 is the long-awaited "optimize for Pentium 4 / AMD Athlon" switch. Throwing this switch won't prevent programs from executing on lesser processors, but it will make them run noticeably faster on the newer silicon. In practice, and depending on the amount of floating-point-centric code, a /G7-compiled program will run between 5% and 10% faster.
Note: If you are targeting a customer base that is still largely on Pentium III or older processors, be aware that /G7 code will run slightly slower on those machines than code built with the /G6 switch, which produces optimizations blended for the Pentium II and Pentium III.
If you write heavy-duty floating-point code, take note of the /arch:SSE and /arch:SSE2 switches. Throwing one of these will prevent your code from executing on chips that lack the specified Streaming SIMD Extensions (SSE) family of instructions, but it enables the compiler to generate code that takes advantage of the additional capabilities. The gain seen in run-of-the-mill C++ code is negligible, but in certain cases floating-point code will see a 4-5% improvement.
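As a rough illustration of the kind of code that stands to benefit, the sketch below is a plain floating-point loop. The function name and the build line in the comment are assumptions made for this example. Nothing in the source is SSE-specific; the switch only changes which instructions the compiler is allowed to emit.

```cpp
// Assumed build with Visual C++: cl /O2 /arch:SSE2 /c dot.cpp
// The resulting binary requires a processor that supports SSE2.
#include <cassert>
#include <cstddef>

// Hypothetical floating-point kernel: the dot product of two arrays.
// With /arch:SSE2 the compiler may carry out the multiplies and adds
// with SSE2 instructions instead of the x87 floating-point unit.
double dot_product(const double* a, const double* b, std::size_t n)
{
    assert((a != 0 && b != 0) || n == 0);  // null arrays are only valid when empty
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Because the switch removes the fallback for older chips, one cautious approach is to time a loop like this with and without /arch:SSE2 on the target hardware before committing to it.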