Working with DreamWorks Animation, several Intel employees developed an innovative technique for optimizing common Linux bottlenecks without any source or build system changes. They used the LD_PRELOAD environment variable to preload highly optimized libraries from Intel Threading Building Blocks, Intel Integrated Performance Primitives, and the run-time libraries provided with the Intel C++ Compiler for Linux.
The paper explains their methodology and techniques, which may also apply in other situations, and details the impressive performance gains they achieved.
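
As a rough illustration of the preloading idea (not the authors' actual setup), the Python sketch below builds an environment in which the dynamic linker loads extra shared libraries before all others. The helper name `with_preload` is hypothetical, and `libtbbmalloc_proxy.so.2` (the malloc replacement library shipped with Threading Building Blocks) is used only as an example; the paper may have preloaded different libraries.

```python
import os


def with_preload(libraries, base_env=None):
    """Return a copy of the environment with `libraries` prepended to LD_PRELOAD.

    `with_preload` is a hypothetical helper for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The Linux dynamic linker accepts a colon- or space-separated list,
    # so normalize any existing entries to colons and keep them after ours.
    kept = [p for p in existing.replace(" ", ":").split(":") if p]
    env["LD_PRELOAD"] = ":".join(list(libraries) + kept)
    return env


if __name__ == "__main__":
    # Assumption: the library is on the loader's search path.
    env = with_preload(["libtbbmalloc_proxy.so.2"])
    print(env["LD_PRELOAD"])
```

Passing the returned dictionary as `env=` to `subprocess.run` would start an unmodified binary with the listed libraries loaded first, which is what allows their symbols to take precedence without touching source code or build scripts.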