5: Compiler Optimization

A good compiler can have a huge effect on code performance. Most PC compilers are good, but not great, at optimization. Be aware that sometimes the compiler won't perform optimizations even though it can. The compiler assigns a higher priority to producing consistent and correct code than optimizing performance. Be thankful for small favors.

5.1: Compiler C Language Settings

The following table lists all of the MS Visual C++ 6.0 "C" optimizations for reference. Alternate methods are given when an optimization can be specified directly in the code. Microsoft default values for release builds are highlighted.

| Name | Option | Description |
| --- | --- | --- |
| Blend | /GB | Optimize for 386 and above |
| Pentium | /G5 | Optimize for Pentium and above |
| Pentium Pro | /G6 | Optimize for Pentium Pro and above |
| Windows | /GA | Optimize for Windows (specifically access to thread-specific data) |
| DLL | /GD | Not currently implemented. Reserved for future use. |
| Cdecl | /Gd | Caller cleans stack. Slow. Allows variable argument functions. Alternate: __cdecl |
| Stdcall | /Gz | Callee cleans stack. Fast. No variable argument functions. Alternate: __stdcall |
| Fastcall | /Gr | Callee cleans stack. Uses registers. Fastest. No variable argument functions. Can't be used with _export. Alternate: __fastcall |
| String pooling | /Gf | Put duplicate strings in one memory location. |
| String pooling RO | /GF | Put duplicate strings in one read-only memory location. |
| Stack probes off | /Gs | Turn off stack checking. Alternate: #pragma check_stack |
| Func-level linking | /Gy | Linker only includes functions referenced in the OBJ rather than the entire contents |
| Small | /O1 | Same as /Og /Os /Oy /Ob1 /Gs /Gf /Gy (global opts, favor code size over speed, omit frame ptr, allow inlines, stack probes off, string pooling, func-level linking) |
| Fast | /O2 | Same as /Og /Oi /Ot /Oy /Ob1 /Gs /Gf /Gy (global opts, intrinsic functions, favor code speed, omit frame ptr, allow inlines, stack probes off, string pooling, func-level linking) |
| No aliasing | /Oa | Assume no aliasing occurs within functions. Alternate: #pragma optimize("a") |
| Inter-func aliasing | /Ow | Assume no aliasing occurs within a function body, but that aliasing can occur across function calls. Alternate: #pragma optimize("w") |
| Disable all opts | /Od | Turn off all optimizations |
| Global opts | /Og | Turn on loop, common subexpression and register optimizations. Alternate: #pragma optimize("g") |
| Intrinsic functions | /Oi | Replace specific functions with inline versions (memcpy, strcpy, strlen, etc.). Alternate: #pragma intrinsic/function |
| Float consistency | /Op | Improve the consistency of floating-point results at the expense of speed and size |
| Small code | /Os | Favor code size over speed. Alternate: #pragma optimize("s") |
| Fast code | /Ot | Favor code speed over size. Alternate: #pragma optimize("t") |
| Full optimizations | /Ox | Enable the following: /Ob1 /Og /Oi /Ot /Oy /Gs |
| Omit frame pointer | /Oy | Suppress creation of frame pointers on call stack. Frees the EBP register for other uses. Alternate: #pragma optimize("y") |
| Struct packing | /Zp8 | Sets the structure member alignment (/Zpn). n = 1, 2, 4, 8 (default), 16. Smaller values generate smaller, slower code. Larger values generate larger, faster code. |
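
The struct packing row is the easiest one to see in action. The sketch below is not from the original article: it uses #pragma pack, which Microsoft, GCC and Clang compilers all accept, to apply 1-byte packing to a single structure instead of the whole build. The names DefaultPacked, TightlyPacked and PaddingSaved are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Default packing: the compiler pads after 'tag' so that 'value'
// sits on its natural alignment boundary.
struct DefaultPacked
    {
    char tag;
    int  value;
    };

// 1-byte packing for this structure only, similar in effect to
// building the whole program with /Zp1.
#pragma pack(push, 1)
struct TightlyPacked
    {
    char tag;
    int  value;
    };
#pragma pack(pop)

// Number of padding bytes the default layout adds over the packed one.
inline std::size_t PaddingSaved()
    {
    return sizeof(DefaultPacked) - sizeof(TightlyPacked);
    }
```

The packed structure is smaller, but 'value' is no longer aligned, which is the "smaller, slower" trade-off the table describes. Both sides of a binary interface must agree on the packing, or they will disagree about member offsets.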

5.2: Compiler C++ Language Settings

The following table lists all of the Microsoft Visual C++ 6.0 "C++" optimizations for reference. Alternate methods are given when an optimization can be specified directly in the code. Microsoft default values for release builds are highlighted.

| Name | Option | Description |
| --- | --- | --- |
| No Vtable | __declspec(novtable) | Stops the compiler from generating code to initialize the vfptr in the constructor. Apply to pure interface classes for code size reduction. |
| No Throw | __declspec(nothrow) | Stops the compiler from tracking unwindable objects. Apply to functions that don't throw exceptions for code size reduction. Same as using C++ throw() specification. |
| Disable RTTI | /GR- | Turn off run time type information. |
| Exception handling | /GX | Turn on exception handling. |
| Inline expansion | /Ob1 | Allow functions marked inline to be inlined. Alternate: inline, __forceinline, #pragma inline_depth/inline_recursion |
| Inline any | /Ob2 | Inline functions deemed appropriate by compiler. Alternate: #pragma auto_inline/inline_depth/inline_recursion |
| Ctor displacement | /vd0 | Disable constructor displacement. Choose this option only if no class constructors or destructors call virtual functions. Use /vd1 (default) to enable. Alternate: #pragma vtordisp |
| Best case ptrs | /vmb | Use best case "pointer to class member" representation. Use this option if you always define a class before you declare a pointer to a member of the class. The compiler will issue an error if it encounters a pointer declaration before the class is defined. Alternate: #pragma pointers_to_members |
| Gen. purpose ptrs | /vmg | Use general purpose "pointer to class member" representation (the opposite of /vmb). Required if you need to declare a pointer to a member of a class before defining the class. Requires one of the following inheritance models: /vmm, /vms, /vmv. Alternate: #pragma pointers_to_members |

5.3: The Ultimate Compiler Settings

The ultimate options for fast programs. Microsoft default values for release builds highlighted.

| Name | Option | Description |
| --- | --- | --- |
| Disable Vtable Init | __declspec(novtable) | Stops compiler from generating code to initialize the vfptr in the constructor. Apply to pure interface classes. |
| No Throw | __declspec(nothrow) | Stops compiler from tracking unwindable objects. Apply to functions that don't throw exceptions. Recommend using the Standard C++ exception specification throw() instead. |
| Pentium Pro | /G6 | Optimize for Pentium Pro and above (program might not run on Pentium) |
| Windows | /GA | Optimize for Windows |
| Fastcall | /Gr | Fastest calling convention |
| String pooling RO | /GF | Merge duplicate strings into one read-only memory location |
| Disable RTTI | /GR- | Turn off run time type information. |
| Stack probes off | /Gs | Turn off stack checking |
| Exception handling | /GX- | Turns off exception handling (assumes program isn't using exception handling) |
| Func-level linking | /Gy | Only include functions that are referenced |
| Assume no aliasing | /Oa | Assume no aliasing occurs within functions |
| Inline any or inline expansion | /Ob2 or /Ob1 | Inline any function deemed appropriate by compiler or turn inlining on. Alternates: inline, __forceinline |
| Global opts | /Og | Full loop, common subexpression and register optimizations |
| Intrinsic functions | /Oi | Replaces specific functions with inline versions (memcpy, strcpy, etc.) |
| Fast code | /Ot | Favor code speed over size (see notes below) |
| Omit frame pointer | /Oy | Omit frame pointer |
| Ctor displacement | /vd0 | Disable constructor displacement. |
| Best case ptrs | /vmb | Use best case "pointer to class member" representation |

Be aware that some of these options can cause your program to fail. See the section below on unsafe optimizations. There are also some optimizations that you might not choose to use for your specific application. For instance, if you're using RTTI or exception handling, don't turn those options off.

Optimizing for space can actually be faster than optimizing for speed because programs optimized for speed are almost always larger, and therefore more likely to cause additional paging than programs optimized for space. In fact, all Microsoft device drivers and Windows NT itself are built to minimize space. Try both ways and see which is faster for your app.
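
"Try both ways" needs some way to measure. The following is a minimal timing sketch, not part of the original article. It assumes a compiler with the C++11 <chrono> header, which Visual C++ 6.0 does not have, and SumOfSquares is only a placeholder for your application's real hot path.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical workload standing in for the code you actually care about.
inline unsigned long SumOfSquares(std::size_t count)
    {
    unsigned long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += static_cast<unsigned long>(i) * i;
    return total;
    }

// Runs the workload 'repeats' times and returns the elapsed milliseconds.
// Results are accumulated into 'sink' so the optimizer cannot discard the work.
inline double TimeWorkloadMs(std::size_t count, int repeats,
                             volatile unsigned long& sink)
    {
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    for (int r = 0; r < repeats; ++r)
        sink = sink + SumOfSquares(count);
    const Clock::time_point stop = Clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
    }
```

Build the same source once with /O1 and once with /O2, call TimeWorkloadMs in each build with identical arguments, and compare the numbers over several runs. A tiny loop like this one will not show paging effects, so the real comparison has to be made on the full application.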

5.4: Disable VTable Initialization

The Microsoft-specific __declspec(novtable) option instructs the compiler not to initialize the virtual function pointer in the constructor of the given object. Normally, this would be a "bad thing." However, for abstract classes, there's no reason to initialize the pointer, because it will always be properly initialized when a concrete class derived from the object is constructed.

By the way, this option is misnamed. It sounds like it removes the vtable itself, which isn't at all true. The option should be called noinitvtable.

Now consider the following example objects. Image is a typical abstract class. Frame is derived from Image. ImageNV and FrameNV are the same as Image and Frame respectively, except ImageNV uses the novtable option.


class Image
    {
    public:
        Image()
            {
            // Code generated by the compiler:
            // push ebp
            // mov ebp,esp
            // push ecx
            // mov dword ptr [ebp-4],ecx
            // mov eax,dword ptr [this]
            // mov dword ptr [eax],offset Image::`vftable' (0040b0fc)
            }
        virtual ~Image() = 0;
    };

// A pure virtual destructor still needs a definition, because the
// destructors of derived classes call it.
inline Image::~Image() {}

class Frame : public Image {};

#define NOINITVTABLE __declspec(novtable)
class NOINITVTABLE ImageNV
    {
    public:
        ImageNV()
            {
            // Code generated by the compiler:
            // push ebp
            // mov ebp,esp
            // push ecx
            // mov dword ptr [ebp-4],ecx
            }
        virtual ~ImageNV() = 0;
    };

inline ImageNV::~ImageNV() {}

class FrameNV : public ImageNV {};

The ImageNV constructor has two fewer instructions, namely the instructions that store the address of the virtual function table in the object's virtual function pointer (vfptr). The optimized constructor is 30% faster. Microsoft's own ATL class library uses this compiler option extensively.
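
Because __declspec(novtable) is Microsoft-specific, code that also has to build with other compilers usually hides it behind a macro. The sketch below is one way to do that, under the assumption that _MSC_VER identifies a compiler that understands the attribute. Shape, Triangle and CountSides are illustrative names, not part of the example above.

```cpp
#include <cassert>

// Assumption: only Microsoft-compatible compilers understand novtable,
// so the macro expands to nothing everywhere else.
#if defined(_MSC_VER)
#define NOINITVTABLE __declspec(novtable)
#else
#define NOINITVTABLE
#endif

// Pure interface: never instantiated on its own, so its constructor
// has no need to store a vfptr.
class NOINITVTABLE Shape
    {
    public:
        virtual ~Shape() {}
        virtual int Sides() const = 0;
    };

// Concrete class: its constructor stores the vfptr that callers rely on.
class Triangle : public Shape
    {
    public:
        virtual int Sides() const { return 3; }
    };

// Virtual dispatch through the interface works as usual.
inline int CountSides(const Shape& shape)
    {
    return shape.Sides();
    }
```

The one thing to avoid is calling a virtual function from the constructor or destructor of the novtable class itself, since the vfptr is not valid for that class at that point.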

 

5.5: Indicate Functions that Don't Throw Exceptions

The Microsoft-specific __declspec(nothrow) option instructs the compiler not to track unwindable objects as it normally would in case an exception is thrown and objects must be unwound on the stack.

A more portable method is to use the Standard C++ exception specification throw(). This indicates that the specified function will not throw an exception.

Here are three example functions. MayThrow is a typical function. The compiler must assume that it could throw an exception. NoThrowMS is specified using nothrow. NoThrowStdC is specified using throw(). Calls to NoThrowMS and NoThrowStdC are about 1% faster than calls to MayThrow.


// compiler assumes this function could throw any exception
int MayThrow(int i) { return (i + 1); }

// MS compiler assumes function cannot throw any exceptions
int __declspec(nothrow) NoThrowMS(int i) { return (i + 1); }

// Any Std C++ compiler assumes function cannot throw any exceptions
int NoThrowStdC(int i) throw() { return (i + 1); }
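
For readers on a newer compiler: the throw() specification was deprecated in C++11 and removed in C++20, and noexcept is its replacement. The following sketch assumes a C++11 or later compiler, so it does not apply to Visual C++ 6.0. NoThrowModern and CallerMayAssumeNoThrow are names invented for this illustration.

```cpp
#include <cassert>

// No exception specification: callers must assume this could throw.
int MayThrow(int i) { return i + 1; }

// C++11 replacement for the throw() specification.
int NoThrowModern(int i) noexcept { return i + 1; }

// The noexcept operator reports what the compiler is allowed to assume
// about a call. It looks only at the declaration, not at the function body.
inline bool CallerMayAssumeNoThrow()
    {
    return noexcept(NoThrowModern(0)) && !noexcept(MayThrow(0));
    }
```

Note that MayThrow is reported as potentially throwing even though its body obviously cannot throw, which is exactly the situation the specifications in this section are meant to fix.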

 

5.6: Use the Fastcall Calling Convention

The Microsoft Visual C++ compiler supports the following function calling conventions: cdecl, stdcall and fastcall. Fastcall is roughly 2% faster than cdecl on a typical function call. Use fastcall. Your program will thank you.
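
The calling convention can also be chosen per function with the __fastcall keyword instead of switching the whole build with /Gr. The sketch below wraps the keyword in a hypothetical FASTCALL macro so that the same source still compiles elsewhere. It assumes _MSC_VER and _M_IX86 identify a Microsoft-compatible 32-bit x86 build, and AddOne, UnaryIntFunction and Apply are illustrative names.

```cpp
#include <cassert>

// Assumption: the keyword is only meaningful on 32-bit x86 builds with a
// Microsoft-compatible compiler, so the macro is empty everywhere else.
#if defined(_MSC_VER) && defined(_M_IX86)
#define FASTCALL __fastcall
#else
#define FASTCALL
#endif

// This function uses fastcall regardless of the /Gd, /Gz or /Gr default.
int FASTCALL AddOne(int i)
    {
    return i + 1;
    }

// A function pointer type has to name the same convention as its target.
typedef int (FASTCALL *UnaryIntFunction)(int);

inline int Apply(UnaryIntFunction function, int value)
    {
    return function(value);
    }
```

Marking individual hot functions this way is a lower-risk option than /Gr when the program links against libraries that expect a different convention.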

 

5.7: Warning: Unsafe Optimizations Ahead

Don't change your optimization settings recklessly. Although most settings would never cause your program to crash, there are some settings that should be used only when you know your code is conforming to Microsoft's recommendations for that particular setting.

The following table lists all potentially risky optimizations.

| Name | Option | Notes |
| --- | --- | --- |
| Pentium | /G5 | Code won't run on 486 or below (use /GB instead) |
| Pentium Pro | /G6 | Code won't run on Pentium or below (use /GB instead) |
| String pooling | /Gf | If a string is modified, it will be modified for any variable that points to it |
| String pooling RO | /GF | If a string is modified, memory exception occurs |
| Stack probes off | /Gs | A stack overflow will crash the program without an overflow error |
| Exception handling | /GX- | If exception handling is not enabled, an exception may crash the program |
| Assume no aliasing | /Oa | If there is aliasing in the program, the optimization can cause corrupted data |
| Inline expansion | /Ob1 | Inlines can cause unexpected code bloat and cache misses |
| Inline any | /Ob2 | Inlines can cause unexpected code bloat and cache misses |
| Intrinsic functions | /Oi | Intrinsic functions increase code size |
| Float consistency | /Op | Resulting floating-point code will be larger and slower |
| Ctor displacement | /vd0 | A virtual function may be passed an incorrect "this" pointer if it is invoked from within a constructor or destructor. |
| Struct packing | /Zpn | Can cause compatibility problems if packing is modified |
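
To make the aliasing row concrete, here is a small illustration that is not from the original article. AddTwice is a made-up function whose two pointer parameters may or may not refer to the same int. Compiled normally, the results below are guaranteed. A compiler told to assume no aliasing would be free to read *source only once, in which case the aliased call could produce 3 instead of 4.

```cpp
#include <cassert>

// Adds *source to *target twice. Nothing stops both pointers from naming
// the same int, so a correct compiler must reload *source before the
// second addition.
inline void AddTwice(int* target, const int* source)
    {
    *target += *source;
    *target += *source;
    }

// Aliased call: target and source are the same object.
inline int AddTwiceAliased(int start)
    {
    int value = start;
    AddTwice(&value, &value);
    return value;
    }

// Non-aliased call: target and source are distinct objects.
inline int AddTwiceDistinct(int start, int addend)
    {
    int value = start;
    AddTwice(&value, &addend);
    return value;
    }
```

With a starting value of 1, the aliased call doubles the value twice and returns 4, while the distinct call with an addend of 1 returns 3. Code that relies on the first behavior is the kind that a no-aliasing assumption can silently break.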
