Appendix A: STL Container Efficiency
|
Container |
Stores |
Overhead |
[ ] |
Iterators |
Insert |
Erase |
Find |
Sort |
|
list |
T |
8 |
n/a |
Bidirect'l |
C |
C |
N |
N log N |
|
deque |
T |
12 |
C |
Random |
C at begin or end; else N/2 |
C at begin or end; else N |
N |
N log N |
|
vector |
T |
0 |
C |
Random |
C at end; else N |
C at end; else N |
N |
N log N |
|
set |
T, Key |
12 |
n/a |
Bidirect'l |
log N |
log N |
log N |
C |
|
multiset |
T, Key |
12 |
n/a |
Bidirect'l |
log N |
d log (N+d) |
log N |
C |
|
map |
Pair, Key |
16 |
log N |
Bidirect'l |
log N |
log N |
log N |
C |
|
multimap |
Pair, Key |
16 |
n/a |
Bidirect'l |
log N |
d log (N+d) |
log N |
C |
|
stack |
T |
n/a |
n/a |
n/a |
C |
C |
n/a |
n/a |
|
queue |
T |
n/a |
n/a |
n/a |
C |
C |
n/a |
n/a |
|
priority_ queue |
T |
n/a |
n/a |
n/a |
log N |
log N |
n/a |
n/a |
|
slist |
T |
4 |
n/a |
Forward |
C |
C |
N |
N log N |
Appendix B: Relative Costs of Common Programming Operations
The following table was calculated on an Xbox (PIII-733). The nice thing about profiling on the Xbox is that an application is guaranteed to be running in kernel mode with no other tasks interfering with sampling. This table shows performance relative to a standard 32-bit assignment operation (value 1.00). Larger numbers indicate better performance. A value of 2.0 would be twice as fast as 32-bit assignment. A value of 0.5 would be twice as slow. Statistical error is +/-0.1
|
Relative Performance of Common Operations (PIII-733) |
|||||||||||||||
|
8-bit (char) |
16-bit (short) |
32-bit (long) |
64-bit (__int64) |
floating-point |
|||||||||||
|
Operation |
signed |
unsigned |
signed |
unsigned |
signed |
unsigned |
signed |
unsigned |
float |
double |
ldouble |
||||
|
a = b |
0.6 |
0.5 |
0.5 |
0.6 |
1.0 |
1.0 |
0.8 |
0.8 |
1.0 |
0.8 |
0.8 |
||||
|
a + b |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
0.9 |
0.8 |
1.0 |
1.0 |
0.6 |
||||
|
a - b |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
0.8 |
0.8 |
1.0 |
0.6 |
1.0 |
||||
|
-a |
1.0 |
1.0 |
1.2 |
1.0 |
1.0 |
1.2 |
0.9 |
0.9 |
1.0 |
1.0 |
0.6 |
||||
|
++a |
0.5 |
0.5 |
0.5 |
0.5 |
1.0 |
0.9 |
0.8 |
0.8 |
0.9 |
0.6 |
0.6 |
||||
|
--a |
0.5 |
0.6 |
0.5 |
0.5 |
1.0 |
1.0 |
0.8 |
0.8 |
0.9 |
0.3 |
0.6 |
||||
|
a * b |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
0.5 |
0.5 |
1.0 |
1.0 |
0.5 |
||||
|
a / b |
0.4 |
0.4 |
0.4 |
0.4 |
0.4 |
0.3 |
0.1 |
0.1 |
0.4 |
0.3 |
0.4 |
||||
|
a % b |
0.4 |
0.4 |
0.4 |
0.4 |
0.4 |
0.3 |
0.1 |
0.1 |
n/a |
n/a |
n/a |
||||
|
a == b |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.5 |
0.5 |
0.5 |
0.5 |
0.3 |
||||
|
a != b |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.5 |
0.5 |
0.5 |
0.4 |
0.5 |
||||
|
a > b |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.5 |
0.5 |
0.5 |
0.5 |
0.4 |
||||
|
a < b |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.5 |
0.5 |
0.5 |
0.4 |
0.5 |
||||
|
a >= b |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.5 |
0.5 |
0.5 |
0.5 |
0.5 |
0.4 |
||||
|
a <= b |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.5 |
0.5 |
0.5 |
0.4 |
0.5 |
||||
|
a && b |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.5 |
0.5 |
n/a |
n/a |
n/a |
||||
|
a || b |
0.6 |
0.5 |
0.5 |
0.6 |
0.5 |
0.6 |
0.5 |
0.6 |
n/a |
n/a |
n/a |
||||
|
!a |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.6 |
0.5 |
0.5 |
0.5 |
0.5 |
||||
|
a >> b |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
0.7 |
0.7 |
n/a |
n/a |
n/a |
||||
|
a << b |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
0.7 |
0.7 |
n/a |
n/a |
n/a |
||||
|
a & b |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
0.8 |
0.8 |
n/a |
n/a |
n/a |
||||
|
a | b |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
0.9 |
0.8 |
n/a |
n/a |
n/a |
||||
|
a ^ b |
1.2 |
1.0 |
1.0 |
1.0 |
1.0 |
1.0 |
0.8 |
0.8 |
n/a |
n/a |
n/a |
||||
|
~a |
1.0 |
1.0 |
1.0 |
1.0 |
1.2 |
1.0 |
0.9 |
0.9 |
n/a |
n/a |
n/a |
||||
Another thing to be aware of is the cost of converting between types. The baseline case is the "conversion" from a 32-bit signed int to another 32-bit signed int.
|
|
int32 to int32 |
int8 to int32 |
int32 to int8 |
double to int32 |
int32 to double |
int32 to int64 |
|
PII |
1.0 |
1.0 |
1.0 |
0.1 |
0.8 |
0.7 |
|
Pentium |
1.0 |
0.8 |
0.7 |
0.2 |
0.6 |
0.7 |
Most conversions are reasonable. But beware the conversion of floating-point to integer! It's five to ten times slower than the base case. Interestingly enough, it's also significantly slower on the Pentium-II compared to the Pentium.
Appendix C: Code Tuning and C++ Efficiency Resources
C.1: Profilers and Software tools
NuMega TrueTime (
www.numega.com) From the makers of BoundsChecker. Highly recommended. Accounts for context switches during program runs, so your results only include the time that the CPU spent executing your application. Also profiles Java code.Intel VTune (
developer.intel.com) Profile down to the assembly level. Includes a C++ code coach that recommends code changes that will improve performance.Rational VisualQuantify (
www.rational.com) From the makers of Purify. Similar to TrueTime.Metrowerks CodeWarrior Analysis Tools Suite (
www.metrowerks.com) Metrowerks acquired TracePoint's HiProf technology in the spring of 1998 and now uses the same technology in their new product CATS. Supports both CodeWarrior and Visual C++ 4.2 and up.PC-Lint (
www.gimpel.com) Detects parameters that could be passed by reference instead of by value, postfix increments that could be made prefix, and other efficiency bottlenecks, as well as a host of common C++ (and C) programming errors.C.2: Books
Effective C++ and More Effective C++, Scott Meyers (
www.aristeia.com) Superb tips from a C++ guru, from basic techniques to copy-on-write and multiple dispatch. The chapter on efficiency in More Effective C++ is a gem.Code Complete, Steve McConnell (
www.construx.com) General purpose code-tuning strategies, with examples and results.Graphics Programming Black Book Special Edition, Michael Abrash (
www.amazon.com) The bible on x86 assembly optimization, especially for 3D graphics.Graphics Gems I - V, Glassner, et al. (
www.graphicsgems.org) Efficient algorithms designed by graphics programmers and researchers.Inner Loops, Rick Booth (
ourworld.compuserve.com/homepages/rbooth) More x86 assembly optimization.High Performance Computing, 2nd Edition, Kevin Dowd (
www.oreilly.com/catalog/hpc2) A high-level look at optimization.C.3: Websites
SGI STL extensions (
www.sgi.com/tech/stl)SGI Singly-linked lists A non-standard list class. The SGI implementation has half the overhead and is considerably faster than the standard list class.
SGI Rope strings A highly efficient implementation of long strings. Useful for e-mail text, word processing, etc.
SGI Hashed containers Hashed containers are not included in the STL. These SGI hashed containers can be just what the doctor ordered.
Todd Veldhuizen articles (
extreme.indiana.edu/~tveldhui)Todd has been on the leading edge of the use of C++ templates for the sake of efficiency, especially Template Metaprogramming. He's also the lead programmer for Blitz++, a computing library that makes use of many of his techniques.
Nathan is a prolific writer on many C++ topics. Be sure to see his article on the Empty Member optimization.
Guru of the Week (
www.gotw.ca)Problems and solutions to many common C++ issues, as presented by one of the lead programmers of PeerDirect, Inc, including Avoiding Temporaries, Inline, Reference Counting and Copy On Write.
Object-Oriented System Development (
gee.cs.oswego.edu/dl/oosdw3/ch25.html) The Performance Optimization chapter from Dennis de Champeaux's book.High Performance C++ (
oscinfo.osc.edu/software/KAI/doc/UserGuide/chapter_3.html) A discussion of C++ and compiler optimizations from KAI, the leading vendor of optimizing C++ compilers for high-end systems (Cray, SPARC, SGI, etc.)NULLSTONE Compiler Optimization Categories (
www.nullstone.com/htmls/category.htm) A good list and description of C/C++ compiler optimization techniques.Performance Engineering: A Practical Approach to Performance Improvement (
www.rational.com) A discussion of profiling and bottlenecks by Rational Software Back to C++ Optimization Techniques