-
write isolated benchmarks, i.e. one executable benchmarks one thing
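a minimal sketch of that layout, assuming C++ benchmarks built on Google Benchmark (file and binary names are hypothetical):
  # one source file and one executable per measured thing:
  #   benchmarks/bench_hash.cc   -> ./bench_hash    (hash function only)
  #   benchmarks/bench_parse.cc  -> ./bench_parse   (JSON parser only)
  # run each binary on its own so one workload cannot perturb another's numbers:
  ./bench_hash
  ./bench_parse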
-
benchmark compiler flags:
- optimized (-O3)
- with debug info (-g)
- with frame pointer (-fno-omit-frame-pointer)
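for example, a sketch of a compile line combining all three (reusing the hypothetical bench_hash.cc from above):
  # optimized, but with debug info and frame pointers so profilers can symbolize and walk stacks
  g++ -std=c++17 -O3 -g -fno-omit-frame-pointer \
      benchmarks/bench_hash.cc -lbenchmark -lpthread -o bench_hash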
-
run the benchmarks and, for each run (command sketches follow this list):
- output results as JSON
- record:
- CPU profile (perf and/or gperftools)
- stats with Linux built-in commands (vmstat, mpstat, pidstat, etc.) and/or BPF/BCC tools (ext4slower, runqlen, etc.)
- perf stats (PMCs), mostly IPC (instructions per cycle):
  perf stat -e instructions,cycles
- run them under cachegrind (more stable results, can be compared with cg_diff)
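JSON output, a sketch assuming Google Benchmark binaries (file names are arbitrary):
  # print human-readable output to the console, write machine-readable JSON to a file
  ./bench_hash --benchmark_out=bench_hash.json --benchmark_out_format=json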
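CPU profile, a sketch with perf and with gperftools (the libprofiler.so path varies by distro; output file names are assumptions):
  # linux-perf: sample at 99 Hz with call stacks (frame pointers kept by -fno-omit-frame-pointer)
  perf record -F 99 -g -o perf.data -- ./bench_hash
  # gperftools: preload the profiler and point it at an output file
  LD_PRELOAD=/usr/lib/libprofiler.so CPUPROFILE=bench_hash.prof ./bench_hash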
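OS-level stats, a sketch of the tools run in separate terminals (or redirected to log files) while the benchmark is running; BCC tool locations vary by distribution:
  # classic Linux observability, one-second intervals
  vmstat 1
  mpstat -P ALL 1
  pidstat 1
  # BCC tools, typically installed under /usr/share/bcc/tools (need root)
  sudo /usr/share/bcc/tools/runqlen 1
  sudo /usr/share/bcc/tools/ext4slower 1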
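cachegrind run, a sketch (output file name is arbitrary):
  # simulated cache/branch behaviour; counts are stable enough to diff between runs
  valgrind --tool=cachegrind --cachegrind-out-file=cachegrind.out.bench_hash ./bench_hash
  # human-readable per-function report
  cg_annotate cachegrind.out.bench_hash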
-
best to run them on a native machine (not a VM or container)
dedicated to benchmarks, with only the minimum required software installed (to avoid the noisy neighbour problem)
-
compare results with the last run (with gbenchdiff and cg_diff)
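a sketch of the comparison step, assuming gbenchdiff takes the two Google Benchmark JSON files as arguments (check its --help); cg_diff ships with valgrind:
  # compare Google Benchmark JSON results between the previous and the current run
  gbenchdiff bench_hash.prev.json bench_hash.json
  # compare cachegrind counts between the previous and the current run
  cg_diff cachegrind.out.prev cachegrind.out.bench_hash > cachegrind.diff
  cg_annotate cachegrind.diff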
-
send an email back to the submitter with the results report (or a link where it can be downloaded):
- gbenchdiff and cg_diff output
- cachegrind report
- linux-perf CPU profile (perf.data)
- gperftools CPU profile
- Linux/eBPF tools results
-
interpret the results (command sketches after this list)
- analyze the gperftools CPU profile with pprof
- analyze the perf CPU profile as a flame graph and with FlameScope
- check the OS/BCC/perf stats to better understand what is going on
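sketches of the interpretation commands (file names match the ones assumed above; stackcollapse-perf.pl / flamegraph.pl come from the FlameGraph repo, and the flamescope/ path is a local FlameScope checkout):
  # gperftools profile: per-function text report via pprof
  pprof --text ./bench_hash bench_hash.prof
  # perf profile -> flame graph SVG
  perf script -i perf.data | ./stackcollapse-perf.pl | ./flamegraph.pl > bench_hash.svg
  # perf profile -> FlameScope: drop the perf script output into FlameScope's examples/ directory
  perf script -i perf.data --header > flamescope/examples/bench_hash.perf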