perf: try compiling with -Ofast instead of -O3
Some risk involved, but 🤷. From `clang` manpage:

       -O0, -O1, -O2, -O3, -Ofast, -Os, -Oz, -Og, -O, -O4
              Specify which optimization level to use:
                 -O0 Means "no optimization": this level compiles the
                 fastest and generates the most debuggable code.

                 -O1 Somewhere between -O0 and -O2.

                 -O2 Moderate level of optimization which enables most
                 optimizations.

                 -O3 Like -O2, except that it enables optimizations that
                 take longer to perform or that may generate larger code
                 (in an attempt to make the program run faster).

                 -Ofast Enables all the optimizations from -O3 along
                 with other aggressive optimizations that may violate
                 strict compliance with language standards.

                 -Os Like -O2 with extra optimizations to reduce code
                 size.

From `gcc` manpage (which I don't actually have installed on this
machine, so grabbing a snippet from online instead):

       -O
       -O1 Optimize.  Optimizing compilation takes somewhat more time,
           and a lot more memory for a large function.

           With -O, the compiler tries to reduce code size and execution
           time, without performing any optimizations that take a great
           deal of compilation time.

           -O turns on the following optimization flags:

           -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments
           -fcompare-elim -fcprop-registers -fdce -fdefer-pop
           -fdelayed-branch -fdse -fforward-propagate
           -fguess-branch-probability -fif-conversion -fif-conversion2
           -finline-functions-called-once -fipa-profile -fipa-pure-const
           -fipa-reference -fipa-reference-addressable -fmerge-constants
           -fmove-loop-invariants -fomit-frame-pointer -freorder-blocks
           -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types
           -fssa-backprop -fssa-phiopt -ftree-bit-ccp -ftree-ccp
           -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce
           -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre
           -ftree-phiprop -ftree-pta -ftree-scev-cprop -ftree-sink
           -ftree-slsr -ftree-sra -ftree-ter -funit-at-a-time

       -O2 Optimize even more.  GCC performs nearly all supported
           optimizations that do not involve a space-speed tradeoff.  As
           compared to -O, this option increases both compilation time
           and the performance of the generated code.

           -O2 turns on all optimization flags specified by -O.  It also
           turns on the following optimization flags:

           -falign-functions  -falign-jumps -falign-labels
           -falign-loops -fcaller-saves -fcode-hoisting -fcrossjumping
           -fcse-follow-jumps  -fcse-skip-blocks
           -fdelete-null-pointer-checks -fdevirtualize
           -fdevirtualize-speculatively -fexpensive-optimizations -fgcse
           -fgcse-lm -fhoist-adjacent-loads -finline-small-functions
           -findirect-inlining -fipa-bit-cp  -fipa-cp  -fipa-icf
           -fipa-ra  -fipa-sra  -fipa-vrp
           -fisolate-erroneous-paths-dereference -flra-remat
           -foptimize-sibling-calls -foptimize-strlen -fpartial-inlining
           -fpeephole2 -freorder-blocks-algorithm=stc
           -freorder-blocks-and-partition  -freorder-functions
           -frerun-cse-after-loop -fschedule-insns  -fschedule-insns2
           -fsched-interblock  -fsched-spec -fstore-merging
           -fstrict-aliasing -fthread-jumps -ftree-builtin-call-dce
           -ftree-pre -ftree-switch-conversion  -ftree-tail-merge
           -ftree-vrp

           Please note the warning under -fgcse about invoking -O2 on
           programs that use computed gotos.

       -O3 Optimize yet more.  -O3 turns on all optimizations specified
           by -O2 and also turns on the following optimization flags:

           -fgcse-after-reload -finline-functions -fipa-cp-clone
           -floop-interchange -floop-unroll-and-jam -fpeel-loops
           -fpredictive-commoning -fsplit-paths
           -ftree-loop-distribute-patterns -ftree-loop-distribution
           -ftree-loop-vectorize -ftree-partial-pre -ftree-slp-vectorize
           -funswitch-loops -fvect-cost-model
           -fversion-loops-for-strides

       -O0 Reduce compilation time and make debugging produce the
           expected results.  This is the default.

       -Os Optimize for size.  -Os enables all -O2 optimizations except
           those that often increase code size:

           -falign-functions  -falign-jumps -falign-labels
           -falign-loops -fprefetch-loop-arrays
           -freorder-blocks-algorithm=stc

           It also enables -finline-functions, causes the compiler to
           tune for code size rather than execution speed, and performs
           further optimizations designed to reduce code size.

       -Ofast
           Disregard strict standards compliance.  -Ofast enables all
           -O3 optimizations.  It also enables optimizations that are
           not valid for all standard-compliant programs.  It turns on
           -ffast-math and the Fortran-specific -fstack-arrays, unless
           -fmax-stack-var-size is specified, and -fno-protect-parens.
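
To make the "not valid for all standard-compliant programs" part concrete: `-ffast-math` lets the compiler treat floating-point addition as associative, which it is not under IEEE 754. The sketch below is only an illustration and is not taken from this code base; `sum_left`, `sum_right`, and `demo` are hypothetical names. It pins each grouping with a `volatile` temporary so that both results are visible regardless of optimization flags:

```c
#include <assert.h>
#include <stdio.h>

/* Computes (a + b) + c. The volatile temporary pins the grouping so the
 * compiler cannot reassociate the additions, even under -ffast-math. */
double sum_left(double a, double b, double c) {
    volatile double t = a + b;
    return t + c;
}

/* Computes a + (b + c), with the grouping pinned the same way. */
double sum_right(double a, double b, double c) {
    volatile double t = b + c;
    return a + t;
}

/* Hypothetical demo: one input triple for which the two groupings differ. */
void demo(void) {
    double left = sum_left(1e20, -1e20, 1.0);   /* (1e20 - 1e20) + 1.0 */
    double right = sum_right(1e20, -1e20, 1.0); /* 1e20 + (-1e20 + 1.0) */
    assert(left != right);
    printf("(a + b) + c = %g\n", left);
    printf("a + (b + c) = %g\n", right);
}
```

Calling `demo()` from a `main` prints `1` for the first grouping and `0` for the second, because the `1.0` is absorbed when it is added to `-1e20` first. Without the `volatile` pinning, a compiler running with `-ffast-math` would be free to pick either grouping. Whether anything in this library is sensitive to that is an open question, hence "some risk involved".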

Result:

    Summary of cpu time and (wall time):

                        best    avg      sd     +/-      p     (best)    (avg)      (sd)     +/-      p
         pathological 0.20632 0.21660 0.06898 [+0.0%]        (0.20632) (0.21660) (0.06898) [+0.0%]
            command-t 0.16686 0.17429 0.05953 [+0.6%]        (0.16686) (0.17430) (0.05954) [+0.6%]
    chromium (subset) 1.32075 1.33852 0.03378 [-0.2%]        (0.27715) (0.28639) (0.01831) [-2.0%] 0.0005
     chromium (whole) 1.10602 1.11082 0.01093 [-0.6%] 0.0005 (0.12251) (0.12594) (0.01112) [+0.1%]
           big (400k) 1.66209 1.66965 0.02413 [-0.6%] 0.0005 (0.17984) (0.18305) (0.00927) [-0.3%]
                total 4.47450 4.50988 0.11781 [-0.4%] 0.0005 (0.96369) (0.98629) (0.10814) [-0.5%]  0.005

Those are the Lua benchmarks, of course. Ruby is unaffected (and it is
using `-Os` anyway).
wincent committed Aug 13, 2024
1 parent 1045ca3 commit 7e6f158
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions lua/wincent/commandt/lib/Makefile
@@ -22,8 +22,7 @@ ifdef DEBUG
 CCFLAGS += -DDEBUG -g -O0
 else
 # As per `man 3 assert`, defining `NDEBUG` elides all `assert()` macros.
-# May also want to consider going `-Ofast` instead of `-O3`.
-CCFLAGS += -DNDEBUG -O3
+CCFLAGS += -DNDEBUG -Ofast
 endif
 
 ifeq ($(OS),Windows_NT)