forked from katef/libfsm
Sync performance improvements and misc. bugfixes from upstream #23
Merged
Re-adding it during each pass of the for loop led to a bunch of warnings during the build.
Previously these were split between src/libfsm/internal.h and include/common/check.h, but some of the internal.h definitions were used in libre. Also, add `BUILD_FOR_FUZZER`, which sets more restrictive limits for cases that lead to uninteresting failure modes that get reported over and over during fuzzing.
This is a step closer to removing: #include "libfsm/internal.h" /* XXX */ but ast_compile.c still needs it for the internal interface `fsm_mergeab`. If either that or `fsm_unionxy` is reworked to make a clearer library boundary, libre could stop depending on internal details of libfsm.
Otherwise, this can lead to uninitialized memory in combine_info when either a or b has zero states. Found via scan-build.
Found by scan-build.
The fuzzing itself is run unconditionally; the idea is the cache is restored each time, and then each run adds a little more. I'm also presenting the corpus as an asset so it can be grabbed to run fuzzing locally.
The way I've written it, it's a little awkward to have CI bundle these two settings together, so I've been iterating over them as a set. But that's taking way too long, so here I'm combining the two. I think this is semantically identical to -DNDEBUG: we'd never want to test with assertions alone, without these expensive checks, and we'd never want to run in production with either. But after some discussion we're keeping them separate for practical reasons. The main idea is that plain assertions are supposed to be for checks that don't change the algorithmic complexity of an operation, which means more checks should probably move under EXPENSIVE_CHECKS rather than remain as assertions.
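One common way to encode that split is sketched below, under assumed names; this is not libfsm's actual code, just an illustration of gating an O(n) validation inside an otherwise O(1) operation behind `EXPENSIVE_CHECKS`:

```c
#include <assert.h>
#include <stddef.h>

/* Default off, as a release build would have it. */
#ifndef EXPENSIVE_CHECKS
#define EXPENSIVE_CHECKS 0
#endif

/* O(n) validation helper; hypothetical name. */
static int
is_sorted(const unsigned *a, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (a[i - 1] > a[i]) return 0;
	}
	return 1;
}

static unsigned
last_entry(const unsigned *a, size_t n)
{
	assert(n > 0);           /* cheap: O(1) invariant, stays a plain assert */
#if EXPENSIVE_CHECKS
	assert(is_sorted(a, n)); /* O(n) check inside an O(1) op: gated */
#endif
	return a[n - 1];
}
```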
…an environment variable named `UBSAN_OPTIONS`
Here I'm also saving the seeds unconditionally on error, so we don't need to re-find the same seeds for each bug. Hopefully this should make fuzzing reproducible in CI.
The description says "Return where an item would be, if it were inserted", but it was returning the last element <= the target rather than the first element >=, and the later call to `state_set_cmpval` was shifting i by 1 to compensate for that specific case. Handle it correctly inside the search function instead. Two other call sites need to check whether the result refers to the append position (one past the end of the array) before checking `set->a[i] == state`; update them. Also add a fast path upfront: it's VERY common to append states in order to the state array, so before binary searching, first compare against the last entry (unless the set is empty).
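A minimal sketch of the corrected semantics (the names here are illustrative, not libfsm's actual interface): the search returns the index of the first element >= the target, which may be the append position one past the end, and the common append case is checked before the binary search.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index where `state` would be inserted to keep `a` sorted:
 * the first element >= state, or `count` (one past the end) if none.
 * Callers must check `i < count && a[i] == state` for membership. */
static size_t
find_insert_position(const unsigned *a, size_t count, unsigned state)
{
	size_t lo = 0, hi = count;

	/* Fast path: appending in order is very common, so compare
	 * against the last entry first (unless the array is empty). */
	if (count == 0 || a[count - 1] < state) {
		return count; /* append position */
	}

	while (lo < hi) {
		const size_t mid = lo + (hi - lo) / 2;
		if (a[mid] < state) {
			lo = mid + 1;
		} else {
			hi = mid;
		}
	}
	return lo; /* first element >= state */
}
```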
In -O0 this can become pretty expensive (~25% of overall runtime for `time ./re -rpcre -C '^[ab]{0,2000}$'`), but when built with -O3 very little overhead remains. I'm adding this comment because every time I see this it seems to me like it should have `EXPENSIVE_CHECKS` around it, but profiling is telling me it really doesn't matter.
This is a major hotspot when doing epsilon removal over large runs of potentially skipped states (as might appear from `^[ab]{0,2000}$`). Add a fast path for appending, which is also very common. Extract the edge set destination search into its own function, `find_state_position`, and add a `#define` to switch between linear search, binary search, or calling both and comparing the result. I will remove linear search in the next commit, but am checking this in as an intermediate step for checking & benchmarking.
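The switchable arrangement described above might look like this sketch (all names are assumed, not libfsm's actual code): a `#define` selects linear search, binary search, or a compare mode that runs both and asserts they agree, which is useful while benchmarking before deleting the slower path.

```c
#include <assert.h>
#include <stddef.h>

#define FIND_LINEAR 0
#define FIND_BINARY 1
#define FIND_CMP    2 /* run both, assert they agree */

#define FIND_MODE FIND_CMP

/* Both return the index of the first element >= v, or n if none. */
static size_t
find_linear(const unsigned *a, size_t n, unsigned v)
{
	size_t i;
	for (i = 0; i < n; i++) {
		if (a[i] >= v) break;
	}
	return i;
}

static size_t
find_binary(const unsigned *a, size_t n, unsigned v)
{
	size_t lo = 0, hi = n;
	while (lo < hi) {
		const size_t mid = lo + (hi - lo) / 2;
		if (a[mid] < v) lo = mid + 1; else hi = mid;
	}
	return lo;
}

static size_t
find_state_position(const unsigned *a, size_t n, unsigned v)
{
#if FIND_MODE == FIND_LINEAR
	return find_linear(a, n, v);
#elif FIND_MODE == FIND_BINARY
	return find_binary(a, n, v);
#else
	const size_t lin = find_linear(a, n, v);
	const size_t bin = find_binary(a, n, v);
	assert(lin == bin); /* the two strategies must agree */
	return bin;
#endif
}
```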
When I run `time ./re -rpcre -C '^[ab]{0,2000}$'` locally, built with -O3:
- linear search: 2.991s
- binary search: 1.521s
After the other changes in this PR, calls to qsort from `sort_and_dedup_dst_buf` are one of the largest remaining hotspots in the profile. We can often avoid calling qsort, though:
- If there is <= 1 entry, just return; it's already sorted.
- Otherwise, first do a sweep through the array noting the min and max values. Unless there is a huge range between them, it's much faster to build a bitset from them in a small (max 10KB) stack-allocated array and then unpack the bitset (now sorted and unique). Only the needed portion of the array is initialized.

I have not done a lot of experimentation to find a cutoff point where the bitset becomes slower than qsort (it may be much larger); I picked 10KB because it's likely to be safe to stack-allocate. I tried changing the bitset unpacking to use an 8- or 16-bit mask to jump forward faster through large sub-word runs of 0 bits, but any improvement was lost among random variation, so I decided it wasn't worth the extra complexity. We already skip whole words that are 0.
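A hedged sketch of the bitset approach (the function name, the 10KB constant, and the qsort fallback threshold mirror the description above; this is not the actual `sort_and_dedup_dst_buf`): each value is marked in a stack-allocated bitset relative to the minimum, then the bitset is unpacked in order, yielding a sorted, deduplicated array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BITSET_BYTES (10 * 1024) /* likely safe to stack-allocate */

/* Sort buf[0..n) in place, removing duplicates; returns the new length. */
static size_t
sort_and_dedup(unsigned *buf, size_t n)
{
	if (n <= 1) return n; /* already sorted */

	unsigned min = buf[0], max = buf[0];
	for (size_t i = 1; i < n; i++) {
		if (buf[i] < min) min = buf[i];
		if (buf[i] > max) max = buf[i];
	}

	const size_t range = (size_t)(max - min) + 1; /* inclusive */
	const size_t words = (range + 63) / 64;       /* round up */
	if (words * sizeof (uint64_t) > MAX_BITSET_BYTES) {
		abort(); /* huge range: fall back to qsort (omitted in this sketch) */
	}

	uint64_t set[MAX_BITSET_BYTES / sizeof (uint64_t)];
	memset(set, 0, words * sizeof set[0]); /* init only the needed portion */

	for (size_t i = 0; i < n; i++) {
		const unsigned rel = buf[i] - min;
		set[rel / 64] |= (uint64_t)1 << (rel & 63);
	}

	size_t out = 0;
	for (size_t w = 0; w < words; w++) {
		if (set[w] == 0) continue; /* skip whole words that are 0 */
		for (unsigned b = 0; b < 64; b++) {
			if (set[w] & ((uint64_t)1 << b)) {
				buf[out++] = min + (unsigned)(w * 64 + b);
			}
		}
	}
	return out; /* sorted and unique */
}
```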
If min and max are exactly 64 states apart, the upper value was getting silently dropped due to an incorrect `words` value here. One of the patterns in the PCRE suite triggers this:

`./re -rpcre '(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))' "caaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"`

This should match, but did not.
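The off-by-one can be shown with the word-count arithmetic alone (the exact upstream expression is assumed here): min and max exactly 64 apart give an inclusive range of 65 values, which needs two 64-bit words; a truncating division yields only one word, so the bit for max falls outside the bitset.

```c
#include <stddef.h>

/* Hypothetical broken version: truncates, so a range of 65 -> 1 word,
 * and the 65th value (max itself) is silently lost. */
static size_t
words_buggy(size_t range)
{
	return range / 64;
}

/* Fixed version: rounds up, so a range of 65 -> 2 words. */
static size_t
words_fixed(size_t range)
{
	return (range + 63) / 64;
}
```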
determinise: It's not possible to find a cached result in the hash table without allocating a to-set buffer first, so assert that it will be non-NULL.

fsm_findmode: This should never be used on a state without edges.

vm/v1.c and vm/v2.c: Free the allocated return value on error.
katef approved these changes Sep 12, 2023
This brings us up to date as of katef#443. (Edit: changed from katef#441 to include katef#442 (a few more misc. fixes) and katef#443 (addresses a CI issue).)

This brings in several misc. fixes, but the primary motivation is the performance improvements in katef#441, which should help with the excessive VCL compilation times that have been blocking increasing vcc's `max_regex_cohort_compilation` default from 5 to 6.