forked from katef/libfsm
Sync performance improvements and misc. bugfixes from upstream #23
Merged
Re-adding it during each pass of the for loop led to a bunch of warnings during the build.
Previously these were split between src/libfsm/internal.h and include/common/check.h, but some of the internal.h definitions were used in libre. Also, add `BUILD_FOR_FUZZER`, which sets more restrictive limits for cases that lead to uninteresting failure modes that get reported over and over during fuzzing.
This is a step closer to removing: #include "libfsm/internal.h" /* XXX */ but ast_compile.c still needs it for the internal interface `fsm_mergeab`. If either that or `fsm_unionxy` is reworked to make a clearer library boundary, libre could stop depending on internal details of libfsm.
Otherwise, this can lead to uninitialized memory in combine_info when either a or b has zero states. Found via scan-build.
Found by scan-build.
The fuzzing itself is run unconditionally; the idea is the cache is restored each time, and then each run adds a little more. I'm also presenting the corpus as an asset so it can be grabbed to run fuzzing locally.
The way I've written it, it's a little awkward to have CI bundle these two settings together, so I've been iterating over them as a set. But that's taking way too long, so here I'm combining the two. I think this is semantically identical to -DNDEBUG: we'd never want to test with assertions alone, without these expensive checks, and we'd never want to run in production with either. But after some discussion we're keeping them separate for practical reasons. The main idea is that plain assertions are supposed to be for checks that don't change the algorithmic complexity of an operation, which means more checks should probably move under EXPENSIVE_CHECKS rather than remain as assertions.
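One common way to encode that split is sketched below, under assumed names; this is not libfsm's actual code, just an illustration of gating an O(n) validation inside an otherwise O(1) operation behind `EXPENSIVE_CHECKS`:

```c
#include <assert.h>
#include <stddef.h>

/* Default off, as a release build would have it. */
#ifndef EXPENSIVE_CHECKS
#define EXPENSIVE_CHECKS 0
#endif

/* O(n) validation helper; hypothetical name. */
static int
is_sorted(const unsigned *a, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (a[i - 1] > a[i]) return 0;
	}
	return 1;
}

static unsigned
last_entry(const unsigned *a, size_t n)
{
	assert(n > 0);           /* cheap: O(1) invariant, stays a plain assert */
#if EXPENSIVE_CHECKS
	assert(is_sorted(a, n)); /* O(n) check inside an O(1) op: gated */
#endif
	return a[n - 1];
}
```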
…an environment variable named `UBSAN_OPTIONS`
Here I'm also saving the seeds unconditionally on error, so we don't need to re-find the same seeds for each bug. Hopefully this should make fuzzing reproducible in CI.
The description says "Return where an item would be, if it were inserted", but it was returning the last element <= the target rather than the first element >=, and the later call to `state_set_cmpval` was shifting i by 1 to compensate for that specific case. Handle it correctly inside the search function instead. Two other call sites need to check whether the result refers to the append position (one past the end of the array) before checking `set->a[i] == state`; update them. Also add a fast path upfront: it's VERY common to append states in order to the state array, so before binary searching, first compare against the last entry (unless the set is empty).
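A minimal sketch of the corrected semantics (the names here are illustrative, not libfsm's actual interface): the search returns the index of the first element >= the target, which may be the append position one past the end, and the common append case is checked before the binary search.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index where `state` would be inserted to keep `a` sorted:
 * the first element >= state, or `count` (one past the end) if none.
 * Callers must check `i < count && a[i] == state` for membership. */
static size_t
find_insert_position(const unsigned *a, size_t count, unsigned state)
{
	size_t lo = 0, hi = count;

	/* Fast path: appending in order is very common, so compare
	 * against the last entry first (unless the array is empty). */
	if (count == 0 || a[count - 1] < state) {
		return count; /* append position */
	}

	while (lo < hi) {
		const size_t mid = lo + (hi - lo) / 2;
		if (a[mid] < state) {
			lo = mid + 1;
		} else {
			hi = mid;
		}
	}
	return lo; /* first element >= state */
}
```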
In -O0 this can become pretty expensive (~25% of overall runtime for `time ./re -rpcre -C '^[ab]{0,2000}$'`), but when built with -O3 very little overhead remains. I'm adding this comment because every time I see this it seems to me like it should have `EXPENSIVE_CHECKS` around it, but profiling is telling me it really doesn't matter.
This is a major hotspot when doing epsilon removal over large runs of potentially skipped states (as might appear from `^[ab]{0,2000}$`). Add a fast path for appending, which is also very common. Extract the edge set destination search into its own function, `find_state_position`, and add a `#define` to switch between linear search, binary search, or calling both and comparing the result. I will remove linear search in the next commit, but am checking this in as an intermediate step for checking & benchmarking.
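The switchable arrangement described above might look like this sketch (all names are assumed, not libfsm's actual code): a `#define` selects linear search, binary search, or a compare mode that runs both and asserts they agree, which is useful while benchmarking before deleting the slower path.

```c
#include <assert.h>
#include <stddef.h>

#define FIND_LINEAR 0
#define FIND_BINARY 1
#define FIND_CMP    2 /* run both, assert they agree */

#define FIND_MODE FIND_CMP

/* Both return the index of the first element >= v, or n if none. */
static size_t
find_linear(const unsigned *a, size_t n, unsigned v)
{
	size_t i;
	for (i = 0; i < n; i++) {
		if (a[i] >= v) break;
	}
	return i;
}

static size_t
find_binary(const unsigned *a, size_t n, unsigned v)
{
	size_t lo = 0, hi = n;
	while (lo < hi) {
		const size_t mid = lo + (hi - lo) / 2;
		if (a[mid] < v) lo = mid + 1; else hi = mid;
	}
	return lo;
}

static size_t
find_state_position(const unsigned *a, size_t n, unsigned v)
{
#if FIND_MODE == FIND_LINEAR
	return find_linear(a, n, v);
#elif FIND_MODE == FIND_BINARY
	return find_binary(a, n, v);
#else
	const size_t lin = find_linear(a, n, v);
	const size_t bin = find_binary(a, n, v);
	assert(lin == bin); /* the two strategies must agree */
	return bin;
#endif
}
```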
When I run `time ./re -rpcre -C '^[ab]{0,2000}$'` locally, built with -O3:
- linear search: 2.991s
- binary search: 1.521s
After the other changes in this PR, calls to qsort from `sort_and_dedup_dst_buf` are one of the largest remaining hotspots in the profile. We can often avoid calling qsort, though:
- If there is <= 1 entry, just return; it's already sorted.
- Otherwise, first do a sweep through the array noting the min and max values. Unless there is a huge range between them, it's much faster to build a bitset from them in a small (max 10KB) stack-allocated array and then unpack the bitset (now sorted and unique). Only the needed portion of the array is initialized.

I have not done a lot of experimentation to find a cutoff point where the bitset becomes slower than qsort (it may be much larger); I picked 10KB because it's likely to be safe to stack-allocate. I tried changing the bitset unpacking to use an 8- or 16-bit mask to jump forward faster through large sub-word runs of 0 bits, but any improvement was lost among random variation, so I decided it wasn't worth the extra complexity. We already skip whole words that are 0.
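A hedged sketch of the bitset approach (the function name, the 10KB constant, and the qsort fallback threshold mirror the description above; this is not the actual `sort_and_dedup_dst_buf`): each value is marked in a stack-allocated bitset relative to the minimum, then the bitset is unpacked in order, yielding a sorted, deduplicated array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BITSET_BYTES (10 * 1024) /* likely safe to stack-allocate */

/* Sort buf[0..n) in place, removing duplicates; returns the new length. */
static size_t
sort_and_dedup(unsigned *buf, size_t n)
{
	if (n <= 1) return n; /* already sorted */

	unsigned min = buf[0], max = buf[0];
	for (size_t i = 1; i < n; i++) {
		if (buf[i] < min) min = buf[i];
		if (buf[i] > max) max = buf[i];
	}

	const size_t range = (size_t)(max - min) + 1; /* inclusive */
	const size_t words = (range + 63) / 64;       /* round up */
	if (words * sizeof (uint64_t) > MAX_BITSET_BYTES) {
		abort(); /* huge range: fall back to qsort (omitted in this sketch) */
	}

	uint64_t set[MAX_BITSET_BYTES / sizeof (uint64_t)];
	memset(set, 0, words * sizeof set[0]); /* init only the needed portion */

	for (size_t i = 0; i < n; i++) {
		const unsigned rel = buf[i] - min;
		set[rel / 64] |= (uint64_t)1 << (rel & 63);
	}

	size_t out = 0;
	for (size_t w = 0; w < words; w++) {
		if (set[w] == 0) continue; /* skip whole words that are 0 */
		for (unsigned b = 0; b < 64; b++) {
			if (set[w] & ((uint64_t)1 << b)) {
				buf[out++] = min + (unsigned)(w * 64 + b);
			}
		}
	}
	return out; /* sorted and unique */
}
```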
If min and max are exactly 64 states apart, the upper value was getting silently dropped due to an incorrect `words` value here. One of the patterns in the PCRE suite triggers this:

`./re -rpcre '(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))' "caaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"`

This should match, but did not.
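The off-by-one can be shown with the word-count arithmetic alone (the exact upstream expression is assumed here): min and max exactly 64 apart give an inclusive range of 65 values, which needs two 64-bit words; a truncating division yields only one word, so the bit for max falls outside the bitset.

```c
#include <stddef.h>

/* Hypothetical broken version: truncates, so a range of 65 -> 1 word,
 * and the 65th value (max itself) is silently lost. */
static size_t
words_buggy(size_t range)
{
	return range / 64;
}

/* Fixed version: rounds up, so a range of 65 -> 2 words. */
static size_t
words_fixed(size_t range)
{
	return (range + 63) / 64;
}
```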
determinise: It's not possible to find a cached result in the hash table without allocating a to-set buffer first, so assert that it will be non-NULL.

fsm_findmode: This should never be used on a state without edges.

vm/v1.c and vm/v2.c: Free the allocated return value on error.
katef approved these changes Sep 12, 2023
This brings us up to date as of katef#443. (Edit: changed from katef#441 to include katef#442 (a few more misc. fixes) and katef#443 (addresses a CI issue).)

This brings in several misc. fixes, but the primary motivation is the performance improvements in katef#441, which should help with the excessive VCL compilation times that have been blocking increasing vcc's `max_regex_cohort_compilation` default from 5 to 6.