Performance improvements for epsilon removal and determinisation #441
Conversation
The description says "Return where an item would be, if it were inserted", but it was returning the last element <= rather than the first element >=, and the later call to `state_set_cmpval` was shifting i by 1 for that specific case. Handle it correctly inside the search function instead. Two other call sites need to check whether the result refers to the append position (one past the end of the array) before checking `set->a[i] == state`; update them. Add a fast path up front: it's VERY common to append states in order to the state array, so before each binary search, first compare against the last entry (unless the set is empty).
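For reference, a minimal sketch of the corrected search contract described here, with a simplified struct layout (only the sorted array `a` is taken from the text above; the other names are illustrative, not the actual code in the diff):

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t fsm_state_t;

struct state_set {
	fsm_state_t *a;   /* sorted ascending, no duplicates */
	size_t count;     /* illustrative name for the number of entries */
};

/* Return the position of `state`, or where it would be inserted if absent.
 * That position may be one past the end of the array (the append slot),
 * so callers must check for that before comparing set->a[i] == state. */
static size_t
state_set_search_sketch(const struct state_set *set, fsm_state_t state)
{
	size_t lo = 0, hi = set->count;

	/* Fast path: appending states in order is very common, so compare
	 * against the last entry first (unless the set is empty). */
	if (set->count == 0 || set->a[set->count - 1] < state) {
		return set->count;      /* append position */
	}

	/* Lower bound: first index i with set->a[i] >= state. */
	while (lo < hi) {
		const size_t mid = lo + (hi - lo) / 2;
		if (set->a[mid] < state) {
			lo = mid + 1;
		} else {
			hi = mid;
		}
	}
	return lo;
}
```

With this contract the caller no longer shifts the result by one; it checks `i < count` (not the append position) and then `set->a[i] == state` to decide presence.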
In -O0 builds this can become pretty expensive (~25% of overall runtime for `time ./re -rpcre -C '^[ab]{0,2000}$'`), but when built with -O3 very little overhead remains. I'm adding this comment because every time I see this it seems like it should have `EXPENSIVE_CHECKS` around it, but profiling tells me it really doesn't matter.
This is a major hotspot when doing epsilon removal over large runs of potentially skipped states (as might appear from `^[ab]{0,2000}$`). Add a fast path for appending, which is also very common. Extract the edge set destination search into its own function, `find_state_position`, and add a `#define` to switch between linear search, binary search, or calling both and comparing the result. I will remove linear search in the next commit, but am committing this as an intermediate step for verification & benchmarking.
When I run `time ./re -rpcre -C '^[ab]{0,2000}$'` locally for -O3:

- linear search: 2.991s
- binary search: 1.521s
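A hedged, self-contained sketch of the binary-search version with the append fast path; the struct layout and the enumerator names here are simplified stand-ins (the diff defines its own `enum fsp_res`, and also has the `#define` toggle between linear search, binary search, and compare-both, which is omitted here):

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t fsm_state_t;

/* Simplified stand-ins for the real structs: what matters is that the
 * groups array is kept sorted by .to, so we can binary search on the
 * destination state. */
struct edge_group {
	fsm_state_t to;
	/* label bitmap etc. omitted */
};

struct edge_set {
	struct edge_group *groups;
	size_t count;
};

/* Hypothetical enumerator names. */
enum fsp_res {
	FSP_FOUND,       /* set->groups[*dst].to == state */
	FSP_NOT_FOUND    /* *dst is the insert position, possibly count (append) */
};

/* Find where the group for `state` is, or where it would be inserted,
 * which includes the position immediately following the last entry. */
static enum fsp_res
find_state_position(const struct edge_set *set, fsm_state_t state, size_t *dst)
{
	size_t lo = 0, hi = set->count;

	/* Fast path for appending, which is very common. */
	if (set->count == 0 || set->groups[set->count - 1].to < state) {
		*dst = set->count;
		return FSP_NOT_FOUND;
	}

	/* Lower bound on .to: first index whose .to >= state. */
	while (lo < hi) {
		const size_t mid = lo + (hi - lo) / 2;
		if (set->groups[mid].to < state) {
			lo = mid + 1;
		} else {
			hi = mid;
		}
	}

	*dst = lo;
	return set->groups[lo].to == state ? FSP_FOUND : FSP_NOT_FOUND;
}
```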
After the other changes in this PR, calls to qsort from `sort_and_dedup_dst_buf` are one of the largest remaining hotspots in the profile. We can often avoid calling qsort, though:

- If there is <= 1 entry, just return, it's sorted.
- Otherwise, first do a sweep through the array noting the min and max values. Unless there is a huge range between them, it's much faster to build a bitset from them in a small (max 10KB) stack-allocated array and then unpack the bitset (now sorted and unique). Only the needed portion of the array is initialized.

I have not done a lot of experimentation to find a cutoff point where the bitset becomes slower than qsort (it may be much larger); I picked 10KB because it's likely to be safe to stack-allocate. I tried changing the bitset unpacking to use an 8- or 16-bit mask and jump forward faster through large sub-word ranges of 0 bits, but any improvement was lost among random variation, so I decided it wasn't worth the extra complexity. We already skip whole words that are 0.
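A minimal sketch of that approach (hypothetical function and constant names; the real `sort_and_dedup_dst_buf` in the diff differs in details such as allocation and error handling):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t fsm_state_t;

#define MAX_BITSET_BYTES (10 * 1024)   /* ~10KB of stack, per the comment above */

static int
cmp_fsm_state(const void *pa, const void *pb)
{
	const fsm_state_t a = *(const fsm_state_t *)pa;
	const fsm_state_t b = *(const fsm_state_t *)pb;
	return a < b ? -1 : a > b ? 1 : 0;
}

/* Sort buf[0..count) and drop duplicates, returning the new count. */
static size_t
sort_and_dedup_sketch(fsm_state_t *buf, size_t count)
{
	if (count <= 1) {
		return count;           /* already sorted and unique */
	}

	/* One sweep to find the value range. */
	fsm_state_t min = buf[0], max = buf[0];
	for (size_t i = 1; i < count; i++) {
		if (buf[i] < min) { min = buf[i]; }
		if (buf[i] > max) { max = buf[i]; }
	}

	const uint64_t range = (uint64_t)max - min;
	if (range / 8 >= MAX_BITSET_BYTES) {
		/* Range too wide for the stack bitset: fall back to qsort,
		 * then dedup in place. */
		qsort(buf, count, sizeof buf[0], cmp_fsm_state);
		size_t w = 1;
		for (size_t r = 1; r < count; r++) {
			if (buf[r] != buf[w - 1]) { buf[w++] = buf[r]; }
		}
		return w;
	}

	/* Build a bitset offset by min; only the needed words are cleared. */
	uint64_t words[MAX_BITSET_BYTES / 8];
	const size_t used_words = (size_t)(range / 64) + 1;  /* covers bit `range` itself */
	memset(words, 0x00, used_words * sizeof words[0]);
	for (size_t i = 0; i < count; i++) {
		const uint64_t bit = buf[i] - min;
		words[bit / 64] |= (uint64_t)1 << (bit & 63);
	}

	/* Unpack: set bits come out sorted and unique; skip all-zero words. */
	size_t w = 0;
	for (size_t wi = 0; wi < used_words; wi++) {
		if (words[wi] == 0) { continue; }
		for (unsigned b = 0; b < 64; b++) {
			if (words[wi] & ((uint64_t)1 << b)) {
				buf[w++] = min + (fsm_state_t)(wi * 64 + b);
			}
		}
	}
	return w;
}
```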
Along with a couple of unrelated UBSan warnings (fixed above to reduce noise), CI found a bug:
If min and max are exactly 64 states apart, the upper value was getting silently dropped due to an incorrect `words` value here. One of the patterns in the PCRE suite triggers this:

`./re -rpcre '(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))' "caaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"`

This should match, but did not.
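The boundary case, sketched concretely (the exact buggy expression in the diff isn't shown here; this just illustrates why rounding down loses a word):

```c
#include <stdio.h>

int main(void) {
	const unsigned min = 100, max = 164;            /* exactly 64 apart, as in the bug */
	/* Bit positions 0..(max - min) are needed: 65 bits here, i.e. two 64-bit words.
	 * A computation that rounds (max - min) down to whole words would allocate only
	 * one word and silently drop the topmost state. */
	const size_t words = (size_t)(max - min) / 64 + 1;
	printf("%zu\n", words);                         /* prints 2 */
	return 0;
}
```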
Note: We can't use … I tried calling out to …
```c
 * which includes the position immediately following the last entry. Return an enum
 * which indicates whether state is already present. */
static enum fsp_res
find_state_position(const struct edge_set *set, fsm_state_t state, size_t *dst)
```
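For context, a hedged sketch of how a caller such as `edge_set_add_bulk` might consume this result, reusing the simplified types from the sketch earlier in this thread (plus `<string.h>` for `memmove`); the real insertion code in the diff handles allocation, growth, and label merging, which are omitted here:

```c
/* Insert or merge an edge group for `to`, keeping the array sorted by .to.
 * Assumes the simplified struct edge_set / enum fsp_res sketched above and
 * enough spare capacity in set->groups. */
static void
add_group_for_state(struct edge_set *set, fsm_state_t to)
{
	size_t i;

	switch (find_state_position(set, to, &i)) {
	case FSP_FOUND:
		/* group already present at index i: merge labels into it */
		break;

	case FSP_NOT_FOUND:
		if (i < set->count) {
			/* interior insert: shift the tail up to make room at i */
			memmove(&set->groups[i + 1], &set->groups[i],
			    (set->count - i) * sizeof set->groups[0]);
		}
		/* otherwise i == set->count: the common append case */
		set->groups[i].to = to;
		set->count++;
		break;
	}
}
```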
Should we use this in `edge_set_find` and `edge_set_contains` too?
We can binary search here because the `struct edge_group` array is sorted by `.to` and we're searching by destination state, but `edge_set_find` and `edge_set_contains` are searching by edge label. If they were frequently searched and it became a hotspot in the profile we could do linear-time reindexing and bsearch on those, but currently they don't even show up in the profile. As far as I can tell they're only even called from inside `fsm_walk2` and the minimisation test oracle.
The bug I mentioned above (with …
This PR improves performance for a couple of code paths in epsilon removal and determinisation that become hotspots with large `{}` repeated groups:

- Add a fast path to `state_set_search` for when the search would return the position after the array (appending), since this is very common.
- Use binary search instead of linear search to find the destination state in `edge_set_add_bulk`.
- Instead of calling `qsort`, when the working space would be small (also very common), build a bitset on the stack instead.

I did some other experimentation with reworking the algorithm for `epsilon_closure` to avoid populating the closure with intermediate states on the transitive path to the epsilon closure endpoints, but in the end it didn't have anywhere near as much impact as the other changes, and it makes the epsilon closure function less general-purpose, so it's probably not worth the extra complexity. For epsilon removal we only need to add states to the closures that either have labeled edges or are an end state; filtering out anything else (potentially including states being in their own epsilon closure) reduces the state set sizes for further processing. Otherwise they get filtered out in a later step.
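As a rough illustration of that filtering rule, a hedged sketch: `state_has_labeled_edges` is a hypothetical helper standing in for whatever check the real code would use, while `fsm_isend` is libfsm's usual end-state test.

```c
#include <fsm/fsm.h>

/* Hypothetical helper: true if the state has at least one labeled
 * (non-epsilon) outgoing edge. Not part of the actual PR. */
static int state_has_labeled_edges(const struct fsm *fsm, fsm_state_t s);

/* Sketch: when building epsilon closures for epsilon removal, only keep
 * states that can affect the result; everything else would be filtered
 * out in a later step anyway. */
static int
keep_in_closure(const struct fsm *fsm, fsm_state_t s)
{
	return state_has_labeled_edges(fsm, s) || fsm_isend(fsm, s);
}
```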
On my laptop, building with `-O3`, the difference between `main` and this PR's `HEAD` for `time ./re_main -rpcre -C '^[ab]{0,5000}$'`:

- `main`: real 0m41.328s
- `HEAD`: real 0m5.167s

I chose that regex for benchmarking because it produces a very long chain of epsilon closures and a big edge set pileup.