Performance improvements for epsilon removal and determinisation #441

Merged

Conversation


@silentbicycle (Collaborator) commented Aug 30, 2023

This PR improves performance for a couple of code paths in epsilon removal and determinisation that become hotspots with large `{}` repeated groups:

  • Add a fast path upfront for when state_set_search would return the position after the array (appending), since this is very common.
  • Switch from linear to binary searching in edge_set_add_bulk.
  • Avoid unnecessary calls to qsort: when the working space would be small (also very common), build a bitset on the stack instead.

I also experimented with reworking the epsilon_closure algorithm to avoid populating the closure with intermediate states on the transitive path to the epsilon closure endpoints. For epsilon removal we only need to add states to the closures that either have labeled edges or are end states; filtering out everything else (potentially including states appearing in their own epsilon closure) reduces the state set sizes for further processing, whereas otherwise those states only get filtered out in a later step. In the end, though, it had nowhere near as much impact as the other changes, and it makes the epsilon closure function less general-purpose, so it's probably not worth the extra complexity.

On my laptop, building with -O3, the difference between main and this PR's HEAD for `time ./re_main -rpcre -C '^[ab]{0,5000}$'`:

  • main: real 0m41.328s
  • HEAD: real 0m5.167s

I chose that regex for benchmarking because it produces a very long chain of epsilon closures and a big edge set pileup.

The description says "Return where an item would be, if it were
inserted", but it was returning the last element <= rather than
the first element >=, and the later call to `state_set_cmpval`
was shifting i by 1 for that specific case. Handle it correctly
inside the search function instead. The two other call sites
need to check whether the result refers to the append position
(one past the end of the array) before checking `set->a[i] == state`;
update them.
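
For illustration, a minimal sketch of the corrected contract -- a lower-bound binary search that returns the first position whose value is >= the target, which is the append position (one past the end) when the target exceeds every element. This is a hedged reconstruction, not libfsm's actual code; `lower_bound` is a placeholder name:

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t fsm_state_t;   /* as in libfsm */

    /* Return the index where `state` would be inserted: the first
     * position whose value is >= state. If state is greater than
     * every element this is `count`, one past the end, so callers
     * must bounds-check before reading a[i]. */
    static size_t
    lower_bound(const fsm_state_t *a, size_t count, fsm_state_t state)
    {
        size_t lo = 0, hi = count;
        while (lo < hi) {
            const size_t mid = lo + (hi - lo) / 2;
            if (a[mid] < state) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

A caller then tests `i < count && set->a[i] == state` for presence, rather than shifting `i` after the fact.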

Add a fast path upfront: it's VERY common to append states in order
to the state array, so before binary searching, first compare
against the last entry (unless the set is empty).
In -O0 this can become pretty expensive (~25% of overall runtime for
`time ./re -rpcre -C '^[ab]{0,2000}$'`), but when built with -O3 very
little overhead remains. I'm adding this comment because every time I
see this it seems to me like it should have `EXPENSIVE_CHECKS` around
it, but profiling is telling me it really doesn't matter.
This is a major hotspot when doing epsilon removal over large runs of
potentially skipped states (as might appear from `^[ab]{0,2000}$`).
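
A sketch of that fast path (again a hedged reconstruction, reusing the `lower_bound` placeholder above):

    /* It's very common to append states in ascending order, so check
     * against the last entry before falling back to binary search. */
    static size_t
    search_with_append_fast_path(const fsm_state_t *a, size_t count,
        fsm_state_t state, int *found)
    {
        if (count == 0 || a[count - 1] < state) {
            *found = 0;
            return count;       /* append position: one past the end */
        }
        const size_t i = lower_bound(a, count, state);
        *found = i < count && a[i] == state;
        return i;
    }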

Add a fast path for appending, which is also very common.

Extract the edge set destination search into its own function,
`find_state_position`, and add a `#define` to switch between linear
search, binary search, or calling both and comparing the result.
I will remove linear search in the next commit, but am committing
this as an intermediate step for verification & benchmarking.
When I run `time ./re -rpcre -C '^[ab]{0,2000}$'` locally for -O3:
- linear search: 2.991s
- binary search: 1.521s
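
For reference, a sketch of that `#define` switch (the macro and helper names here are placeholders; the PR's actual names may differ, and `enum fsp_res` and `struct edge_set` are as declared in the surrounding file):

    #include <assert.h>
    #include <stddef.h>

    #define SEARCH_LINEAR   0
    #define SEARCH_BINARY   1
    #define SEARCH_CMP_BOTH 2   /* transitional: run both, assert agreement */

    #ifndef FSP_SEARCH_MODE
    #define FSP_SEARCH_MODE SEARCH_CMP_BOTH
    #endif

    static enum fsp_res fsp_linear(const struct edge_set *set, fsm_state_t state, size_t *dst);
    static enum fsp_res fsp_binary(const struct edge_set *set, fsm_state_t state, size_t *dst);

    static enum fsp_res
    find_state_position(const struct edge_set *set, fsm_state_t state, size_t *dst)
    {
    #if FSP_SEARCH_MODE == SEARCH_LINEAR
        return fsp_linear(set, state, dst);
    #elif FSP_SEARCH_MODE == SEARCH_BINARY
        return fsp_binary(set, state, dst);
    #else
        /* run both searches and check that they agree */
        size_t d_lin, d_bin;
        const enum fsp_res r_lin = fsp_linear(set, state, &d_lin);
        const enum fsp_res r_bin = fsp_binary(set, state, &d_bin);
        assert(r_lin == r_bin && d_lin == d_bin);
        *dst = d_bin;
        return r_bin;
    #endif
    }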
After the other changes in this PR, calls to qsort from
`sort_and_dedup_dst_buf` are one of the largest remaining hotspots in
the profile. We can often avoid calling qsort, though:

- If there is at most one entry, just return; it's already sorted.

- Otherwise, first do a sweep through the array noting the min and max
  values. Unless there is a huge range between them, it's much faster to
  build a bitset from them in a small (max 10KB) stack-allocated array
  and then unpack the bitset (now sorted and unique). Only the needed
  portion of the array is initialized.

I have not done a lot of experimentation to find the cutoff point where
the bitset becomes slower than qsort (it may be much larger); I picked
10KB because it's likely to be safe to stack-allocate.

I tried changing the bitset unpacking to use an 8 or 16 bit mask and
jump forward faster through large sub-word ranges of 0 bits, but any
improvement was lost among random variation, so I decided it wasn't
worth the extra complexity. We already skip whole words that are 0.
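
Putting the pieces together, a self-contained sketch of the approach described above (hedged: the names, the exact cutoff check, and the fallback dedup are illustrative, not the PR's code):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    typedef uint32_t fsm_state_t;   /* as in libfsm */

    static int
    cmp_state(const void *pa, const void *pb)
    {
        const fsm_state_t a = *(const fsm_state_t *)pa;
        const fsm_state_t b = *(const fsm_state_t *)pb;
        return a < b ? -1 : a > b;
    }

    /* Sort and dedup in place, returning the new count. */
    static size_t
    sort_and_dedup(fsm_state_t *a, size_t count)
    {
        uint64_t bits[10 * 1024 / 8];   /* 10KB stack bitset */

        if (count <= 1) {
            return count;               /* already sorted */
        }

        /* sweep for min and max */
        fsm_state_t min = a[0], max = a[0];
        for (size_t i = 1; i < count; i++) {
            if (a[i] < min) { min = a[i]; }
            if (a[i] > max) { max = a[i]; }
        }

        /* bit offsets 0..range inclusive need range/64 + 1 words */
        const uint64_t range = (uint64_t)max - min;
        const size_t words = (size_t)(range / 64) + 1;

        if (words > sizeof bits / sizeof bits[0]) {
            /* huge range: fall back to qsort, then dedup in place */
            qsort(a, count, sizeof *a, cmp_state);
            size_t used = 1;
            for (size_t i = 1; i < count; i++) {
                if (a[i] != a[used - 1]) { a[used++] = a[i]; }
            }
            return used;
        }

        /* only initialize the needed portion of the array */
        memset(bits, 0x00, words * sizeof bits[0]);
        for (size_t i = 0; i < count; i++) {
            const uint64_t off = a[i] - min;
            bits[off / 64] |= (uint64_t)1 << (off & 63);
        }

        /* unpack: now sorted and unique; whole zero words are skipped */
        size_t used = 0;
        for (size_t w = 0; w < words; w++) {
            if (bits[w] == 0) { continue; }
            for (unsigned b = 0; b < 64; b++) {
                if (bits[w] & ((uint64_t)1 << b)) {
                    a[used++] = min + (fsm_state_t)(w * 64 + b);
                }
            }
        }
        return used;
    }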
@silentbicycle (Collaborator, Author) commented Aug 30, 2023

Along with a couple of unrelated UBSan warnings (fixed above to reduce noise), CI found a bug -- `re -rpcre '(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))'` should match "caaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" but doesn't -- so I'm investigating that.

If min and max are exactly 64 states apart, the upper value was
silently dropped due to an incorrect `words` value here.

One of the patterns in the PCRE suite triggers this:

    ./re -rpcre '(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))' "caaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"

This should match, but did not.
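
A plausible reconstruction of the off-by-one (the commit's exact expression may differ): the bitset must hold bit offsets 0..range inclusive, so the word count is

    /* correct: offsets 0..range inclusive span range/64 + 1 words */
    const size_t words = (size_t)(range / 64) + 1;

whereas a round-up like `(range + 63) / 64` yields one word too few exactly when `range` is a multiple of 64 -- e.g. range == 64 gives 1 word, silently dropping bit 64.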
@silentbicycle (Collaborator, Author) commented Aug 31, 2023

Note: We can't use bsearch in state_set_search and edge_set_add_bulk because its interface doesn't match our use case -- bsearch returns NULL if the value isn't present, whereas we need the offset where the value would go if not present.

I tried calling out to bsearch anyway (and tracking time but ignoring the result) to compare timing, and bsearch took about 3.5x as long.
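
To make the mismatch concrete, a small sketch (reusing the `cmp_state` comparator from the earlier sketch):

    #include <stdlib.h>

    /* bsearch(3) only reports presence: when the key is absent it
     * returns NULL, with no way to recover where the key would go. */
    static int
    contains(const fsm_state_t *a, size_t count, fsm_state_t key)
    {
        return bsearch(&key, a, count, sizeof *a, cmp_state) != NULL;
        /* on a miss, the insertion offset is still unknown, so a
         * second (lower-bound) search would be needed anyway */
    }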

@katef (Owner) commented on `find_state_position`:

    * which includes the position immediately following the last entry. Return an enum
    * which indicates whether state is already present. */
    static enum fsp_res
    find_state_position(const struct edge_set *set, fsm_state_t state, size_t *dst)

Should we use this in `edge_set_find` and `edge_set_contains` too?

@silentbicycle (Collaborator, Author) replied:

We can binary search here because the struct edge_group array is sorted by .to and we're searching by destination state, but `edge_set_find` and `edge_set_contains` search by edge label. If they were frequently called and became a hotspot in the profile, we could do linear-time reindexing and bsearch on those, but currently they don't even show up in the profile. As far as I can tell they're only called from inside fsm_walk2 and the minimisation test oracle.
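
For context, the layout that makes this work looks roughly like the following (a sketch from my reading; libfsm's actual struct may differ in its details):

    /* The edge_group array is kept sorted by .to, so a binary search
     * keyed on the destination state is valid; label lookups have no
     * such ordering to exploit. */
    struct edge_group {
        fsm_state_t to;         /* sort key: destination state */
        uint64_t symbols[4];    /* 256-bit label bitmap */
    };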

@katef merged commit 6eff0f9 into main on Sep 11, 2023
322 checks passed
@katef deleted the sv/performance-improvements-for-epsilon-removal-and-determinisation branch on September 11, 2023 at 16:13
@silentbicycle (Collaborator, Author) commented:
The bug I mentioned above (with `(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))`) is fixed by the commit immediately after (a58cab2). At the time a comment seemed redundant, but I'm noting it here just in case.
