Make dynamic range facets value collection and sorting faster #13760

stefanvodita · 2024-09-11T10:59:41Z

Description

DynamicRangeUtil collects values from each segment and then sorts all the values in the main thread. I wonder if we could get a speed-up from collecting values in each segment, sorting them (maybe doing insertion sort), and then merging the sorted values from all segments, effectively moving more of the work to the executor and doing less in the main thread.
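A rough sketch of the idea above, for illustration only. It is not the `DynamicRangeUtil` code: the class and method names (`SegmentMergeSketch`, `sortSegment`, `mergeSorted`, `collectSorted`) are hypothetical, each segment is modelled as a plain `long[]`, weights are ignored, and `Arrays.sort` stands in for whichever per-segment sort turns out to be fastest. Each segment is sorted on the executor, and the calling thread only does a k-way merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical sketch, not Lucene code: sort per segment, then k-way merge. */
public class SegmentMergeSketch {

  /** Returns a sorted copy of one segment's values; intended to run on the executor. */
  public static long[] sortSegment(long[] segmentValues) {
    long[] copy = segmentValues.clone();
    Arrays.sort(copy);
    return copy;
  }

  /** Merges already-sorted arrays using a min-heap of cursors {segmentIndex, offset}. */
  public static long[] mergeSorted(long[][] sortedSegments) {
    int total = 0;
    for (long[] segment : sortedSegments) {
      total += segment.length;
    }
    long[] merged = new long[total];
    PriorityQueue<int[]> heap = new PriorityQueue<>(
        (a, b) -> Long.compare(sortedSegments[a[0]][a[1]], sortedSegments[b[0]][b[1]]));
    for (int i = 0; i < sortedSegments.length; i++) {
      if (sortedSegments[i].length > 0) {
        heap.add(new int[] {i, 0});
      }
    }
    int out = 0;
    while (!heap.isEmpty()) {
      int[] cursor = heap.poll();
      merged[out++] = sortedSegments[cursor[0]][cursor[1]];
      // Advance the cursor and re-insert it if its segment has values left.
      if (++cursor[1] < sortedSegments[cursor[0]].length) {
        heap.add(cursor);
      }
    }
    return merged;
  }

  /** Sorts every segment on the executor, then merges on the calling thread. */
  public static long[] collectSorted(long[][] segments, Executor executor) {
    List<CompletableFuture<long[]>> futures = new ArrayList<>();
    for (long[] segment : segments) {
      futures.add(CompletableFuture.supplyAsync(() -> sortSegment(segment), executor));
    }
    long[][] sorted = new long[segments.length][];
    for (int i = 0; i < sorted.length; i++) {
      sorted[i] = futures.get(i).join();
    }
    return mergeSorted(sorted);
  }
}
```

With n values in k segments, the merge costs O(n log k) on the calling thread, compared with O(n log n) for a full sort there. Whether that is a net win in practice would have to be measured.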

timgrein · 2024-10-08T10:11:15Z

I can take a look at this one next week @stefanvodita, if you don't mind

stefanvodita · 2024-10-08T13:19:25Z

Please do! I'm happy to help with reviews or if you have questions about dynamic ranges.

timgrein · 2024-10-14T15:41:25Z

Cool, I've found some performance improvements (~10-15%), which can be reproduced through a new JMH benchmark I've added. I'll open a PR in the next few days and tag you :)

stefanvodita · 2024-10-14T17:08:41Z

Another idea that @HoustonPutman had was to collect all the results, then use Quick Sort with the right pivots to determine the quantiles we care about without sorting the entire dataset. Curious what improvements you found @timgrein!
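To make the selection idea concrete, here is a minimal sketch of a multi-select built on quicksort-style partitioning. It is an illustration under assumptions, not the code from any PR: `MultiSelectSketch` and `multiSelect` are hypothetical names, the quantile boundaries are assumed to be given as ascending array indices (`ranks`), and weights are ignored. It uses a three-way partition so that runs of equal values are settled in one pass, and it recurses only into sub-ranges that still contain a requested rank.

```java
/** Hypothetical sketch, not Lucene code: place several order statistics without a full sort. */
public class MultiSelectSketch {

  /**
   * Rearranges values so that, for every r in ranks (ascending indices into values),
   * values[r] holds the element a full sort would put at index r.
   */
  public static void multiSelect(long[] values, int[] ranks) {
    select(values, 0, values.length - 1, ranks, 0, ranks.length - 1);
  }

  private static void select(long[] a, int lo, int hi, int[] ranks, int rLo, int rHi) {
    // Nothing to do if the range has at most one element or no rank falls inside it.
    if (lo >= hi || rLo > rHi) {
      return;
    }
    long pivot = a[lo + (hi - lo) / 2];
    int lt = lo;
    int i = lo;
    int gt = hi;
    // Three-way partition: a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot.
    while (i <= gt) {
      if (a[i] < pivot) {
        swap(a, lt++, i++);
      } else if (a[i] > pivot) {
        swap(a, i, gt--);
      } else {
        i++;
      }
    }
    // Ranks below lt belong to the left part, ranks above gt to the right part;
    // ranks inside [lt, gt] are already in their final position.
    int leftEnd = rLo;
    while (leftEnd <= rHi && ranks[leftEnd] < lt) {
      leftEnd++;
    }
    int rightStart = leftEnd;
    while (rightStart <= rHi && ranks[rightStart] <= gt) {
      rightStart++;
    }
    select(a, lo, lt - 1, ranks, rLo, leftEnd - 1);
    select(a, gt + 1, hi, ranks, rightStart, rHi);
  }

  private static void swap(long[] a, int i, int j) {
    long tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}
```

For example, calling `multiSelect(values, new int[] {2, 5, 8})` on a ten-element array leaves the third, sixth and ninth smallest values at indices 2, 5 and 8, with everything to the left of each boundary no larger than it. The middle-element pivot keeps the sketch short; a production version would likely want a more robust pivot choice to guard against quadratic worst cases.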

HoustonPutman · 2024-10-15T00:23:13Z

@timgrein, I've posted a PR for my idea that @stefanvodita mentioned. If you have the JMH benchmark, I'd love to test it out on mine as well.

josefschiefer27 · 2024-10-15T01:39:53Z

Another option could be to draw inspiration from the Learned Sort algorithm (see https://blog.acolyer.org/2020/10/19/the-case-for-a-learned-sorting-algorithm/ and https://learnedsystems.mit.edu/defeating-dups-learned-sort/), which demonstrates particularly fast sorting, including for low-cardinality fields.

mikemccand · 2024-10-15T13:59:30Z

Learned Sort looks amazing -- @josefschiefer27 maybe open a dedicated spinoff issue to see if there are other places where it could help Lucene? Lucene does a lot of sorting ... e.g. sorting terms in the per-segment terms dictionary on flush.

stefanvodita · 2024-10-15T14:05:11Z

I think a more general issue for learned sort already exists: #12463

timgrein · 2024-10-17T08:23:00Z

> @timgrein, I've posted a PR for my idea that @stefanvodita mentioned. If you have the JMH benchmark, I'd love to test it out on mine as well.

@HoustonPutman Sounds good, how would you prefer to test it? I can create a PR with the JMH benchmark, we can check whether it captures your improvement correctly, and then merge it independently.

I still have 1-2 ideas in mind that I can contribute afterwards. These are more Java-specific things, not on the algorithmic side. Speaking of which, your change looks like an impressive improvement, very cool that this is also applicable to other parts of the codebase! :)

stefanvodita · 2024-10-23T13:59:01Z

@timgrein - publishing the benchmark as an independent PR sounds like a great idea!

stefanvodita added the type:enhancement label Sep 11, 2024

HoustonPutman linked a pull request Oct 15, 2024 that will close this issue

Use multi-select instead of a full sort for DynamicRange creation #13914

Open
