Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make dynamic range facets value collection and sorting faster #13760

Open
stefanvodita opened this issue Sep 11, 2024 · 10 comments · May be fixed by #13914
Open

Make dynamic range facets value collection and sorting faster #13760

stefanvodita opened this issue Sep 11, 2024 · 10 comments · May be fixed by #13914

Comments

@stefanvodita
Copy link
Contributor

Description

DynamicRangeUtil collects values from each segment and then sorts all the values in the main thread. I wonder if we could get a speed-up from collecting values in each segment, sorting them (maybe doing insertion sort), and then merging the sorted values from all segments, effectively moving more of the work to the executor and doing less in the main thread.

@timgrein
Copy link
Contributor

timgrein commented Oct 8, 2024

I can take a look at this one next week @stefanvodita, if you don't mind

@stefanvodita
Copy link
Contributor Author

Please do! I'm happy to help with reviews or if you have questions about dynamic ranges.

@timgrein
Copy link
Contributor

Cool, I've found some performance improvements (~10-15%), which can be reproduced through a new jmh benchmark I've added. I'll open a PR the next few days and tag you :)

@stefanvodita
Copy link
Contributor Author

Another idea that @HoustonPutman had was to collect all the results, then use Quick Sort with the right pivots to determine the quantiles we care about without sorting the entire dataset. Curious what improvements you found @timgrein!

@HoustonPutman
Copy link
Contributor

@timgrein , I've posted a PR for my idea that @stefanvodita mentioned. If you have the JMH benchmark, I'd love to test it out on mine as well.

@josefschiefer27
Copy link

Another option could be to draw inspiration from the Learned Sort algorithm (refer to https://blog.acolyer.org/2020/10/19/the-case-for-a-learned-sorting-algorithm/ and https://learnedsystems.mit.edu/defeating-dups-learned-sort/), which demonstrates particularly fast sorting capabilities also for low-cardinality fields.

@mikemccand
Copy link
Member

Learned Sort looks amazing -- @josefschiefer27 maybe open a dedicated spinoff issue to see if there are other places where it could help Lucene? Lucene does a lot of sorting ... e.g. sorting terms in the per-segment terms dictionary on flush.

@stefanvodita
Copy link
Contributor Author

I think a more general issue for learned sort already exists: #12463

@timgrein
Copy link
Contributor

@timgrein , I've posted a PR for my idea that @stefanvodita mentioned. If you have the JMH benchmark, I'd love to test it out on mine as well.

@HoustonPutman Sounds good, how would you prefer to test it? I can create a PR with the jmh benchmark and we can check, whether it's valuable in the sense of capturing your improvement correctly and then merge it indepedently.

I still have 1-2 ideas in mind I can contribute afterwards. These were more Java specific things and not on the algorithmic side. Speaking of that your change looks like an impressive improvement, very cool that this is also applicable to other parts of the codebase! :)

@stefanvodita
Copy link
Contributor Author

@timgrein - publishing the benchmark as an independent PR sounds like a great idea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants