-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make dynamic range facets value collection and sorting faster #13760
Comments
I can take a look at this one next week @stefanvodita, if you don't mind |
Please do! I'm happy to help with reviews or if you have questions about dynamic ranges. |
Cool, I've found some performance improvements (~10-15%), which can be reproduced through a new |
Another idea that @HoustonPutman had was to collect all the results, then use Quick Sort with the right pivots to determine the quantiles we care about without sorting the entire dataset. Curious what improvements you found @timgrein! |
@timgrein , I've posted a PR for my idea that @stefanvodita mentioned. If you have the JMH benchmark, I'd love to test it out on mine as well. |
Another option could be to draw inspiration from the Learned Sort algorithm (refer to https://blog.acolyer.org/2020/10/19/the-case-for-a-learned-sorting-algorithm/ and https://learnedsystems.mit.edu/defeating-dups-learned-sort/), which demonstrates particularly fast sorting capabilities also for low-cardinality fields. |
Learned Sort looks amazing -- @josefschiefer27 maybe open a dedicated spinoff issue to see if there are other places where it could help Lucene? Lucene does a lot of sorting ... e.g. sorting terms in the per-segment terms dictionary on flush. |
I think a more general issue for learned sort already exists: #12463 |
@HoustonPutman Sounds good, how would you prefer to test it? I can create a PR with the jmh benchmark and we can check, whether it's valuable in the sense of capturing your improvement correctly and then merge it indepedently. I still have 1-2 ideas in mind I can contribute afterwards. These were more Java specific things and not on the algorithmic side. Speaking of that your change looks like an impressive improvement, very cool that this is also applicable to other parts of the codebase! :) |
@timgrein - publishing the benchmark as an independent PR sounds like a great idea! |
Description
DynamicRangeUtil
collects values from each segment and then sorts all the values in the main thread. I wonder if we could get a speed-up from collecting values in each segment, sorting them (maybe doing insertion sort), and then merging the sorted values from all segments, effectively moving more of the work to the executor and doing less in the main thread.The text was updated successfully, but these errors were encountered: