Remove HitsThresholdChecker. #13943

jpountz · 2024-10-22T16:41:11Z

Our top-docs collectors have a dependency on HitsThresholdChecker, which is essentially a shared counter that is incremented until it reaches the total hits threshold, when the scorer can start dynamically pruning hits.

A consequence of this removal is that dynamic pruning may start later, as soon as:

either the current slice collected totalHitsThreshold hits,
or another slice collected totalHitsThreshold hits and the current slice collected enough hits (up to 1,024) to check the shared MaxScoreAccumulator.

So in short, it exchanges a bit more work globally in favor of a bit less contention. A longer-term goal of mine is to stop specializing our CollectorManagers based on whether they are going to be used concurrently or not.

`TopScoreDocCollectorManager` has a dependency on `HitsThresholdChecker`, which is essentially a shared counter that is incremented until it reaches the total hits threshold, when the scorer can start dynamically pruning hits. A consequence of this removal is that dynamic pruning may start later, as soon as: - either the current slice collected `totalHitsThreshold` hits, - or another slice collected `totalHitsThreshold` hits and the current slice collected enough hits (up to 1,024) to check the shared `MaxScoreAccumulator`. So in short, it exchanges a bit more work globally in favor of a bit less contention. A longer-term goal of mine is to stop specializing our `CollectorManager`s based on whether they are going to be used concurrently or not.

jpountz · 2024-10-22T16:41:57Z

EDIT: at the time of this comment, only TopScoreDocCollector no longer used HitsThresholdChecker.

wikibigall with a searchConcurrency of 8 suggests that the slowdown is tiny:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      AndHighLow      849.71      (3.3%)      826.18      (2.2%)   -2.8% (  -7% -    2%) 0.002
           HighTermDayOfYearSort      253.62      (3.2%)      246.72      (3.0%)   -2.7% (  -8% -    3%) 0.005
                      TermDTSort      202.82      (3.4%)      198.21      (4.4%)   -2.3% (  -9% -    5%) 0.069
            HighTermTitleBDVSort       49.87      (5.1%)       49.01      (5.5%)   -1.7% ( -11% -    9%) 0.306
                      OrHighRare      245.78      (8.9%)      242.01      (9.2%)   -1.5% ( -17% -   18%) 0.591
             And2Terms2StopWords      184.35      (5.3%)      181.84      (4.5%)   -1.4% ( -10% -    8%) 0.379
                     AndHighHigh      102.63      (6.9%)      101.25      (5.9%)   -1.3% ( -13% -   12%) 0.507
                       CountTerm     8520.99      (3.9%)     8417.05      (5.1%)   -1.2% (  -9% -    8%) 0.396
                        Wildcard      112.90      (5.6%)      111.56      (4.8%)   -1.2% ( -10% -    9%) 0.471
                          Fuzzy1       79.74      (1.7%)       79.07      (1.7%)   -0.8% (  -4% -    2%) 0.114
                       OrHighLow      724.26      (2.3%)      719.70      (2.2%)   -0.6% (  -5% -    3%) 0.377
              Or2Terms2StopWords      192.42      (5.4%)      191.24      (4.0%)   -0.6% (  -9% -    9%) 0.680
                       And3Terms      176.74      (5.3%)      175.76      (4.1%)   -0.6% (  -9% -    9%) 0.712
                 CountOrHighHigh       88.76      (4.9%)       88.34      (4.4%)   -0.5% (  -9% -    9%) 0.744
               HighTermMonthSort     1066.76      (1.6%)     1062.79      (1.9%)   -0.4% (  -3% -    3%) 0.506
                          Fuzzy2       75.06      (1.4%)       74.85      (1.8%)   -0.3% (  -3% -    3%) 0.597
                  CountOrHighMed      137.67      (5.3%)      137.45      (4.4%)   -0.2% (  -9% -   10%) 0.920
                   OrHighNotHigh      196.77      (3.3%)      196.66      (3.5%)   -0.1% (  -6% -    6%) 0.959
               HighTermTitleSort       70.08      (6.6%)       70.09      (5.7%)    0.0% ( -11% -   13%) 0.994
                      OrHighHigh       94.07      (4.8%)       94.12      (5.1%)    0.0% (  -9% -   10%) 0.975
                      AndHighMed      182.18      (4.1%)      182.43      (3.4%)    0.1% (  -7% -    8%) 0.909
                    OrNotHighMed      255.11      (2.8%)      255.50      (3.3%)    0.2% (  -5% -    6%) 0.874
                       OrHighMed      242.11      (2.4%)      242.65      (2.5%)    0.2% (  -4% -    5%) 0.772
                   OrNotHighHigh      235.61      (2.3%)      236.26      (3.4%)    0.3% (  -5% -    6%) 0.766
                        HighTerm      361.55      (2.6%)      362.84      (2.7%)    0.4% (  -4% -    5%) 0.669
                         MedTerm      453.24      (2.8%)      455.07      (2.5%)    0.4% (  -4% -    5%) 0.628
                    OrHighNotMed      317.00      (3.0%)      318.40      (3.8%)    0.4% (  -6% -    7%) 0.680
                        PKLookup      277.48      (2.2%)      278.76      (2.7%)    0.5% (  -4% -    5%) 0.558
                          OrMany       46.17      (2.3%)       46.41      (2.9%)    0.5% (  -4% -    5%) 0.520
                         Prefix3       68.55      (4.0%)       69.01      (4.7%)    0.7% (  -7% -    9%) 0.627
                    OrHighNotLow      336.53      (3.2%)      339.73      (3.9%)    1.0% (  -5% -    8%) 0.395
                    AndStopWords       64.81      (5.3%)       65.46      (5.3%)    1.0% (  -9% -   12%) 0.543
                         LowTerm      640.08      (3.1%)      647.88      (2.6%)    1.2% (  -4% -    7%) 0.176
                CountAndHighHigh       74.37      (5.2%)       75.37      (5.6%)    1.3% (  -8% -   12%) 0.426
                 CountAndHighMed      161.25      (5.1%)      163.59      (5.7%)    1.4% (  -8% -   12%) 0.394
                    OrNotHighLow      865.87      (3.3%)      880.11      (2.7%)    1.6% (  -4% -    7%) 0.081
                        Or3Terms      175.34      (4.3%)      178.28      (4.9%)    1.7% (  -7% -   11%) 0.252
                     OrStopWords       69.26      (6.5%)       70.71      (6.4%)    2.1% ( -10% -   15%) 0.303
                          IntNRQ      166.81      (5.5%)      170.67     (10.5%)    2.3% ( -12% -   19%) 0.381

jpountz · 2024-10-23T13:43:17Z

Actually, while I was at it, I also removed TopFieldCollector's dependency on HitsThresholdChecker, and then removed HitsThresholdChecker.

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
             And2Terms2StopWords      189.12      (4.5%)      182.81      (4.2%)   -3.3% ( -11% -    5%) 0.015
                    AndStopWords       67.12      (4.6%)       66.00      (3.8%)   -1.7% (  -9% -    7%) 0.214
                     OrStopWords       70.79      (6.9%)       69.97      (7.2%)   -1.1% ( -14% -   13%) 0.607
                       CountTerm     8165.89      (5.6%)     8090.18      (3.8%)   -0.9% (  -9% -    8%) 0.538
                        Wildcard      133.39      (4.1%)      132.24      (3.3%)   -0.9% (  -7% -    6%) 0.468
              Or2Terms2StopWords      194.66      (5.4%)      193.42      (4.1%)   -0.6% (  -9% -    9%) 0.673
                       OrHighMed      198.12      (4.1%)      197.08      (3.5%)   -0.5% (  -7% -    7%) 0.663
                       And3Terms      180.61      (4.0%)      179.74      (3.9%)   -0.5% (  -8% -    7%) 0.700
                 CountAndHighMed      163.62      (4.5%)      163.28      (3.5%)   -0.2% (  -7% -    8%) 0.872
                          Fuzzy1       78.71      (2.3%)       78.63      (1.8%)   -0.1% (  -4% -    4%) 0.881
                      AndHighMed      160.23      (4.9%)      160.16      (4.4%)   -0.0% (  -8% -    9%) 0.974
                      OrHighHigh      103.71      (4.1%)      103.68      (4.2%)   -0.0% (  -8% -    8%) 0.981
                          OrMany       46.85      (2.4%)       46.85      (2.7%)   -0.0% (  -5% -    5%) 0.996
                  CountOrHighMed      139.92      (4.2%)      139.96      (4.7%)    0.0% (  -8% -    9%) 0.984
                        PKLookup      278.34      (2.4%)      278.47      (2.5%)    0.0% (  -4% -    5%) 0.950
                        Or3Terms      176.21      (3.8%)      176.43      (2.9%)    0.1% (  -6% -    7%) 0.904
                          Fuzzy2       74.40      (2.5%)       74.65      (2.0%)    0.3% (  -4% -    4%) 0.647
                CountAndHighHigh       76.54      (4.8%)       76.82      (4.1%)    0.4% (  -8% -    9%) 0.798
                     AndHighHigh       96.45      (5.3%)       96.82      (4.3%)    0.4% (  -8% -   10%) 0.800
                   OrNotHighHigh      200.33      (3.1%)      201.39      (4.6%)    0.5% (  -6% -    8%) 0.667
                      AndHighLow     1013.59      (2.4%)     1019.06      (2.4%)    0.5% (  -4% -    5%) 0.480
               HighTermTitleSort       72.60      (3.3%)       73.08      (3.5%)    0.7% (  -5% -    7%) 0.540
                         Prefix3      191.90      (5.3%)      193.31      (4.5%)    0.7% (  -8% -   11%) 0.635
                   OrHighNotHigh      198.74      (2.8%)      200.40      (4.1%)    0.8% (  -5% -    7%) 0.452
                          IntNRQ      125.30     (11.9%)      126.54      (9.9%)    1.0% ( -18% -   25%) 0.776
                    OrHighNotMed      281.15      (2.4%)      284.02      (5.0%)    1.0% (  -6% -    8%) 0.407
           HighTermDayOfYearSort      255.46      (3.2%)      258.19      (3.8%)    1.1% (  -5% -    8%) 0.333
                 CountOrHighHigh       90.76      (5.7%)       91.74      (5.1%)    1.1% (  -9% -   12%) 0.529
                    OrNotHighMed      326.08      (3.7%)      329.84      (4.8%)    1.2% (  -7% -   10%) 0.396
                       OrHighLow      786.85      (1.8%)      797.01      (2.7%)    1.3% (  -3% -    5%) 0.076
                    OrHighNotLow      368.40      (2.5%)      373.79      (5.3%)    1.5% (  -6% -    9%) 0.265
            HighTermTitleBDVSort       57.32      (4.3%)       58.16      (4.0%)    1.5% (  -6% -   10%) 0.268
                    OrNotHighLow     1096.17      (3.0%)     1114.86      (3.0%)    1.7% (  -4% -    7%) 0.071
                        HighTerm      338.49      (2.5%)      347.01      (2.5%)    2.5% (  -2% -    7%) 0.001
                         LowTerm      663.36      (3.3%)      682.20      (3.4%)    2.8% (  -3% -    9%) 0.008
                      OrHighRare      246.60      (6.2%)      253.61      (7.7%)    2.8% ( -10% -   17%) 0.198
                         MedTerm      433.32      (3.0%)      445.99      (3.1%)    2.9% (  -3% -    9%) 0.003
                      TermDTSort      196.52      (3.4%)      219.52      (4.9%)   11.7% (   3% -   20%) 0.000
               HighTermMonthSort     1082.56      (2.7%)     1404.25      (3.4%)   29.7% (  22% -   36%) 0.000

Queries sorted by field seem to benefit from the lesser synchronization across search threads.

gsmiller

Looks like a good change to me (other than needing a CHANGES entry?). Nice simplification and looks like a positive performance outcome.

…docs_collector

`TopScoreDocCollectorManager` has a dependency on `HitsThresholdChecker`, which is essentially a shared counter that is incremented until it reaches the total hits threshold, when the scorer can start dynamically pruning hits. A consequence of this removal is that dynamic pruning may start later, as soon as: - either the current slice collected `totalHitsThreshold` hits, - or another slice collected `totalHitsThreshold` hits and the current slice collected enough hits (up to 1,024) to check the shared `MaxScoreAccumulator`. So in short, it exchanges a bit more work globally in favor of a bit less contention. A longer-term goal of mine is to stop specializing our `CollectorManager`s based on whether they are going to be used concurrently or not.

jpountz · 2024-10-29T18:46:50Z

Wow, it's been a bigger speedup on nightly benchmarks than on my machine: https://benchmarks.mikemccandless.com/And3Terms.html.

Our collector managers have a `supportsConcurrency` flag to optimize the case when they are used in a single thread. This PR proposes to remove this flag now that the optimization doesn't do much as a result of apache#13943.

…13977) Our collector managers have a `supportsConcurrency` flag to optimize the case when they are used in a single thread. This PR proposes to remove this flag now that the optimization doesn't do much as a result of #13943.

…3943.

jpountz added this to the 10.1.0 milestone Oct 22, 2024

Likewise for TopFieldCollector.

495eca4

jpountz changed the title ~~Remove TopScoreDocCollector's dependency on HitsThresholdChecker.~~ Remove HitsThresholdChecker. Oct 23, 2024

fix comment

888f10a

gsmiller approved these changes Oct 24, 2024

View reviewed changes

jpountz added 5 commits October 24, 2024 21:48

Merge branch 'main' into remove_hit_threshold_checker_from_top_score_…

83f3f78

…docs_collector

CHANGES

da33c1a

Merge branch 'main' into remove_hit_threshold_checker_from_top_score_…

cb8acba

…docs_collector

tidy

d866877

iter

d5bb994

jpountz merged commit 937432a into apache:main Oct 28, 2024
3 checks passed

jpountz deleted the remove_hit_threshold_checker_from_top_score_docs_collector branch October 28, 2024 14:50

jpountz mentioned this pull request Nov 4, 2024

Remove supportsConcurrency flag from top-docs collector managers. #13977

Merged

jpountz added a commit that referenced this pull request Nov 5, 2024

Adjust expectations of TestTopFieldCollectorEarlyTermination after #1…

9a14d57

…3943.

jpountz added a commit that referenced this pull request Nov 5, 2024

Adjust expectations of TestTopFieldCollectorEarlyTermination after #1…

539cf3c

…3943.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove HitsThresholdChecker. #13943

Remove HitsThresholdChecker. #13943

jpountz commented Oct 22, 2024 •

edited

Loading

jpountz commented Oct 22, 2024 •

edited

Loading

jpountz commented Oct 23, 2024

gsmiller left a comment

jpountz commented Oct 29, 2024

Remove HitsThresholdChecker. #13943

Remove HitsThresholdChecker. #13943

Conversation

jpountz commented Oct 22, 2024 • edited Loading

jpountz commented Oct 22, 2024 • edited Loading

jpountz commented Oct 23, 2024

gsmiller left a comment

Choose a reason for hiding this comment

jpountz commented Oct 29, 2024

jpountz commented Oct 22, 2024 •

edited

Loading

jpountz commented Oct 22, 2024 •

edited

Loading