Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove HitsThresholdChecker. #13943

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Oct 22, 2024

Our top-docs collectors have a dependency on HitsThresholdChecker, which is essentially a shared counter that is incremented until it reaches the total hits threshold, when the scorer can start dynamically pruning hits.

A consequence of this removal is that dynamic pruning may start later, as soon as:

  • either the current slice collected totalHitsThreshold hits,
  • or another slice collected totalHitsThreshold hits and the current slice collected enough hits (up to 1,024) to check the shared MaxScoreAccumulator.

So in short, it exchanges a bit more work globally in favor of a bit less contention. A longer-term goal of mine is to stop specializing our CollectorManagers based on whether they are going to be used concurrently or not.

`TopScoreDocCollectorManager` has a dependency on `HitsThresholdChecker`, which
is essentially a shared counter that is incremented until it reaches the total
hits threshold, when the scorer can start dynamically pruning hits.

A consequence of this removal is that dynamic pruning may start later, as soon
as:
 - either the current slice collected `totalHitsThreshold` hits,
 - or another slice collected `totalHitsThreshold` hits and the current slice
   collected enough hits (up to 1,024) to check the shared
   `MaxScoreAccumulator`.

So in short, it exchanges a bit more work globally in favor of a bit less
contention. A longer-term goal of mine is to stop specializing our
`CollectorManager`s based on whether they are going to be used concurrently or
not.
@jpountz jpountz added this to the 10.1.0 milestone Oct 22, 2024
@jpountz
Copy link
Contributor Author

jpountz commented Oct 22, 2024

EDIT: at the time of this comment, only TopScoreDocCollector no longer used HitsThresholdChecker.

wikibigall with a searchConcurrency of 8 suggests that the slowdown is tiny:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      AndHighLow      849.71      (3.3%)      826.18      (2.2%)   -2.8% (  -7% -    2%) 0.002
           HighTermDayOfYearSort      253.62      (3.2%)      246.72      (3.0%)   -2.7% (  -8% -    3%) 0.005
                      TermDTSort      202.82      (3.4%)      198.21      (4.4%)   -2.3% (  -9% -    5%) 0.069
            HighTermTitleBDVSort       49.87      (5.1%)       49.01      (5.5%)   -1.7% ( -11% -    9%) 0.306
                      OrHighRare      245.78      (8.9%)      242.01      (9.2%)   -1.5% ( -17% -   18%) 0.591
             And2Terms2StopWords      184.35      (5.3%)      181.84      (4.5%)   -1.4% ( -10% -    8%) 0.379
                     AndHighHigh      102.63      (6.9%)      101.25      (5.9%)   -1.3% ( -13% -   12%) 0.507
                       CountTerm     8520.99      (3.9%)     8417.05      (5.1%)   -1.2% (  -9% -    8%) 0.396
                        Wildcard      112.90      (5.6%)      111.56      (4.8%)   -1.2% ( -10% -    9%) 0.471
                          Fuzzy1       79.74      (1.7%)       79.07      (1.7%)   -0.8% (  -4% -    2%) 0.114
                       OrHighLow      724.26      (2.3%)      719.70      (2.2%)   -0.6% (  -5% -    3%) 0.377
              Or2Terms2StopWords      192.42      (5.4%)      191.24      (4.0%)   -0.6% (  -9% -    9%) 0.680
                       And3Terms      176.74      (5.3%)      175.76      (4.1%)   -0.6% (  -9% -    9%) 0.712
                 CountOrHighHigh       88.76      (4.9%)       88.34      (4.4%)   -0.5% (  -9% -    9%) 0.744
               HighTermMonthSort     1066.76      (1.6%)     1062.79      (1.9%)   -0.4% (  -3% -    3%) 0.506
                          Fuzzy2       75.06      (1.4%)       74.85      (1.8%)   -0.3% (  -3% -    3%) 0.597
                  CountOrHighMed      137.67      (5.3%)      137.45      (4.4%)   -0.2% (  -9% -   10%) 0.920
                   OrHighNotHigh      196.77      (3.3%)      196.66      (3.5%)   -0.1% (  -6% -    6%) 0.959
               HighTermTitleSort       70.08      (6.6%)       70.09      (5.7%)    0.0% ( -11% -   13%) 0.994
                      OrHighHigh       94.07      (4.8%)       94.12      (5.1%)    0.0% (  -9% -   10%) 0.975
                      AndHighMed      182.18      (4.1%)      182.43      (3.4%)    0.1% (  -7% -    8%) 0.909
                    OrNotHighMed      255.11      (2.8%)      255.50      (3.3%)    0.2% (  -5% -    6%) 0.874
                       OrHighMed      242.11      (2.4%)      242.65      (2.5%)    0.2% (  -4% -    5%) 0.772
                   OrNotHighHigh      235.61      (2.3%)      236.26      (3.4%)    0.3% (  -5% -    6%) 0.766
                        HighTerm      361.55      (2.6%)      362.84      (2.7%)    0.4% (  -4% -    5%) 0.669
                         MedTerm      453.24      (2.8%)      455.07      (2.5%)    0.4% (  -4% -    5%) 0.628
                    OrHighNotMed      317.00      (3.0%)      318.40      (3.8%)    0.4% (  -6% -    7%) 0.680
                        PKLookup      277.48      (2.2%)      278.76      (2.7%)    0.5% (  -4% -    5%) 0.558
                          OrMany       46.17      (2.3%)       46.41      (2.9%)    0.5% (  -4% -    5%) 0.520
                         Prefix3       68.55      (4.0%)       69.01      (4.7%)    0.7% (  -7% -    9%) 0.627
                    OrHighNotLow      336.53      (3.2%)      339.73      (3.9%)    1.0% (  -5% -    8%) 0.395
                    AndStopWords       64.81      (5.3%)       65.46      (5.3%)    1.0% (  -9% -   12%) 0.543
                         LowTerm      640.08      (3.1%)      647.88      (2.6%)    1.2% (  -4% -    7%) 0.176
                CountAndHighHigh       74.37      (5.2%)       75.37      (5.6%)    1.3% (  -8% -   12%) 0.426
                 CountAndHighMed      161.25      (5.1%)      163.59      (5.7%)    1.4% (  -8% -   12%) 0.394
                    OrNotHighLow      865.87      (3.3%)      880.11      (2.7%)    1.6% (  -4% -    7%) 0.081
                        Or3Terms      175.34      (4.3%)      178.28      (4.9%)    1.7% (  -7% -   11%) 0.252
                     OrStopWords       69.26      (6.5%)       70.71      (6.4%)    2.1% ( -10% -   15%) 0.303
                          IntNRQ      166.81      (5.5%)      170.67     (10.5%)    2.3% ( -12% -   19%) 0.381

@jpountz
Copy link
Contributor Author

jpountz commented Oct 23, 2024

Actually, while I was at it, I also removed TopFieldCollector's dependency on HitsThresholdChecker, and then removed HitsThresholdChecker.

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
             And2Terms2StopWords      189.12      (4.5%)      182.81      (4.2%)   -3.3% ( -11% -    5%) 0.015
                    AndStopWords       67.12      (4.6%)       66.00      (3.8%)   -1.7% (  -9% -    7%) 0.214
                     OrStopWords       70.79      (6.9%)       69.97      (7.2%)   -1.1% ( -14% -   13%) 0.607
                       CountTerm     8165.89      (5.6%)     8090.18      (3.8%)   -0.9% (  -9% -    8%) 0.538
                        Wildcard      133.39      (4.1%)      132.24      (3.3%)   -0.9% (  -7% -    6%) 0.468
              Or2Terms2StopWords      194.66      (5.4%)      193.42      (4.1%)   -0.6% (  -9% -    9%) 0.673
                       OrHighMed      198.12      (4.1%)      197.08      (3.5%)   -0.5% (  -7% -    7%) 0.663
                       And3Terms      180.61      (4.0%)      179.74      (3.9%)   -0.5% (  -8% -    7%) 0.700
                 CountAndHighMed      163.62      (4.5%)      163.28      (3.5%)   -0.2% (  -7% -    8%) 0.872
                          Fuzzy1       78.71      (2.3%)       78.63      (1.8%)   -0.1% (  -4% -    4%) 0.881
                      AndHighMed      160.23      (4.9%)      160.16      (4.4%)   -0.0% (  -8% -    9%) 0.974
                      OrHighHigh      103.71      (4.1%)      103.68      (4.2%)   -0.0% (  -8% -    8%) 0.981
                          OrMany       46.85      (2.4%)       46.85      (2.7%)   -0.0% (  -5% -    5%) 0.996
                  CountOrHighMed      139.92      (4.2%)      139.96      (4.7%)    0.0% (  -8% -    9%) 0.984
                        PKLookup      278.34      (2.4%)      278.47      (2.5%)    0.0% (  -4% -    5%) 0.950
                        Or3Terms      176.21      (3.8%)      176.43      (2.9%)    0.1% (  -6% -    7%) 0.904
                          Fuzzy2       74.40      (2.5%)       74.65      (2.0%)    0.3% (  -4% -    4%) 0.647
                CountAndHighHigh       76.54      (4.8%)       76.82      (4.1%)    0.4% (  -8% -    9%) 0.798
                     AndHighHigh       96.45      (5.3%)       96.82      (4.3%)    0.4% (  -8% -   10%) 0.800
                   OrNotHighHigh      200.33      (3.1%)      201.39      (4.6%)    0.5% (  -6% -    8%) 0.667
                      AndHighLow     1013.59      (2.4%)     1019.06      (2.4%)    0.5% (  -4% -    5%) 0.480
               HighTermTitleSort       72.60      (3.3%)       73.08      (3.5%)    0.7% (  -5% -    7%) 0.540
                         Prefix3      191.90      (5.3%)      193.31      (4.5%)    0.7% (  -8% -   11%) 0.635
                   OrHighNotHigh      198.74      (2.8%)      200.40      (4.1%)    0.8% (  -5% -    7%) 0.452
                          IntNRQ      125.30     (11.9%)      126.54      (9.9%)    1.0% ( -18% -   25%) 0.776
                    OrHighNotMed      281.15      (2.4%)      284.02      (5.0%)    1.0% (  -6% -    8%) 0.407
           HighTermDayOfYearSort      255.46      (3.2%)      258.19      (3.8%)    1.1% (  -5% -    8%) 0.333
                 CountOrHighHigh       90.76      (5.7%)       91.74      (5.1%)    1.1% (  -9% -   12%) 0.529
                    OrNotHighMed      326.08      (3.7%)      329.84      (4.8%)    1.2% (  -7% -   10%) 0.396
                       OrHighLow      786.85      (1.8%)      797.01      (2.7%)    1.3% (  -3% -    5%) 0.076
                    OrHighNotLow      368.40      (2.5%)      373.79      (5.3%)    1.5% (  -6% -    9%) 0.265
            HighTermTitleBDVSort       57.32      (4.3%)       58.16      (4.0%)    1.5% (  -6% -   10%) 0.268
                    OrNotHighLow     1096.17      (3.0%)     1114.86      (3.0%)    1.7% (  -4% -    7%) 0.071
                        HighTerm      338.49      (2.5%)      347.01      (2.5%)    2.5% (  -2% -    7%) 0.001
                         LowTerm      663.36      (3.3%)      682.20      (3.4%)    2.8% (  -3% -    9%) 0.008
                      OrHighRare      246.60      (6.2%)      253.61      (7.7%)    2.8% ( -10% -   17%) 0.198
                         MedTerm      433.32      (3.0%)      445.99      (3.1%)    2.9% (  -3% -    9%) 0.003
                      TermDTSort      196.52      (3.4%)      219.52      (4.9%)   11.7% (   3% -   20%) 0.000
               HighTermMonthSort     1082.56      (2.7%)     1404.25      (3.4%)   29.7% (  22% -   36%) 0.000

Queries sorted by field seem to benefit from the lesser synchronization across search threads.

@jpountz jpountz changed the title Remove TopScoreDocCollector's dependency on HitsThresholdChecker. Remove HitsThresholdChecker. Oct 23, 2024
Copy link
Contributor

@gsmiller gsmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good change to me (other than needing a CHANGES entry?). Nice simplification and looks like a positive performance outcome.

@jpountz jpountz merged commit 937432a into apache:main Oct 28, 2024
3 checks passed
@jpountz jpountz deleted the remove_hit_threshold_checker_from_top_score_docs_collector branch October 28, 2024 14:50
jpountz added a commit that referenced this pull request Oct 28, 2024
`TopScoreDocCollectorManager` has a dependency on `HitsThresholdChecker`, which
is essentially a shared counter that is incremented until it reaches the total
hits threshold, when the scorer can start dynamically pruning hits.

A consequence of this removal is that dynamic pruning may start later, as soon
as:
 - either the current slice collected `totalHitsThreshold` hits,
 - or another slice collected `totalHitsThreshold` hits and the current slice
   collected enough hits (up to 1,024) to check the shared
   `MaxScoreAccumulator`.

So in short, it exchanges a bit more work globally in favor of a bit less
contention. A longer-term goal of mine is to stop specializing our
`CollectorManager`s based on whether they are going to be used concurrently or
not.
@jpountz
Copy link
Contributor Author

jpountz commented Oct 29, 2024

Wow, it's been a bigger speedup on nightly benchmarks than on my machine: https://benchmarks.mikemccandless.com/And3Terms.html.

jpountz added a commit to jpountz/lucene that referenced this pull request Nov 4, 2024
Our collector managers have a `supportsConcurrency` flag to optimize the case
when they are used in a single thread. This PR proposes to remove this flag now
that the optimization doesn't do much as a result of apache#13943.
jpountz added a commit that referenced this pull request Nov 5, 2024
…13977)

Our collector managers have a `supportsConcurrency` flag to optimize the case
when they are used in a single thread. This PR proposes to remove this flag now
that the optimization doesn't do much as a result of #13943.
jpountz added a commit that referenced this pull request Nov 5, 2024
…13977)

Our collector managers have a `supportsConcurrency` flag to optimize the case
when they are used in a single thread. This PR proposes to remove this flag now
that the optimization doesn't do much as a result of #13943.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants