Speed up Lucene912PostingsReader nextDoc() impls. #13963

Merged
jpountz merged 5 commits into apache:main from the speedup_nextDoc branch on Oct 29, 2024

@jpountz jpountz commented Oct 29, 2024

127 times out of 128, nextDoc() returns the next doc ID in the buffer. Currently, we check if the current doc is equal to the last doc ID in the block to know if we need to refill. We can do better by comparing the current index in the block with the block size, which is a bit more efficient since the latter is a constant.
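
To make the two refill checks concrete, here is a minimal toy sketch, not the actual Lucene912PostingsReader code. The class `BlockDocIterator`, its fields, and the `refillDocs()` helper are hypothetical names chosen for illustration, and the "encoded postings" are just a sorted `int[]`. It assumes a block size of 128, as the "127 times out of 128" figure suggests, and pads the last block with a sentinel.

```java
import java.util.Arrays;

/** Toy block-buffered doc ID iterator (illustrative only, not Lucene's code). */
final class BlockDocIterator {
  static final int BLOCK_SIZE = 128; // compile-time constant
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel for exhaustion

  private final int[] postings; // sorted doc IDs, stand-in for encoded blocks
  private final int[] docBuffer = new int[BLOCK_SIZE];
  private int nextBlockStart = 0; // offset of the next block to "decode"
  private int docBufferUpto = BLOCK_SIZE; // forces a refill on the first index-based call
  private int lastDocInBlock = -1; // equals the initial doc, forces a refill in the old variant
  private int doc = -1;

  BlockDocIterator(int[] sortedDocIds) {
    this.postings = sortedDocIds;
  }

  /** Copies the next block into the buffer, padding with NO_MORE_DOCS. */
  private void refillDocs() {
    int count = Math.min(BLOCK_SIZE, postings.length - nextBlockStart);
    System.arraycopy(postings, nextBlockStart, docBuffer, 0, count);
    Arrays.fill(docBuffer, count, BLOCK_SIZE, NO_MORE_DOCS);
    nextBlockStart += count;
    docBufferUpto = 0;
    lastDocInBlock = docBuffer[BLOCK_SIZE - 1];
  }

  /** Old-style check: compare the current doc with a per-block variable. */
  int nextDocByLastDoc() {
    if (doc == lastDocInBlock) {
      refillDocs();
    }
    return doc = docBuffer[docBufferUpto++];
  }

  /** New-style check: compare the buffer index with the constant block size. */
  int nextDocByIndex() {
    if (docBufferUpto == BLOCK_SIZE) {
      refillDocs();
    }
    return doc = docBuffer[docBufferUpto++];
  }
}
```

Both variants return the same sequence when used on their own iterator; the second only needs the buffer index, which `nextDoc()` reads anyway, and a constant, rather than a field that changes with every block.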

@jpountz jpountz added this to the 10.1.0 milestone Oct 29, 2024
jpountz commented Oct 29, 2024

Here are two runs on wikibigall. I wouldn't read too much into the CountOrHighHigh and CountOrHighMed results; it wouldn't be the first time that my machine reports bigger speedups than the nightlies. I suspect that the small speedups to OrHighHigh and OrStopWords are real as well.

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                          IntNRQ      181.08      (4.7%)      175.71      (4.9%)   -3.0% ( -12% -    7%) 0.052
                         LowTerm     1039.66      (3.4%)     1018.83      (2.9%)   -2.0% (  -8% -    4%) 0.046
                       CountTerm     8927.92      (4.3%)     8800.67      (4.6%)   -1.4% (  -9% -    7%) 0.313
                         MedTerm      657.46      (3.4%)      650.79      (3.1%)   -1.0% (  -7% -    5%) 0.331
                      TermDTSort      365.86      (5.5%)      362.49      (6.6%)   -0.9% ( -12% -   11%) 0.630
                        Wildcard       50.98      (3.9%)       50.52      (3.1%)   -0.9% (  -7% -    6%) 0.419
                        HighTerm      501.50      (3.2%)      497.16      (3.1%)   -0.9% (  -6% -    5%) 0.386
           HighTermDayOfYearSort      854.14      (3.5%)      846.81      (3.8%)   -0.9% (  -7% -    6%) 0.458
                        PKLookup      270.72      (1.8%)      269.12      (2.9%)   -0.6% (  -5% -    4%) 0.436
               HighTermMonthSort     3385.08      (2.8%)     3371.46      (1.9%)   -0.4% (  -4% -    4%) 0.596
                      AndHighLow      970.96      (2.3%)      967.94      (2.3%)   -0.3% (  -4% -    4%) 0.671
                    OrHighNotMed      423.25      (2.8%)      423.29      (3.7%)    0.0% (  -6% -    6%) 0.994
            HighTermTitleBDVSort       13.72      (1.7%)       13.74      (4.6%)    0.1% (  -6% -    6%) 0.921
                   OrNotHighHigh      244.21      (2.5%)      244.66      (3.7%)    0.2% (  -5% -    6%) 0.855
                          Fuzzy2       75.09      (3.1%)       75.24      (2.1%)    0.2% (  -4% -    5%) 0.816
                          Fuzzy1       79.86      (3.2%)       80.12      (2.5%)    0.3% (  -5% -    6%) 0.714
                    OrHighNotLow      469.97      (3.2%)      471.61      (4.0%)    0.3% (  -6% -    7%) 0.762
                       OrHighLow      803.57      (1.7%)      806.73      (1.3%)    0.4% (  -2% -    3%) 0.415
                    OrNotHighMed      403.92      (2.4%)      405.73      (3.4%)    0.4% (  -5% -    6%) 0.630
                   OrHighNotHigh      260.66      (2.1%)      261.96      (3.2%)    0.5% (  -4% -    5%) 0.558
                    OrNotHighLow     1021.86      (3.7%)     1027.02      (2.7%)    0.5% (  -5% -    7%) 0.620
                       And3Terms      174.78      (2.7%)      175.69      (2.4%)    0.5% (  -4% -    5%) 0.527
               HighTermTitleSort      148.23      (5.7%)      149.09      (3.6%)    0.6% (  -8% -   10%) 0.699
                      OrHighRare      283.87      (4.2%)      285.62      (5.0%)    0.6% (  -8% -   10%) 0.674
             And2Terms2StopWords      160.51      (2.1%)      161.52      (1.7%)    0.6% (  -3% -    4%) 0.309
                      AndHighMed      255.10      (1.4%)      257.42      (1.9%)    0.9% (  -2% -    4%) 0.089
                          OrMany       19.08      (3.0%)       19.27      (2.7%)    1.0% (  -4% -    6%) 0.263
                        Or3Terms      172.02      (4.1%)      174.12      (3.9%)    1.2% (  -6% -    9%) 0.335
                    AndStopWords       32.26      (4.0%)       32.68      (3.0%)    1.3% (  -5% -    8%) 0.240
              Or2Terms2StopWords      162.20      (3.9%)      164.52      (3.4%)    1.4% (  -5% -    9%) 0.217
                     AndHighHigh       79.61      (1.2%)       80.79      (1.8%)    1.5% (  -1% -    4%) 0.002
                 CountAndHighMed      149.86      (3.0%)      152.14      (3.5%)    1.5% (  -4% -    8%) 0.142
                       OrHighMed      207.31      (3.5%)      210.50      (3.4%)    1.5% (  -5% -    8%) 0.162
                         Prefix3      102.45      (4.6%)      104.13      (3.2%)    1.6% (  -5% -    9%) 0.185
                CountAndHighHigh       49.77      (3.2%)       50.92      (3.4%)    2.3% (  -4% -    9%) 0.028
                     OrStopWords       35.35      (6.3%)       36.40      (5.6%)    3.0% (  -8% -   15%) 0.116
                      OrHighHigh       74.07      (4.5%)       76.33      (4.0%)    3.0% (  -5% -   12%) 0.022
                  CountOrHighMed      104.86      (1.6%)      142.47      (2.6%)   35.9% (  31% -   40%) 0.000
                 CountOrHighHigh       50.21      (2.2%)       76.43      (1.3%)   52.2% (  47% -   56%) 0.000

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
               HighTermMonthSort     3234.28      (2.8%)     3188.91      (2.6%)   -1.4% (  -6% -    4%) 0.097
                      AndHighLow      902.49      (2.6%)      892.42      (2.8%)   -1.1% (  -6% -    4%) 0.194
                         LowTerm     1146.08      (2.3%)     1137.36      (2.5%)   -0.8% (  -5% -    4%) 0.317
                       OrHighLow      822.84      (1.7%)      818.48      (1.5%)   -0.5% (  -3% -    2%) 0.290
            HighTermTitleBDVSort       17.45      (2.4%)       17.39      (2.9%)   -0.4% (  -5% -    5%) 0.675
                        Wildcard       46.65      (4.9%)       46.50      (3.7%)   -0.3% (  -8% -    8%) 0.815
                          Fuzzy1       79.54      (1.5%)       79.32      (1.2%)   -0.3% (  -2% -    2%) 0.519
                      AndHighMed      200.10      (1.6%)      199.63      (1.8%)   -0.2% (  -3% -    3%) 0.664
                         Prefix3      162.62      (4.7%)      162.31      (5.2%)   -0.2% (  -9% -   10%) 0.903
                         MedTerm      783.45      (2.3%)      782.38      (2.2%)   -0.1% (  -4% -    4%) 0.849
                       And3Terms      175.18      (2.0%)      175.03      (2.5%)   -0.1% (  -4% -    4%) 0.903
                        PKLookup      278.19      (2.6%)      278.16      (2.6%)   -0.0% (  -5% -    5%) 0.991
             And2Terms2StopWords      160.69      (1.6%)      160.72      (2.4%)    0.0% (  -3% -    4%) 0.976
                    OrNotHighLow      988.77      (3.3%)      989.30      (2.5%)    0.1% (  -5% -    6%) 0.954
                     AndHighHigh       58.54      (1.3%)       58.62      (1.5%)    0.1% (  -2% -    2%) 0.768
                          OrMany       19.29      (1.9%)       19.32      (2.7%)    0.2% (  -4% -    4%) 0.828
                          Fuzzy2       74.41      (1.2%)       74.57      (1.1%)    0.2% (  -2% -    2%) 0.560
               HighTermTitleSort      160.12      (2.1%)      160.49      (2.0%)    0.2% (  -3% -    4%) 0.725
                       OrHighMed      203.85      (2.4%)      204.57      (2.5%)    0.4% (  -4% -    5%) 0.642
                        HighTerm      554.95      (2.4%)      557.32      (2.1%)    0.4% (  -4% -    5%) 0.557
                       CountTerm     8520.41      (5.7%)     8560.00      (3.9%)    0.5% (  -8% -   10%) 0.764
                    AndStopWords       32.30      (2.9%)       32.47      (3.3%)    0.5% (  -5% -    6%) 0.599
                    OrNotHighMed      344.13      (3.5%)      346.17      (2.8%)    0.6% (  -5% -    7%) 0.554
           HighTermDayOfYearSort      828.58      (3.2%)      833.97      (2.3%)    0.6% (  -4% -    6%) 0.458
              Or2Terms2StopWords      162.90      (2.9%)      164.14      (3.0%)    0.8% (  -4% -    6%) 0.411
                        Or3Terms      174.14      (3.1%)      175.52      (3.3%)    0.8% (  -5% -    7%) 0.436
                   OrHighNotHigh      292.03      (2.9%)      294.76      (2.3%)    0.9% (  -4% -    6%) 0.263
                    OrHighNotLow      474.44      (3.2%)      478.90      (3.1%)    0.9% (  -5% -    7%) 0.352
                   OrNotHighHigh      239.75      (3.0%)      242.32      (2.4%)    1.1% (  -4% -    6%) 0.217
                 CountAndHighMed      150.77      (2.8%)      152.66      (3.7%)    1.3% (  -5% -    7%) 0.224
                    OrHighNotMed      383.40      (2.8%)      388.38      (2.7%)    1.3% (  -4% -    7%) 0.138
                CountAndHighHigh       49.91      (2.8%)       50.56      (3.5%)    1.3% (  -4% -    7%) 0.191
                      TermDTSort      363.59      (6.5%)      368.80      (6.1%)    1.4% ( -10% -   14%) 0.471
                      OrHighHigh       76.48      (3.3%)       77.81      (3.0%)    1.7% (  -4% -    8%) 0.080
                          IntNRQ      105.90      (9.2%)      107.78     (14.2%)    1.8% ( -19% -   27%) 0.640
                     OrStopWords       35.64      (4.8%)       36.35      (5.0%)    2.0% (  -7% -   12%) 0.198
                      OrHighRare      273.36      (4.7%)      280.73      (4.8%)    2.7% (  -6% -   12%) 0.074
                  CountOrHighMed      107.04      (2.2%)      142.92      (2.4%)   33.5% (  28% -   38%) 0.000
                 CountOrHighHigh       51.31      (2.3%)       76.15      (1.7%)   48.4% (  43% -   53%) 0.000

@jpountz jpountz changed the title from "Speedup Lucene912PostingsReader nextDoc() impls." to "Speed up Lucene912PostingsReader nextDoc() impls." Oct 29, 2024
@jpountz jpountz merged commit 9359cfd into apache:main Oct 29, 2024
3 checks passed
@jpountz jpountz deleted the speedup_nextDoc branch October 29, 2024 17:16
jpountz added a commit that referenced this pull request Oct 31, 2024