TestBPReorderingMergePolicy fails CheckIndex.testHnswGraph #14127
Comments
Disabling
|
In case useful, I could reduce the test to something like:
This test fails about 5% of the time for me locally. |
I spent some time trying to understand how this arose and working on a fix. I believe the BP reordering exposed a pre-existing behavior in the component-merging code that could create these duplicates. Nothing really prevents it, but it didn't happen before. I'm not completely sure why; my best theory is that graphs are normally built by adding docs in docid order, which is no longer true when reordering. I looked into how to prevent the duplicates, and one thing we could do is to remove them when writing the graph in the codec (in |
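To illustrate the idea of dropping duplicates at write time, here is a minimal sketch in plain Java. It is not Lucene code: the class name `NeighborDedupSketch` and the method `dedupe` are hypothetical, and it assumes a node's neighbours are available as an `int[]` of node ids in which the same id may appear more than once.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: removes repeated neighbour ids
// from one node's neighbour array before it would be written out.
class NeighborDedupSketch {

  /** Returns the neighbour ids with duplicates removed, keeping first-occurrence order. */
  public static int[] dedupe(int[] neighbors) {
    // LinkedHashSet drops repeats while preserving insertion order.
    Set<Integer> seen = new LinkedHashSet<>();
    for (int n : neighbors) {
      seen.add(n);
    }
    int[] out = new int[seen.size()];
    int i = 0;
    for (int n : seen) {
      out[i++] = n;
    }
    return out;
  }
}
```

Cleaning up at write time only hides the duplicates; it does not explain why the component-connecting step produced them in the first place.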
Thanks for the test case @iverase, but I was able to reliably repro using the existing test case. These tests are pretty slow so I'd just as soon not add too many more, unless you think there is some new thing tested by that? |
I just wanted to prove that the issue is not related to the new stuff you added and can be reproduced with a very simple test. No need to commit it. It helped me reproduce the failure with fewer nodes (just 96 documents were enough), so I could study it better. I had another look, and these are my observations: In this line we create a list of components. If I understand correctly, a component contains a list of nodes that are connected to each other, and it provides an entry node and the number of nodes it contains. I assume that two components are disconnected from each other. A bit below, we try to connect two components by searching for the nodes of one component (c0) closest to the start of the other component (c1). In theory, that search should return two nodes from c0 that do not exist in c1. When the test fails, it is easy to show that this search is returning neighbours from c1 when it shouldn't. Is that behaviour right? |
The thing that complicates this is that the graph is directed: its links are not always reciprocal (although they mostly are). Therefore, although it is true by construction of the components that a search of C0 will not find nodes in C1, the reverse is not necessarily true: nodes in C0 may be reachable from nodes in C1. |
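The asymmetry described above can be seen on a toy directed graph. The sketch below is plain Java, not the Lucene implementation; the class `DirectedReachabilityDemo`, the adjacency-array representation, and the four-node example graph are all assumptions made for illustration. Nodes 0 and 1 play the role of C0, nodes 2 and 3 play the role of C1, and there is a single one-way link from node 3 to node 0.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only: shows that reachability in a directed graph is not symmetric.
class DirectedReachabilityDemo {

  /** Returns every node reachable from start by following directed edges. */
  public static Set<Integer> reachable(int[][] adj, int start) {
    Set<Integer> seen = new TreeSet<>();
    Deque<Integer> stack = new ArrayDeque<>();
    seen.add(start);
    stack.push(start);
    while (!stack.isEmpty()) {
      int node = stack.pop();
      for (int next : adj[node]) {
        // add() returns true only the first time a node is seen
        if (seen.add(next)) {
          stack.push(next);
        }
      }
    }
    return seen;
  }

  /** adj[i] lists the out-neighbours of node i; 3 -> 0 is the only non-reciprocal link. */
  public static int[][] exampleGraph() {
    return new int[][] {{1}, {0}, {3}, {2, 0}};
  }

  public static void main(String[] args) {
    int[][] adj = exampleGraph();
    System.out.println(reachable(adj, 0)); // prints [0, 1]
    System.out.println(reachable(adj, 2)); // prints [0, 1, 2, 3]
  }
}
```

A traversal that starts in the first group never leaves it, while one that starts in the second group reaches all four nodes, so the two "components" are disjoint only when viewed from one side.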
One possible fix would be to remove c1's neighbours from the search result, which implies an extra graph seek to collect those neighbours into the visited nodes. |
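One reading of this suggestion, sketched in plain Java under stated assumptions: first walk the graph from c1's entry node to collect its members into a visited set (the extra seek), then discard any search candidate that is in that set. The class `ComponentFilterSketch` and both method names are hypothetical, the graph is an adjacency array rather than Lucene's graph structure, and `java.util.BitSet` stands in for whatever visited-set type the real code uses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not Lucene code: filter search results against a component's members.
class ComponentFilterSketch {

  /** The extra traversal: marks every node reachable from the component's entry node. */
  public static BitSet collectComponent(int[][] adj, int entry) {
    BitSet visited = new BitSet(adj.length);
    Deque<Integer> stack = new ArrayDeque<>();
    visited.set(entry);
    stack.push(entry);
    while (!stack.isEmpty()) {
      int node = stack.pop();
      for (int next : adj[node]) {
        if (!visited.get(next)) {
          visited.set(next);
          stack.push(next);
        }
      }
    }
    return visited;
  }

  /** Keeps only the candidates that are not members of the given component. */
  public static List<Integer> dropMembers(int[] candidates, BitSet component) {
    List<Integer> kept = new ArrayList<>();
    for (int c : candidates) {
      if (!component.get(c)) {
        kept.add(c);
      }
    }
    return kept;
  }
}
```

For example, with `adj = {{1}, {0}, {3}, {2}}` and c1 entered at node 2, the collected set is {2, 3}, so candidates `{1, 3, 0}` are reduced to `[1, 0]`. The cost is one additional traversal of c1 per connection attempt.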
Description
B1FA21A9AAC314F2
Gradle command to reproduce
./gradlew :lucene:misc:test --tests "org.apache.lucene.misc.index.TestBPReorderingMergePolicy" -Dtests.seed=B1FA21A9AAC314F2