HNSW BP reordering #14097
Conversation
note: addresses issue #13565

One interesting detail that came out while testing with the merge policy is that indexing actually becomes faster with this change, since the additional work we do in the BP reordering is outweighed by the savings from performing HNSW searches during merging. Indeed the savings can be dramatic:
These results look amazing @msokolov!
Right, I'm excited about this! And pleased with how it worked out as a merge policy. I was kind of hoping @jpountz would review, but perhaps he's out on vacation. Most of this has already been seen, and he approved the earlier PR. The main new thing here that might be controversial is the refactoring so that this reorderer shares a common interface (and some implementation, for less code repetition) with the pre-existing BPIndexReorderer. I think I'll go ahead and commit, but not backport yet, so we can get some runs in on the test machines, and if we later decide we want to revise that approach we still can. Hmm, I see it's only been a few days, including the weekend. I'll wait another day before pushing.
@msokolov do you have changes to luceneutil's …? The nightly benchy already indexes KNN vectors (with and without quantization), so that's another place where we could turn on this new feature... I'll open a luceneutil issue to iterate! Exciting...
These are exciting numbers! It's interesting how improved search latency is dropping the index build time. Do we know why the search times are so much better? Is it simply because, during merge, the merger sees the docs in a specific order that aids better graph building? I am a bit confused by how this all works.
I am not sure, but surmising that search performance is improved because of some combination of (1) graph ordinal decoding being faster (since we encode using VInts and these are now smaller), and (2) improved memory locality something something - I mean we are in theory comparing vectors that are nearer to each other in virtual memory space - not sure how that could/would affect anything though. I admit this is pretty hand-wavy, but it's the best I've got -- the only change here is that more similar vectors are now nearer to each other in docid (and thus vector ordinal) space, and a follow-on from that is that delta encoding in the HNSW graph is more compact/efficient.
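To make theory (1) concrete, here is a minimal sketch (not Lucene's actual encoder, just an illustration of the idea) of why reordering shrinks the graph: neighbor lists are sorted and delta-encoded as VInts, which use 7 payload bits per byte, so smaller gaps between neighbor ordinals mean fewer bytes to write and decode.

```java
// Illustrative sketch of VInt delta encoding: each VInt byte carries 7 bits
// of payload, so a gap under 128 costs 1 byte, under 16384 costs 2, etc.
public class VIntDeltaSketch {
  static int vintSize(int v) {
    int bytes = 1;
    while ((v >>>= 7) != 0) bytes++;
    return bytes;
  }

  // Encoded size of a sorted neighbor list when we store gaps, not ids.
  static int encodedSize(int[] sortedOrdinals) {
    int total = 0, prev = 0;
    for (int ord : sortedOrdinals) {
      total += vintSize(ord - prev);
      prev = ord;
    }
    return total;
  }

  public static void main(String[] args) {
    // Neighbors scattered across ordinal space: large gaps, larger VInts.
    int[] scattered = {10, 5_000, 90_000, 400_000};
    // Similar vectors placed near each other after reordering: tiny gaps.
    int[] clustered = {100_000, 100_003, 100_010, 100_027};
    System.out.println("scattered: " + encodedSize(scattered) + " bytes"); // 9 bytes
    System.out.println("clustered: " + encodedSize(clustered) + " bytes"); // 6 bytes
  }
}
```

This also fits the later observation that small indexes benefit less: when most gaps already fit in 2 bytes, there is little left to save.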
Regarding merging luceneutil tooling - I will open a PR, but suggest we hold off merging until this change hits Lucene.
I will note that tests with smaller indexes don't show such dramatic improvements - more support for the theory that graph decoding is what is helped, because there are no real compression savings on small graphs where all the deltas are 2 bytes already. In the previous round of this, Adrien had suggested looking into different encodings for the graph node ids, as we have for postings - I suspect there is room for improvement there.
I like the more efficient delta-encoding theory. Decoding …
If we really think … My thought was that the BP reordering helps because the order in which the vectors are read actually aids vector indexing, since clusters of vectors are indexed together. Honestly, bootstrapping HNSW via clusters is a really nice idea (and could help solve the "merge cost problem" by reusing clusters during merge...). The numbers here are really nice. I just want to understand why they are better, especially as recall changes, which seems to indicate that the graph building itself is being changed.
I didn't see a big impact on recall beyond what is typical from noise -- even with the same graph settings we see variance in recall due to randomness in the graph creation when there are multiple threads. I had 'numMergeWorker': (12,).
Yeah, that could explain variance. For repeatable tests, I generally only merge with one thread, but this is because I am usually doing things that affect recall. But that force merge time difference is 🤯
This looks pretty cool! I left a few questions, nothing major.
KnnVectorsReader currKnnVectorsReader = reader;
if (reader instanceof PerFieldKnnVectorsFormat.FieldsReader candidateReader) {
  currKnnVectorsReader = candidateReader.getFieldReader(fieldInfo.name);
if (!noDeletes(liveDocs) || !(reader instanceof HnswGraphProvider)) {
It took me a couple of seconds to make sure I got the double negation of !noDeletes right. I wonder if this could be turned into a single negation?
hah, true. Not sure where this came from - will change to hasDeletes. Q: is it truly necessary to check every bit of liveDocs? If we have a non-null liveDocs doesn't that imply one of the bits will be zero?
Q: is it truly necessary to check every bit of liveDocs? If we have a non-null liveDocs doesn't that imply one of the bits will be zero?
I'd say yes. IMO null is only an optimization for the case when all docs are live.
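A sketch of the point under discussion (the helper name hasDeletes comes from the thread; the minimal Bits stand-in is hypothetical, not the real org.apache.lucene.util.Bits): null means "all docs live", and since nothing in the type guarantees a non-null Bits has a clear bit, a defensive scan is still needed.

```java
// Hypothetical single-negation helper replacing !noDeletes(liveDocs).
public class LiveDocsSketch {
  // Minimal stand-in for org.apache.lucene.util.Bits.
  public interface Bits {
    boolean get(int index);
    int length();
  }

  public static boolean hasDeletes(Bits liveDocs) {
    if (liveDocs == null) {
      return false; // null is the optimization for "all docs are live"
    }
    // Defensive scan: a non-null Bits *should* contain at least one clear
    // bit, but the type does not enforce that invariant.
    for (int i = 0; i < liveDocs.length(); i++) {
      if (!liveDocs.get(i)) {
        return true;
      }
    }
    return false;
  }
}
```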
/**
 * Implementation of "recursive graph bisection", also called "bipartite graph partitioning" and
 * often abbreviated BP, an approach to doc ID assignment that aims at reducing the sum of the log
 * gap between consecutive neighbor node ids. See BPIndexReorderer in misc module.
It should be able to link to this class since both are in the misc module now?
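For intuition, the objective named in the javadoc can be sketched as follows (this is an assumption based on the javadoc's wording, not the PR's optimizer): the cost of a doc ID assignment is the sum of log2 gaps between consecutive neighbor ids, so an ordering that places a node's neighbors close together in id space scores lower.

```java
// Sketch of the BP objective: sum of log2 gaps over all sorted neighbor lists.
public class LogGapCost {
  static double cost(int[][] neighbors) {
    double sum = 0;
    for (int[] adj : neighbors) {
      int[] sorted = adj.clone();
      java.util.Arrays.sort(sorted);
      for (int i = 1; i < sorted.length; i++) {
        // log2 of the gap between consecutive neighbor ids
        sum += Math.log(sorted[i] - sorted[i - 1]) / Math.log(2);
      }
    }
    return sum;
  }
}
```

A node whose neighbors are consecutive ids contributes 0 (gaps of 1), while scattered neighbors contribute log2 of each large gap.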
vectorScalarMul(
    1 / (float) Math.sqrt(VectorUtil.dotProduct(centroid, centroid)), centroid);
}
;
looks like it could be removed?
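For context on what the snippet above is doing, here is a self-contained sketch (with hypothetical local stand-ins for the PR's vectorScalarMul and VectorUtil.dotProduct): scaling a vector by 1/sqrt(v·v) normalizes it to unit length, after which dot products against it behave like cosine similarity.

```java
// Sketch of in-place vector normalization, as in the reviewed snippet.
public class NormalizeSketch {
  static float dotProduct(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum;
  }

  static void vectorScalarMul(float scale, float[] v) {
    for (int i = 0; i < v.length; i++) v[i] *= scale;
  }

  public static void main(String[] args) {
    float[] centroid = {3f, 4f};
    // Scale by 1/||centroid||, since dot(v, v) = ||v||^2.
    vectorScalarMul(1 / (float) Math.sqrt(dotProduct(centroid, centroid)), centroid);
    // centroid is now {0.6, 0.8}, a unit-length vector
  }
}
```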
// Computing biases is typically a bottleneck, because each iteration needs to iterate over
// all postings to recompute biases, and the total number of postings is usually one order of
// magnitude or more than the number of docs. So we try to parallelize it.
This comment is probably not true for vectors.
yes, but something analogous is true (cost is like N(docs) * D(dimension)) -- I'll update the comment.
I had overlooked the number of dimensions, good point.
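The N(docs) * D(dimension) cost model from this exchange can be sketched like so (a hypothetical illustration using parallel streams rather than the PR's actual task framework): each bias recomputation touches every document's full vector, which is why parallelizing across docs pays off.

```java
import java.util.stream.IntStream;

// Sketch of bias computation: for each doc, compare attraction to the left
// vs. right partition centroid. One pass costs O(numDocs * dims).
public class BiasSketch {
  static float[] computeBiases(float[][] vectors, float[] leftCentroid, float[] rightCentroid) {
    float[] biases = new float[vectors.length];
    // Parallelize over docs, since each doc's bias is independent.
    IntStream.range(0, vectors.length).parallel().forEach(doc -> {
      float left = 0, right = 0;
      for (int d = 0; d < vectors[doc].length; d++) { // O(dims) per doc
        left += vectors[doc][d] * leftCentroid[d];
        right += vectors[doc][d] * rightCentroid[d];
      }
      biases[doc] = left - right; // > 0 means attracted to the left partition
    });
    return biases;
  }
}
```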
float gain = maxLeftBias - minRightBias;
// This compares the gain of swapping the doc from the left side that is most attracted to the
// right and the doc from the right side that is most attracted to the left against the
// average vector length (/500) rather than zero.
Where does this 500 come from, is it a heuristic?
yes, I'll update the comment
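A sketch of the heuristic being discussed (the /500 divisor is the PR's empirical constant; the helper and its parameters are hypothetical names for illustration): instead of swapping whenever the gain is merely positive, require it to beat a small fraction of the average vector length, so near-zero gains don't keep the bisection iterating indefinitely.

```java
// Sketch of the swap stopping condition: compare the best available swap
// gain against avgVectorLength / 500 instead of against zero.
public class SwapGainSketch {
  static boolean shouldSwap(float maxLeftBias, float minRightBias, float avgVectorLength) {
    float gain = maxLeftBias - minRightBias;
    float threshold = avgVectorLength / 500f; // heuristic cutoff, not zero
    return gain > threshold;
  }
}
```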
Thanks for the reviews. I agree the measurements are not well explained. I have other runs that show no or less impact on search times, unchanged index sizes, and sometimes a negative impact on indexing times -- I believe we expect indexing times to increase due to the additional work done when merging. But I don't think any of that should block merging this, since it doesn't impact any existing use case, and testing to get to the bottom of what's going on is laborious since it requires building large indexes, so it can be done over time.
Agreed, especially for the misc module.
@msokolov I agree, we shouldn't block merging as it doesn't adjust default behavior. I was just surprised at the numbers and curious about the cause.
Description
This is similar to the previous PR (#13683) on the same issue, but the difference here is that BpVectorReorderer is compatible with BPIndexReorderer, so that either can plug into BPReorderingMergePolicy.