-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide GPU-accelerated vector indexes with RAFT #413
base: main
Are you sure you want to change the base?
Conversation
@wphicks also, probably need to sign the CLA as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work so far
Please see comments
CMakeLists.txt
Outdated
# Only do these if this is the main project, and not if it is included through add_subdirectory | ||
set_property(GLOBAL PROPERTY USE_FOLDERS ON) | ||
|
||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions -fPIC ${CLANG_SAN_FLAGS} ${LLVM_CXX_FLAGS} ${COV_CXX_FLAGS}") | ||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions -fPIC -pthread ${CLANG_SAN_FLAGS} ${LLVM_CXX_FLAGS} ${COV_CXX_FLAGS}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need pthread here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually meant to ask you about that. I was having trouble compiling without it (even before making any of my other changes), but I'll have to recreate the failing build log. Will either post it here or as a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please do, thanks!
src/VecSim/vec_sim_common.h
Outdated
double preferredShmemCarveout; // Fraction of GPU's unified memory / L1 | ||
// cache to be used as shared memory | ||
|
||
} IVFParams; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we RAFTIVFParams
this?
src/VecSim/vec_sim_common.h
Outdated
|
||
} IVFParams; | ||
|
||
typedef struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe we need this struct definition as is
you can send the IVFParams
as primaryIndexParams
when creating TieredIndexParams
if you need something specific for you tiered index management layer, this is the case where you need to create a specific struct
see
union {
TieredHNSWParams tieredHnswParams;
} specificParams;
at TieredIndexParams
@@ -35,6 +35,11 @@ class BruteForceIndex : public VecSimIndexAbstract<DistType> { | |||
public: | |||
BruteForceIndex(const BFParams *params, const AbstractIndexInitParams &abstractInitParams); | |||
|
|||
void clear() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when do we need to use this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clear()
function is used in the Tiered Raft IVF index: The vectors from the flat buffers are all transferred to the backend index, and then the flat buffer is cleared.
src/VecSim/algorithms/ivf/ivf.cuh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to raft_ivf please
src/VecSim/algorithms/ivf/ivf.cuh
Outdated
// Copy label data to previously allocated device buffer | ||
raft::copy(label_gpu.data_handle(), label, batch_size, res_.get_stream()); | ||
|
||
if (std::holds_alternative<raft::neighbors::ivf_flat::index_params>(build_params_)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to avoid this if
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not unless we want to split IVF flat and IVF PQ back out into separate indexes again.
One possible alternate spelling would be to use std::visit
with a constexpr if to distinguish the types. This has the disadvantage that compilers sometimes have trouble creating an efficient std::visit
over small variants, but it has the advantage of providing obviously idiomatic access to the variant.
If performance is what we're worried about, I would avoid std::visit
but add a compile-time assumption (via __builtin_unreachable
, __assume
or similar based on the compiler) that the variant contains the expected type prior to the std::get
call. That usually allows the compiler to avoid the check that it must ordinarily make to see if it needs to throw a std::bad_variant_access
.
If readability is what we are concerned about, the bodies of those if clauses could be separated into their own functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please share a suggestion for the last proposal, so we can iterate? (could be here or on slack). I'm thinking about modularity and code reusability here.
src/VecSim/algorithms/ivf/ivf.cuh
Outdated
index_{std::nullopt} {} | ||
auto addVector(const void *vector_data, labelType label, | ||
bool overwrite_allowed = true) override { | ||
return addVectorBatch(vector_data, &label, 1, overwrite_allowed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mandatory due to our requirements? I guess batch size 1 is not efficient here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. As a matter of fact, I would recommend that this index type only be used in a tiered index to allow for efficient insertions.
src/VecSim/algorithms/ivf/ivf.cuh
Outdated
res_.get_stream()); | ||
|
||
// Perform correct search based on index type | ||
if (std::holds_alternative<raft::neighbors::ivf_flat::index>(index_)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question about this if
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
auto indexSize() { | ||
auto frontend_lock = std::scoped_lock(this->flatIndexGuard); | ||
auto backend_lock = std::scoped_lock(this->mainIndexGuard); | ||
return (getBackendIndex().indexSize() + this->frontendIndex.indexSize()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming transfer is "atomic" and does not cause duplications?
|
||
auto backend_lock = std::scoped_lock(this->mainIndexGuard); | ||
this->flatBuffer->clear(); | ||
frontend_lock.unlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be called after addVectorBatch
otherwise we can have both indexes empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the fact that we have a lock on the backend index mean that queries against that index will not execute until we have completed the transfer? Since we do not clear the frontend index before acquiring the backend lock, and we unlock the frontend after the index has been cleared, my understanding was that this should be safe. Are queries allowed to be made against either the frontend or backend index without acquiring the corresponding lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right, I missed the backend lock,
Additional question regarding this:
when calling
void clear() {
idToLabelMapping.clear();
vectorBlocks.clear();
count = idType{};
}
you are not clearing the label set of the frontend index. Can this cause issues when calling multiple times to transferToBackend
with an increasing set of IDs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small questions/comments after doing a high-level review - before we move on to POC
this->idToLabelMapping.shrink_to_fit(); | ||
this->vectorBlocks.clear(); | ||
this->vectorBlocks.shrink_to_fit(); | ||
this->count = idType{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this->count = 0
?
this->idToLabelMapping.shrink_to_fit(); | ||
this->vectorBlocks.clear(); | ||
this->vectorBlocks.shrink_to_fit(); | ||
this->count = idType{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this->count = 0
?
//.multi = raftIvfParams->multi, | ||
//.logCtx = params->logCtx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why commented out?
|
||
double getDistanceFrom_Unsafe(labelType label, const void *blob) const override { | ||
auto flat_dist = this->frontendIndex->getDistanceFrom_Unsafe(label, blob); | ||
auto raft_dist = this->backendIndex->getDistanceFrom_Unsafe(label, blob); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not implemented in the backend index...
|
||
// delete in place. TODO: Add async job for this | ||
this->mainIndexGuard.lock(); | ||
num_deleted_vectors += this->backendIndex->deleteVector(label); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we reclaim memory after deletion? As far as I understand in the backend index, we only mark vectors as deleted
Co-authored-by: GuyAv46 <[email protected]>
This PR is a replacement for #377, which fell out of date with changes on main. This PR provides access to GPU-accelerated index types provided by the RAPIDS RAFT library. It introduces two new index types: IVF and Tiered IVF, which provide access to both IVF Flat and IVFPQ indexes depending on how they are configured.
The intention of this initial integration is to provide GPU-based performance improvements for building indexes and batched searches. A later PR will introduce another index type which will offer perf improvement (relative to CPU HNSW) for single-query searches as well.
Main objects this PR modified
This PR does not substantively modify existing objects. A
clear
method is added to the brute force index to allow it to be reset with a single call.This PR is still a work in progress. Remaining tasks:
Tasks to be tackled in later PRs:
Questions for this PR:
Mark if applicable