[FEATURE] Use IndexInput to load the graph files for Native Index #1951
Labels
enhancement
Features
indexing-improvements
Description
Currently, the k-NN plugin for native engines (Faiss and Nmslib) creates a separate graph file (in the codec) to build and store the k-NN index at the segment level. Lucene tracks this file as part of the segment, but when reading it the k-NN plugin relies on FSDirectory to get the full path of the segment-level k-NN index and then uses the native libraries' APIs to load the index into memory.
This behavior causes a few problems:
Solution
The solution I am proposing is that, rather than relying on the file path, the k-NN plugin should use IndexInput to read the file. This new read behavior also needs to be integrated with the Faiss/Nmslib libraries. Faiss provides an interface, IOReader, which can be used to load the contents of a file. If the k-NN plugin implements this interface and uses IndexInput underneath to read the file, that will avoid the problems mentioned above.
A deep-dive I did suggests that IndexInput provides a way to read bytes, and Faiss just asks for n bytes whenever it wants to read anything.
Ref: https://github.com/facebookresearch/faiss/blob/df0dea6c6d8951056763dc03528b3973c6ba26e2/faiss/impl/index_read.cpp#L531
Ref: https://github.com/facebookresearch/faiss/blob/c0052c15336a57f7068a7d098d5ce5b6234a2d70/faiss/impl/io_macros.h#L17-L28
Ref Lucene: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L58
I have not done any deep-dive on Nmslib.
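To make the read-path idea concrete, here is a minimal sketch of a reader that serves fread-style requests from a byte source. It is not the plugin's implementation. `ReaderLike` is a local stand-in that mirrors the call signature of Faiss's IOReader so the snippet compiles without Faiss; `ReadBytesFn` is a hypothetical callback standing in for a JNI call into IndexInput.readBytes; `makeMemorySource` exists only to exercise the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Local stand-in for faiss::IOReader: an fread-style call that copies up to
// `nitems` items of `size` bytes into `ptr` and returns the items read.
struct ReaderLike {
    virtual size_t operator()(void* ptr, size_t size, size_t nitems) = 0;
    virtual ~ReaderLike() = default;
};

// Hypothetical callback (assumption): in the plugin this would cross JNI and
// call IndexInput.readBytes. It fills `dst` with up to `len` bytes and
// returns how many bytes it copied (0 at end of input).
using ReadBytesFn = std::function<size_t(uint8_t* dst, size_t len)>;

struct IndexInputReader : ReaderLike {
    ReadBytesFn readBytes;
    size_t chunkSize;  // upper bound on bytes requested per callback

    IndexInputReader(ReadBytesFn fn, size_t chunk)
        : readBytes(std::move(fn)), chunkSize(chunk) {
        assert(chunk > 0);
    }

    size_t operator()(void* ptr, size_t size, size_t nitems) override {
        if (size == 0 || nitems == 0) return 0;
        const size_t wanted = size * nitems;
        auto* out = static_cast<uint8_t*>(ptr);
        size_t done = 0;
        while (done < wanted) {
            size_t got = readBytes(out + done, std::min(chunkSize, wanted - done));
            if (got == 0) break;  // source exhausted
            done += got;
        }
        return done / size;  // number of complete items, as fread reports
    }
};

// In-memory byte source, used only to try the reader without Lucene or JNI.
ReadBytesFn makeMemorySource(std::vector<uint8_t> data) {
    auto buf = std::make_shared<std::vector<uint8_t>>(std::move(data));
    auto pos = std::make_shared<size_t>(0);
    return [buf, pos](uint8_t* dst, size_t len) {
        size_t n = std::min(len, buf->size() - *pos);
        if (n > 0) std::memcpy(dst, buf->data() + *pos, n);
        *pos += n;
        return n;
    };
}
```

Reading in bounded chunks keeps the intermediate buffer on the Java side small regardless of how many bytes Faiss requests in one call. Note that the real IndexInput.readBytes throws on reading past the end rather than returning a short count, so an actual bridge would need to clamp requests using length() and getFilePointer().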
Indexing
I also see that during indexing we use FSDirectory while writing the native index file. If we make similar changes for writing the native index file, we can remove the FSDirectory dependency from the write path too. Ref: k-NN/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java, line 121 in ca5e483
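The write path could plausibly mirror the read path. The sketch below is an assumption on my part, not something confirmed in this issue: `WriterLike` is a local stand-in shaped like Faiss's IOWriter (an fwrite-style call), and `WriteBytesFn` is a hypothetical callback standing in for a JNI call into Lucene's IndexOutput.writeBytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Local stand-in for faiss::IOWriter: an fwrite-style call that consumes
// `nitems` items of `size` bytes from `ptr` and returns the items written.
struct WriterLike {
    virtual size_t operator()(const void* ptr, size_t size, size_t nitems) = 0;
    virtual ~WriterLike() = default;
};

// Hypothetical callback (assumption): in the plugin this would cross JNI and
// call IndexOutput.writeBytes with the given bytes.
using WriteBytesFn = std::function<void(const uint8_t* src, size_t len)>;

struct IndexOutputWriter : WriterLike {
    WriteBytesFn writeBytes;
    size_t chunkSize;  // upper bound on bytes handed over per callback

    IndexOutputWriter(WriteBytesFn fn, size_t chunk)
        : writeBytes(std::move(fn)), chunkSize(chunk) {
        assert(chunk > 0);
    }

    size_t operator()(const void* ptr, size_t size, size_t nitems) override {
        const auto* in = static_cast<const uint8_t*>(ptr);
        const size_t total = size * nitems;
        // Forward the payload in bounded chunks so the Java-side buffer
        // never has to hold the whole index at once.
        for (size_t off = 0; off < total; off += chunkSize) {
            writeBytes(in + off, std::min(chunkSize, total - off));
        }
        return nitems;
    }
};
```

With both directions going through Lucene's input and output abstractions, the native code would no longer need a file path at all, which is the dependency on FSDirectory this issue wants to remove.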