-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Support efficient filtering rules for switching to exact KNN with nmslib #1340
Comments
@tstadel Thanks for the feature suggestion. If you use script score with filter, you might be able to achieve what you want? |
Well script score would certainly work for cases where I know I need exact KNN. But that's the point, it depends on the query and filters. And often I don't know how selective my filters are.I believe the logic making this decision implemented for FAISS already brings a lot of value without having complete efficient filtering: I just wouldn't need to make the decision about which approach to use, instead I get the best fitting approach automatically. |
Your approach will lead to inconsistent user experience in nmslib with filter. For example, if the query runs on exact search, it will return results. However, if the number of documents get increased and if the same query switch to ANN with post filtering, it might return empty result all of sudden. @tstadel By the way, is there any reason why you want to use nmslib instead of lucene or faiss? |
@heemin32 thanks for your response. Wouldn't the last rule The reason why I'm asking for nmslib is that it's not that easy to simply switch the lib in production settings. This would at least mean reindexing and some additional benchmark / scaling tests. I'm just guessing that most users are on nmslib as it was the first and is still the default lib used for ANN in OpenSearch. Additionally I can recall testing FAISS against nmslib in one of these production settings and nmslib was way faster (more than 50% latency increase, but it was on 1.3 however). So adding this functionality would have a great impact while keeping the risk rather low: users would need to opt for efficient filtering syntax explicitly and additionally can control the behavior to some degrees using the index settings such as |
The inconsistency is more about between pre-filtering and post-filtering. Even with the adjustment in threshold, there are high chance of having a result of less than k because of the nature of post-filtering. Understand your need of the requirements but still I am not convinced to combining pre-filtering and post-filtering which is different behavior with other engines. |
I think you should redo the benchmarks. We have improved faiss pure HNSW search a lot specially from 2.9. If you look at the latest benchmarks here: #1131 you can see the diff between Faiss and NMSLib has been reduced significantly. The core problem with supporting filtering with Nmslib is that, the core library doesn't support filters. There is another lib https://github.com/nmslib/hnswlib, which is more of successor and lightweight than nmslib and support filtering too. But problem is nmslib and hnswlib are not compatible with one another. |
@navneet1v thanks for pointing to the benchmarks. Looks pretty nice and I agree, it feels like it's time to rerun our benchmarks indeed. So full efficient filtering support with nmslib seems unrealistic as the core library most likely won't ever support it. Is there a roadmap regarding nmslib/hnswlib support in OpenSearch? Would you recommend switching to FAISS or Lucene in general? @heemin32 thanks for your comment and I agree with the high chance of having less than k results after postfiltering if we used the unchanged k during the ANN search. A simple mutliplier factor could mitigate this: e.g. if there are filters, search with 5*k instead of k. Just speaking from my perspective, event though this adds a bit of additional complexity it seems more favorable than never supporting a bit of efficient filtering with nmslib. WDYT? |
@navneet1v Are there any plans for supporting hnswlib in OpenSearch? |
There is no plan to support hnswlib in OpenSearch as of now. I would recommend to switch to either FAISS or Lucene. |
Closing for now as we dont have plan to add filter to nmslib |
Is your feature request related to a problem?
Efficient filtering rules of the FAISS implementation for switching to exact KNN search are library-agnostic and improve search results independent of the native FAISS filtering capabilities (especially for highly selective filters). For nmslib these rules would be beneficial as well and make the overall behavior of KNN plugin's search-filter-logic more consistent.
What solution would you like?
Use the same logic as depicted on https://opensearch.org/docs/latest/images/faiss-algorithm.jpg for nmslib when using efficient filtering. The issue that nmslib does not support applying filters natively can be solved by applying simple postfiltering in
JNIService.queryIndex
:k-NN/src/main/java/org/opensearch/knn/jni/JNIService.java
Lines 105 to 107 in 2e3ab95
What alternatives have you considered?
Do you have any additional context?
Add any other context or screenshots about the feature request here.
The text was updated successfully, but these errors were encountered: