Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Support efficient filtering rules for switching to exact KNN with nmslib #1340

Closed
tstadel opened this issue Dec 11, 2023 · 10 comments

Comments

@tstadel
Copy link

tstadel commented Dec 11, 2023

Is your feature request related to a problem?
Efficient filtering rules of the FAISS implementation for switching to exact KNN search are library-agnostic and improve search results independent of the native FAISS filtering capabilities (especially for highly selective filters). For nmslib these rules would be beneficial as well and make the overall behavior of KNN plugin's search-filter-logic more consistent.

What solution would you like?
Use the same logic as depicted on https://opensearch.org/docs/latest/images/faiss-algorithm.jpg for nmslib when using efficient filtering. The issue that nmslib does not support applying filters natively can be solved by applying simple postfiltering in JNIService.queryIndex:

if (KNNEngine.NMSLIB.getName().equals(engineName)) {
return NmslibService.queryIndex(indexPointer, queryVector, k);
}

What alternatives have you considered?

  • Postfiltering could be implemented in nmslib's JNI-layer, but I don't see too many benefits of this as we would need another method/interface here
  • Applying the rules to any ANN queries with filters (not just with efficient filtering syntax): I guess that's harder to implement as the usual postfiltering happens in a later stage of the search (where we do not have access to KNNWeight?). Probably we don't want that for lucene ANN queries, so at least we would need some param to switch it off, which makes it way more complex.

Do you have any additional context?
Add any other context or screenshots about the feature request here.

@heemin32
Copy link
Collaborator

@tstadel Thanks for the feature suggestion.
I am against partially applying efficient filtering in nmslib only for exact search case and using post filtering for ANN search. I would rather want to have complete efficient filtering in nmslib before applying the same rules for exact search.

If you use script score with filter, you might be able to achieve what you want?

@tstadel
Copy link
Author

tstadel commented Dec 11, 2023

@tstadel Thanks for the feature suggestion.
I am against partially applying efficient filtering in nmslib only for exact search case and using post filtering for ANN search. I would rather want to have complete efficient filtering in nmslib before applying the same rules for exact search.

If you use script score with filter, you might be able to achieve what you want?

Well script score would certainly work for cases where I know I need exact KNN. But that's the point, it depends on the query and filters. And often I don't know how selective my filters are.I believe the logic making this decision implemented for FAISS already brings a lot of value without having complete efficient filtering: I just wouldn't need to make the decision about which approach to use, instead I get the best fitting approach automatically.

@heemin32
Copy link
Collaborator

heemin32 commented Dec 11, 2023

Your approach will lead to inconsistent user experience in nmslib with filter. For example, if the query runs on exact search, it will return results. However, if the number of documents get increased and if the same query switch to ANN with post filtering, it might return empty result all of sudden.

@tstadel By the way, is there any reason why you want to use nmslib instead of lucene or faiss?

@tstadel
Copy link
Author

tstadel commented Dec 12, 2023

@heemin32 thanks for your response. Wouldn't the last rule R < K and P >= K avoid such dramatic inconsistencies?

The reason why I'm asking for nmslib is that it's not that easy to simply switch the lib in production settings. This would at least mean reindexing and some additional benchmark / scaling tests. I'm just guessing that most users are on nmslib as it was the first and is still the default lib used for ANN in OpenSearch. Additionally I can recall testing FAISS against nmslib in one of these production settings and nmslib was way faster (more than 50% latency increase, but it was on 1.3 however). So adding this functionality would have a great impact while keeping the risk rather low: users would need to opt for efficient filtering syntax explicitly and additionally can control the behavior to some degrees using the index settings such as knn.advanced.filtered_exact_search_threshold.

@heemin32
Copy link
Collaborator

The inconsistency is more about between pre-filtering and post-filtering. Even with the adjustment in threshold, there are high chance of having a result of less than k because of the nature of post-filtering.

Understand your need of the requirements but still I am not convinced to combining pre-filtering and post-filtering which is different behavior with other engines.

@navneet1v
Copy link
Collaborator

@tstadel

Additionally I can recall testing FAISS against nmslib in one of these production settings and nmslib was way faster (more than 50% latency increase, but it was on 1.3 however)

I think you should redo the benchmarks. We have improved faiss pure HNSW search a lot specially from 2.9. If you look at the latest benchmarks here: #1131 you can see the diff between Faiss and NMSLib has been reduced significantly.

The core problem with supporting filtering with Nmslib is that, the core library doesn't support filters. There is another lib https://github.com/nmslib/hnswlib, which is more of successor and lightweight than nmslib and support filtering too. But problem is nmslib and hnswlib are not compatible with one another.

@tstadel
Copy link
Author

tstadel commented Dec 14, 2023

@navneet1v thanks for pointing to the benchmarks. Looks pretty nice and I agree, it feels like it's time to rerun our benchmarks indeed. So full efficient filtering support with nmslib seems unrealistic as the core library most likely won't ever support it. Is there a roadmap regarding nmslib/hnswlib support in OpenSearch? Would you recommend switching to FAISS or Lucene in general?

@heemin32 thanks for your comment and I agree with the high chance of having less than k results after postfiltering if we used the unchanged k during the ANN search. A simple mutliplier factor could mitigate this: e.g. if there are filters, search with 5*k instead of k. Just speaking from my perspective, event though this adds a bit of additional complexity it seems more favorable than never supporting a bit of efficient filtering with nmslib. WDYT?

@SeyedAlirezaFatemi
Copy link

@navneet1v Are there any plans for supporting hnswlib in OpenSearch?

@heemin32
Copy link
Collaborator

heemin32 commented Jan 2, 2024

There is no plan to support hnswlib in OpenSearch as of now. I would recommend to switch to either FAISS or Lucene.

@jmazanec15
Copy link
Member

Closing for now as we dont have plan to add filter to nmslib

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants