Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow build graph greedily for quantization scenarios #2175

Conversation

VijayanB
Copy link
Member

@VijayanB VijayanB commented Oct 1, 2024

Description

Previosuly we only added support to build greedily for non quantization scenario. In this commit, we can remove that constraint, however, we cannot skip writing quantization state since it is required irrespective of type of search is executed later.

Related Issues

Part of #1942

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@VijayanB
Copy link
Member Author

VijayanB commented Oct 2, 2024

skip changelog. Will add it while merging to main.

Comment on lines +103 to +104
// Check only after quantization state writer finish writing its state, since it is required
// even if there are no graph files in segment, which will be later used by exact search
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think its worth updating component test here? It seems tricky enough to be captured in test. If integration tests are capturing it I am good with it as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added integration tests

@VijayanB VijayanB force-pushed the remove-quantization-constraint branch from 87d95b3 to 2677c6f Compare October 2, 2024 20:43
@VijayanB VijayanB requested a review from shatejas October 2, 2024 20:45
Previosuly we only added support to build greedily for
non quantization scenario. In this commit, we can remove
that constraint, however, we cannot skip writing quanitization
state since it is required irrespective of type of search
is executed later.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
@VijayanB VijayanB force-pushed the remove-quantization-constraint branch from 2677c6f to af96fed Compare October 2, 2024 20:50
@VijayanB VijayanB merged commit 7ca1c24 into opensearch-project:feature/build-vector-ds-greedily Oct 2, 2024
28 of 29 checks passed
@VijayanB VijayanB deleted the remove-quantization-constraint branch October 2, 2024 23:34
Copy link
Member

@vibrantvarun vibrantvarun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

VijayanB added a commit to VijayanB/k-NN-2 that referenced this pull request Oct 4, 2024
…ject#2175)

Previosuly we only added support to build greedily for
non quantization scenario. In this commit, we can remove
that constraint, however, we cannot skip writing quanitization
state since it is required irrespective of type of search
is executed later.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
VijayanB added a commit to VijayanB/k-NN-2 that referenced this pull request Oct 8, 2024
…ject#2175)

Previosuly we only added support to build greedily for
non quantization scenario. In this commit, we can remove
that constraint, however, we cannot skip writing quanitization
state since it is required irrespective of type of search
is executed later.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
VijayanB added a commit to VijayanB/k-NN-2 that referenced this pull request Oct 9, 2024
…ject#2175)

Previosuly we only added support to build greedily for
non quantization scenario. In this commit, we can remove
that constraint, however, we cannot skip writing quanitization
state since it is required irrespective of type of search
is executed later.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
VijayanB added a commit that referenced this pull request Oct 10, 2024
…t search when there are no engine files (#2188)

* Introduce new setting to configure when to build graph during segment creation (#2007)

Added new updatable index setting "build_vector_data_structure_threshold", which will be
considered when to build braph or not for native engines.
This is noop for lucene. This depends on use lucene format as prerequisite.
We don't need to add flag since it is only enable if lucene format is
already enabled.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add integration test for binary vector values (#2142)

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Allow build graph greedily for quantization scenarios (#2175)

Previosuly we only added support to build greedily for
non quantization scenario. In this commit, we can remove
that constraint, however, we cannot skip writing quanitization
state since it is required irrespective of type of search
is executed later.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add exact search if no native engine files are available (#2136)

* Add exact search if no engine files are in segments

When graph is not available, plugin will return empty results. With this change,
exact search will be performed when only no engine file is available in segment.
We also don't need version check or feature flag because, option to not build vector
data structure will only be available post 2.17.
If an index is created using pre 2.17 version, segment will always have engine files
and this feature will never be called during search.

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add support for radial search in exact search (#2174)

* Add support for radial search in exact search

When threshold value is set, knn plugin will not be creating graph.
Hence, when search request is trigged during that time, exact search
will return valid results. However, radial search was never included
as part of exact search. This will break radial search when threshold
is added and radial search is requested. In this commit, new method
is introduced to accept min score and return documents that are greater
than min score, similar to how radial search is performed by native
engines. This search is independent of engine, but, radial search is
supported only for FAISS engine out of all native engines.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 10, 2024
…t search when there are no engine files (#2188)

* Introduce new setting to configure when to build graph during segment creation (#2007)

Added new updatable index setting "build_vector_data_structure_threshold", which will be
considered when to build braph or not for native engines.
This is noop for lucene. This depends on use lucene format as prerequisite.
We don't need to add flag since it is only enable if lucene format is
already enabled.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add integration test for binary vector values (#2142)

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Allow build graph greedily for quantization scenarios (#2175)

Previosuly we only added support to build greedily for
non quantization scenario. In this commit, we can remove
that constraint, however, we cannot skip writing quanitization
state since it is required irrespective of type of search
is executed later.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add exact search if no native engine files are available (#2136)

* Add exact search if no engine files are in segments

When graph is not available, plugin will return empty results. With this change,
exact search will be performed when only no engine file is available in segment.
We also don't need version check or feature flag because, option to not build vector
data structure will only be available post 2.17.
If an index is created using pre 2.17 version, segment will always have engine files
and this feature will never be called during search.

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add support for radial search in exact search (#2174)

* Add support for radial search in exact search

When threshold value is set, knn plugin will not be creating graph.
Hence, when search request is trigged during that time, exact search
will return valid results. However, radial search was never included
as part of exact search. This will break radial search when threshold
is added and radial search is requested. In this commit, new method
is introduced to accept min score and return documents that are greater
than min score, similar to how radial search is performed by native
engines. This search is independent of engine, but, radial search is
supported only for FAISS engine out of all native engines.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 5a56829)
VijayanB added a commit that referenced this pull request Oct 10, 2024
…nd perform exact search when there are no engine files (#2201)

* Add support to build vector data structures greedily and perform exact search when there are no engine files (#2188)

* Introduce new setting to configure when to build graph during segment creation (#2007)

Added new updatable index setting "build_vector_data_structure_threshold", which will be
considered when to build braph or not for native engines.
This is noop for lucene. This depends on use lucene format as prerequisite.
We don't need to add flag since it is only enable if lucene format is
already enabled.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add integration test for binary vector values (#2142)

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Allow build graph greedily for quantization scenarios (#2175)

Previosuly we only added support to build greedily for
non quantization scenario. In this commit, we can remove
that constraint, however, we cannot skip writing quanitization
state since it is required irrespective of type of search
is executed later.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add exact search if no native engine files are available (#2136)

* Add exact search if no engine files are in segments

When graph is not available, plugin will return empty results. With this change,
exact search will be performed when only no engine file is available in segment.
We also don't need version check or feature flag because, option to not build vector
data structure will only be available post 2.17.
If an index is created using pre 2.17 version, segment will always have engine files
and this feature will never be called during search.

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* Add support for radial search in exact search (#2174)

* Add support for radial search in exact search

When threshold value is set, knn plugin will not be creating graph.
Hence, when search request is trigged during that time, exact search
will return valid results. However, radial search was never included
as part of exact search. This will break radial search when threshold
is added and radial search is requested. In this commit, new method
is introduced to accept min score and return documents that are greater
than min score, similar to how radial search is performed by native
engines. This search is independent of engine, but, radial search is
supported only for FAISS engine out of all native engines.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 5a56829)

* Fix compilation issue due to package error

Signed-off-by: Vijayan Balasubramanian <[email protected]>

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
Co-authored-by: Vijayan Balasubramanian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants