CNDB-12456: Add ANN_OPTIONS to CQL SELECT queries #1525

Open
wants to merge 10 commits into
base: main

adelapena

@adelapena adelapena commented Jan 22, 2025

Add ANN_OPTIONS to CQL SELECT queries, as described in this document: https://docs.google.com/document/d/1Q2SkE1aBkcy25DnCWkdUqpCf8rZA6R_AOU6bbORbarI

The proposed syntax is:

SELECT * FROM t 
   ORDER BY v ANN OF [1.0] 
   LIMIT 10 
   WITH ann_options = {'rerank_k': 10};

The new options are not specific to SAI, same as the ORDER BY col ANN OF val part.

Internally, the ANN options are an attribute of the RowFilter included in every ReadCommand. Every index implementation receives them as part of the ReadCommand passed in methods such as Index.validate(ReadCommand), Index.Group.queryPlanFor(RowFilter) or Index.QueryPlan.searcherFor(ReadCommand).

Currently SAI rejects queries with ANN options. I think consuming those options is something we can do in a separate ticket, keeping this one focused on the CQL and internode messaging part.
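The flow described above (options travel with the read command, and an index that does not consume them yet rejects the query) can be pictured with a minimal, self-contained sketch. None of these classes are the real Cassandra types: `AnnOptions`, `RowFilter`, `ReadCommand` and `RejectingIndex` below are simplified stand-ins invented for illustration, and the exception type and message are assumptions rather than what the patch actually throws.

```java
public class AnnOptionsValidationSketch {

    /** Hypothetical stand-in for the options parsed from WITH ann_options = {...}. */
    static final class AnnOptions {
        static final AnnOptions NONE = new AnnOptions(null);
        final Integer rerankK; // null when the query did not set rerank_k

        AnnOptions(Integer rerankK) { this.rerankK = rerankK; }

        boolean isEmpty() { return rerankK == null; }
    }

    /** Simplified filter that only carries the ANN options. */
    static final class RowFilter {
        final AnnOptions annOptions;
        RowFilter(AnnOptions annOptions) { this.annOptions = annOptions; }
    }

    /** Simplified read command wrapping the filter. */
    static final class ReadCommand {
        final RowFilter rowFilter;
        ReadCommand(RowFilter rowFilter) { this.rowFilter = rowFilter; }
    }

    /** A toy index that does not consume ANN options yet, so it rejects them. */
    static final class RejectingIndex {
        void validate(ReadCommand command) {
            if (!command.rowFilter.annOptions.isEmpty())
                throw new IllegalArgumentException("ANN options are not supported by this index");
        }
    }

    /** Returns true if the toy index accepts a query with the given rerank_k (null means no options). */
    public static boolean accepts(Integer rerankK) {
        AnnOptions options = rerankK == null ? AnnOptions.NONE : new AnnOptions(rerankK);
        ReadCommand command = new ReadCommand(new RowFilter(options));
        try {
            new RejectingIndex().validate(command);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(null)); // query without ann_options
        System.out.println(accepts(10));   // query with {'rerank_k': 10}
    }
}
```

Keeping the rejection in the index, rather than in the CQL layer, matches the point that the options are not tied to a single index implementation.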

This patch increases the internode messaging version because the ANN options have to be included in the serialization of the ReadCommand they belong to. Hopefully we won't need to bump the messaging version again if we add new ANN options.
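To illustrate why a new field in a serialized command usually needs a messaging version bump, here is a rough sketch of version-gated serialization. The version constants, the payload placeholder and the method names are all hypothetical and are not taken from the patch; the only detail borrowed from the description is that the options are written as a 32-bit int.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionGatedSerializationSketch {

    // Hypothetical messaging versions; the real constants and values differ.
    public static final int VERSION_OLD = 1;
    public static final int VERSION_WITH_ANN_OPTIONS = 2;

    /** Writes a placeholder command body, plus rerank_k only for peers that understand it. */
    public static byte[] serialize(int rerankK, int version) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF("read-command-body"); // stand-in for the rest of the command
            if (version >= VERSION_WITH_ANN_OPTIONS)
                out.writeInt(rerankK);         // the new field, 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads rerank_k back, falling back to a default when the sender's version lacks the field. */
    public static int deserializeRerankK(byte[] data, int version, int defaultRerankK) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF(); // skip the placeholder body
            return version >= VERSION_WITH_ANN_OPTIONS ? in.readInt() : defaultRerankK;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] oldBytes = serialize(10, VERSION_OLD);
        byte[] newBytes = serialize(10, VERSION_WITH_ANN_OPTIONS);
        System.out.println("extra bytes: " + (newBytes.length - oldBytes.length));
        System.out.println("round trip: " + deserializeRerankK(newBytes, VERSION_WITH_ANN_OPTIONS, -1));
    }
}
```

Without the version check, an older node would misread the extra bytes as part of whatever field follows, which is the situation the bump is meant to prevent.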

I have chosen to include the options in the RowFilter because it seemed the cleaner approach, and so possibly less problematic when rebasing on Apache. The downside is that it adds a 32-bit int to the serialization of every SELECT query. An alternative, possibly more convoluted approach that would save us those 4 extra bytes is to place the ANN options inside the relevant ANN RowFilter.Expression. @jbellis @ekaterinadimitrova2 should I give this alternative approach a try?
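The trade-off between the two placements can be made concrete with a small byte-counting sketch. It assumes the options always serialize as a single 32-bit int, as stated above, and that in the alternative layout only ANN expressions carry them; the `Kind` enum and both method names are hypothetical and do not mirror the real RowFilter.Expression hierarchy.

```java
import java.util.List;

public class AnnOptionsPlacementSketch {

    /** Hypothetical expression kinds: a regular restriction or an ANN ordering. */
    public enum Kind { EQ, ANN }

    // Size of the serialized options, assuming a single 32-bit int.
    static final int OPTIONS_BYTES = Integer.BYTES;

    /** Options stored on the filter: every query pays the cost, whatever its expressions are. */
    public static int extraBytesAtFilterLevel(List<Kind> expressions) {
        return OPTIONS_BYTES;
    }

    /** Options stored on the ANN expression: only queries that contain one pay the cost. */
    public static int extraBytesAtExpressionLevel(List<Kind> expressions) {
        int total = 0;
        for (Kind kind : expressions)
            if (kind == Kind.ANN)
                total += OPTIONS_BYTES;
        return total;
    }

    public static void main(String[] args) {
        List<Kind> plainQuery = List.of(Kind.EQ);
        List<Kind> annQuery = List.of(Kind.EQ, Kind.ANN);
        System.out.println(extraBytesAtFilterLevel(plainQuery) + " vs " + extraBytesAtExpressionLevel(plainQuery));
        System.out.println(extraBytesAtFilterLevel(annQuery) + " vs " + extraBytesAtExpressionLevel(annQuery));
    }
}
```

Under these assumptions the per-expression layout costs nothing for queries without an ANN clause, which is where the saving comes from.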

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

@adelapena adelapena self-assigned this Jan 22, 2025
@adelapena adelapena marked this pull request as draft January 22, 2025 12:16
@adelapena adelapena marked this pull request as ready for review January 22, 2025 14:10
@adelapena adelapena marked this pull request as draft January 22, 2025 14:10
@adelapena adelapena marked this pull request as ready for review January 24, 2025 01:09
@adelapena adelapena marked this pull request as draft January 24, 2025 01:10
@adelapena adelapena marked this pull request as ready for review January 24, 2025 12:19
Only the ANN expression will have the serialized options
@adelapena
Author

The last commit tries the alternative approach mentioned above, placing the new ANNOptions in the ANN RowFilter.Expression rather than in RowFilter to save us some serialization. It turns out it's not as noisy as I expected. Also, it more or less follows the steps of CNDB-10731 so it won't make us diverge from Apache much more than we already have.

* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.cassandra.index;
Author

Maybe this class would be better placed in the org.apache.cassandra.db.filter package, same as RowFilter?

Agreed

@ekaterinadimitrova2

Currently SAI rejects queries with ANN options. I think consuming those options is something we can do in a separate ticket, keeping this one focused on the CQL and internode messaging part.

Agreed

This patch increases the internode messaging version because the ANN options have to be included in the serialization of the ReadCommand they belong to. Hopefully we won't need to bump the messaging version again if we add new ANN options.

The flag makes sense to me

The last commit tries the alternative approach mentioned above, placing the new ANNOptions in the ANN RowFilter.Expression rather than in RowFilter to save us some serialization. It turns out it's not as noisy as I expected. Also, it more or less follows the steps of CNDB-10731 so it won't make us diverge from Apache much more than we already have.

I like the last version, which places the new ANNOptions in the ANN RowFilter.Expression rather than in RowFilter. I only skimmed it at a high level, though, and did not get into the details. Let me know when you are ready for review.

@adelapena
Author

Thanks for the feedback. The patch is ready for review.

@cassci-bot

✔️ Build ds-cassandra-pr-gate/PR-1525 approved by Butler


3 participants