Add two new "Seeded" Knn queries for seeded vector search #14084

Merged
merged 47 commits into from
Jan 15, 2025
Changes from 45 commits
60e3fab
implement seeded knn queries
Aug 6, 2024
82e7053
cleanup
Aug 6, 2024
7955148
ensure seed docs have a vector
Aug 6, 2024
40d972d
apply filter to seed queries
Aug 6, 2024
f36a4cd
Update lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader…
seanmacavaney Sep 5, 2024
539b29a
Update lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader…
seanmacavaney Sep 5, 2024
3df6ad2
Update lucene/core/src/java/org/apache/lucene/search/KnnByteVectorQue…
seanmacavaney Sep 5, 2024
0508d87
Update lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSear…
seanmacavaney Sep 5, 2024
732f69c
Update lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSear…
seanmacavaney Sep 5, 2024
c02b4cc
Update lucene/core/src/java/org/apache/lucene/search/AbstractKnnVecto…
seanmacavaney Sep 5, 2024
285ebfe
Update lucene/core/src/java/org/apache/lucene/search/AbstractKnnVecto…
seanmacavaney Sep 5, 2024
b64d458
Merge branch 'main' into seeds
seanmacavaney Sep 5, 2024
9f1be67
mapping docIds to ordinals
Sep 10, 2024
244f46b
fixed test warning
Sep 10, 2024
b73e7a3
fix test warning
Sep 10, 2024
3134132
tidy
Sep 10, 2024
69db4d4
Update lucene/core/src/java/org/apache/lucene/search/AbstractKnnVecto…
seanmacavaney Sep 26, 2024
fe4bef3
Update lucene/core/src/java/org/apache/lucene/search/KnnByteVectorQue…
seanmacavaney Sep 26, 2024
8e044f8
Update lucene/core/src/java/org/apache/lucene/search/KnnFloatVectorQu…
seanmacavaney Sep 26, 2024
33231b3
Update lucene/core/src/java/org/apache/lucene/index/FloatVectorValues…
seanmacavaney Sep 26, 2024
2e86e4f
Update lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.…
seanmacavaney Sep 26, 2024
c0c18b2
Update lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSear…
seanmacavaney Sep 26, 2024
6190aca
Update lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSear…
seanmacavaney Sep 26, 2024
a49ba2f
address review comments
Sep 26, 2024
8698b59
Merge branch 'main' into seeds
seanmacavaney Sep 26, 2024
1f8a9f4
merge issues
Sep 26, 2024
58f34df
addresses review comments
Sep 27, 2024
fc2129f
refactor wip
Oct 2, 2024
e8417d3
consistent naming
Oct 2, 2024
440b0d0
javadoc
Oct 2, 2024
87e75ab
tidy
Oct 2, 2024
7b3350f
javadoc typo
Oct 2, 2024
5bb40c2
javadoc
Oct 2, 2024
216bfc4
test fixes
Oct 2, 2024
bccf15d
Merge branch 'main' into seeds
seanmacavaney Oct 2, 2024
b6725c7
merge resolution
Oct 2, 2024
0cfd99b
merging
Oct 2, 2024
dd63bb0
refactor as decorator
Oct 2, 2024
1635870
javadoc
Oct 2, 2024
c8a512a
apply decorator elsewhere
Oct 2, 2024
3cfe678
Merge branch 'main' into seeds
benwtrent Dec 19, 2024
cb6ab54
Refactor
benwtrent Dec 20, 2024
6d0cb4f
removing unnecessary changes
benwtrent Dec 20, 2024
4635ab8
Merge remote-tracking branch 'upstream/main' into seeds
benwtrent Jan 6, 2025
04289ac
adding changes & address PR comments & fixing tests
benwtrent Jan 6, 2025
153ce6c
adding checks for vector value types
benwtrent Jan 8, 2025
2386034
Merge remote-tracking branch 'upstream/main' into seeds
benwtrent Jan 15, 2025
6 changes: 5 additions & 1 deletion lucene/CHANGES.txt
Expand Up @@ -42,7 +42,11 @@ API Changes

New Features
---------------------
- (No changes)
+
+ * GITHUB#14084, GITHUB#13635, GITHUB#13634: Adds new `SeededKnnByteVectorQuery` and `SeededKnnFloatVectorQuery`
+ queries. These queries allow for the vector search entry points to be initialized via a `seed` query. This follows
+ the research provided via https://arxiv.org/abs/2307.16779. (Sean MacAvaney, Ben Trent).
+

Improvements
---------------------
Expand Up @@ -46,7 +46,7 @@ public class KnnByteVectorQuery extends AbstractKnnVectorQuery {

private static final TopDocs NO_RESULTS = TopDocsCollector.EMPTY_TOPDOCS;

- private final byte[] target;
+ protected final byte[] target;

/**
* Find the <code>k</code> nearest documents to the target vector according to the vectors in the
54 changes: 54 additions & 0 deletions lucene/core/src/java/org/apache/lucene/search/KnnCollector.java
Expand Up @@ -85,4 +85,58 @@ public interface KnnCollector {
* @return The collected top documents
*/
TopDocs topDocs();

/**
* KnnCollector.Decorator is the base class for decorators of KnnCollector objects, which extend
* the object with new behaviors.
*
* @lucene.experimental
*/
abstract class Decorator implements KnnCollector {
private final KnnCollector collector;

public Decorator(KnnCollector collector) {
this.collector = collector;
}

@Override
public boolean earlyTerminated() {
return collector.earlyTerminated();
}

@Override
public void incVisitedCount(int count) {
collector.incVisitedCount(count);
}

@Override
public long visitedCount() {
return collector.visitedCount();
}

@Override
public long visitLimit() {
return collector.visitLimit();
}

@Override
public int k() {
return collector.k();
}

@Override
public boolean collect(int docId, float similarity) {
return collector.collect(docId, similarity);
}

@Override
public float minCompetitiveSimilarity() {
return collector.minCompetitiveSimilarity();
}

@Override
public TopDocs topDocs() {
return collector.topDocs();
}
}
}
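The `Decorator` base class above forwards every call to the wrapped collector, so a subclass only overrides the methods whose behavior it changes. The following is a minimal stand-alone sketch of that pattern. It deliberately does not use Lucene: `MiniCollector`, `ListCollector`, `MiniDecorator`, and `TimeoutCollector` are hypothetical, simplified stand-ins invented for illustration, not the real `KnnCollector` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

class DecoratorDemo {
  public static void main(String[] args) {
    boolean[] timedOut = {false};
    MiniCollector c = new TimeoutCollector(new ListCollector(3), () -> timedOut[0]);
    c.collect(7, 0.9f);
    System.out.println(c.earlyTerminated()); // false: no timeout, limit not reached
    timedOut[0] = true;
    System.out.println(c.earlyTerminated()); // true: the decorator reports the timeout
    System.out.println(c.collectedDocs()); // [7]
  }
}

/** Hypothetical, simplified stand-in for a kNN collector (not Lucene's interface). */
interface MiniCollector {
  boolean earlyTerminated();

  boolean collect(int docId, float similarity);

  List<Integer> collectedDocs();
}

/** Plain collector that keeps every doc offered to it, up to a fixed limit. */
final class ListCollector implements MiniCollector {
  private final int limit;
  private final List<Integer> docs = new ArrayList<>();

  ListCollector(int limit) {
    this.limit = limit;
  }

  @Override
  public boolean earlyTerminated() {
    return docs.size() >= limit;
  }

  @Override
  public boolean collect(int docId, float similarity) {
    if (earlyTerminated()) {
      return false;
    }
    docs.add(docId);
    return true;
  }

  @Override
  public List<Integer> collectedDocs() {
    return docs;
  }
}

/** Forwards every call to the delegate; plays the role of the Decorator base class. */
abstract class MiniDecorator implements MiniCollector {
  private final MiniCollector delegate;

  MiniDecorator(MiniCollector delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean earlyTerminated() {
    return delegate.earlyTerminated();
  }

  @Override
  public boolean collect(int docId, float similarity) {
    return delegate.collect(docId, similarity);
  }

  @Override
  public List<Integer> collectedDocs() {
    return delegate.collectedDocs();
  }
}

/** Overrides only earlyTerminated(), in the spirit of a time-limiting collector. */
final class TimeoutCollector extends MiniDecorator {
  private final BooleanSupplier shouldExit;

  TimeoutCollector(MiniCollector delegate, BooleanSupplier shouldExit) {
    super(delegate);
    this.shouldExit = shouldExit;
  }

  @Override
  public boolean earlyTerminated() {
    return shouldExit.getAsBoolean() || super.earlyTerminated();
  }
}
```

Because the delegate field is private to the base class, subclasses reach the wrapped behavior through `super`, which is the same shape the refactored `TimeLimitingKnnCollector` takes in this PR.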
Expand Up @@ -47,7 +47,7 @@ public class KnnFloatVectorQuery extends AbstractKnnVectorQuery {

private static final TopDocs NO_RESULTS = TopDocsCollector.EMPTY_TOPDOCS;

- private final float[] target;
+ protected final float[] target;

/**
* Find the <code>k</code> nearest documents to the target vector according to the vectors in the
@@ -0,0 +1,90 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.lucene.search;

import java.io.IOException;
import java.util.Objects;
import org.apache.lucene.search.knn.KnnCollectorManager;
import org.apache.lucene.search.knn.SeededKnnCollectorManager;

/**
* This is a version of knn byte vector query that provides a query seed to initiate the vector
* search. NOTE: The underlying format is free to ignore the provided seed.
*
* <p>See <a href="https://dl.acm.org/doi/10.1145/3539618.3591715">"Lexically-Accelerated Dense
* Retrieval"</a> (Kulkarni, Hrishikesh and MacAvaney, Sean and Goharian, Nazli and Frieder, Ophir).
* In SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and
* Development in Information Retrieval Pages 152 - 162
*
* @lucene.experimental
*/
public class SeededKnnByteVectorQuery extends KnnByteVectorQuery {
final Query seed;
final Weight seedWeight;

/**
* Construct a new SeededKnnByteVectorQuery instance
*
* @param field knn byte vector field to query
* @param target the query vector
* @param k number of neighbors to return
* @param filter a filter on the neighbors to return
* @param seed a query seed to initiate the vector format search
*/
public SeededKnnByteVectorQuery(String field, byte[] target, int k, Query filter, Query seed) {
super(field, target, k, filter);
this.seed = Objects.requireNonNull(seed);
this.seedWeight = null;
}

SeededKnnByteVectorQuery(String field, byte[] target, int k, Query filter, Weight seedWeight) {
super(field, target, k, filter);
this.seed = null;
this.seedWeight = Objects.requireNonNull(seedWeight);
}

@Override
public Query rewrite(IndexSearcher indexSearcher) throws IOException {
if (seedWeight != null) {
return super.rewrite(indexSearcher);
}
BooleanQuery.Builder booleanSeedQueryBuilder =
new BooleanQuery.Builder()
.add(seed, BooleanClause.Occur.MUST)
.add(new FieldExistsQuery(field), BooleanClause.Occur.FILTER);
if (filter != null) {
booleanSeedQueryBuilder.add(filter, BooleanClause.Occur.FILTER);
}
Query seedRewritten = indexSearcher.rewrite(booleanSeedQueryBuilder.build());
Weight seedWeight = indexSearcher.createWeight(seedRewritten, ScoreMode.TOP_SCORES, 1f);
SeededKnnByteVectorQuery rewritten =
new SeededKnnByteVectorQuery(field, target, k, filter, seedWeight);
return rewritten.rewrite(indexSearcher);
}

@Override
protected KnnCollectorManager getKnnCollectorManager(int k, IndexSearcher searcher) {
if (seedWeight == null) {
throw new UnsupportedOperationException("must be rewritten before constructing manager");
}
return new SeededKnnCollectorManager(
super.getKnnCollectorManager(k, searcher),
seedWeight,
k,
leaf -> leaf.getByteVectorValues(field));
}
}
@@ -0,0 +1,90 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.lucene.search;

import java.io.IOException;
import java.util.Objects;
import org.apache.lucene.search.knn.KnnCollectorManager;
import org.apache.lucene.search.knn.SeededKnnCollectorManager;

/**
* This is a version of knn float vector query that provides a query seed to initiate the vector
* search. NOTE: The underlying format is free to ignore the provided seed.
*
* <p>See <a href="https://dl.acm.org/doi/10.1145/3539618.3591715">"Lexically-Accelerated Dense
* Retrieval"</a> (Kulkarni, Hrishikesh and MacAvaney, Sean and Goharian, Nazli and Frieder, Ophir).
* In SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and
* Development in Information Retrieval Pages 152 - 162
*
* @lucene.experimental
*/
public class SeededKnnFloatVectorQuery extends KnnFloatVectorQuery {
final Query seed;
final Weight seedWeight;

/**
* Construct a new SeededKnnFloatVectorQuery instance
*
* @param field knn float vector field to query
* @param target the query vector
* @param k number of neighbors to return
* @param filter a filter on the neighbors to return
* @param seed a query seed to initiate the vector format search
*/
public SeededKnnFloatVectorQuery(String field, float[] target, int k, Query filter, Query seed) {
super(field, target, k, filter);
this.seed = Objects.requireNonNull(seed);
this.seedWeight = null;
}

SeededKnnFloatVectorQuery(String field, float[] target, int k, Query filter, Weight seedWeight) {
super(field, target, k, filter);
this.seed = null;
this.seedWeight = Objects.requireNonNull(seedWeight);
}

@Override
public Query rewrite(IndexSearcher indexSearcher) throws IOException {
if (seedWeight != null) {
return super.rewrite(indexSearcher);
}
BooleanQuery.Builder booleanSeedQueryBuilder =
new BooleanQuery.Builder()
.add(seed, BooleanClause.Occur.MUST)
.add(new FieldExistsQuery(field), BooleanClause.Occur.FILTER);
if (filter != null) {
booleanSeedQueryBuilder.add(filter, BooleanClause.Occur.FILTER);
}
Query seedRewritten = indexSearcher.rewrite(booleanSeedQueryBuilder.build());
Weight seedWeight = indexSearcher.createWeight(seedRewritten, ScoreMode.TOP_SCORES, 1f);
SeededKnnFloatVectorQuery rewritten =
new SeededKnnFloatVectorQuery(field, target, k, filter, seedWeight);
return rewritten.rewrite(indexSearcher);
}

@Override
protected KnnCollectorManager getKnnCollectorManager(int k, IndexSearcher searcher) {
if (seedWeight == null) {
throw new UnsupportedOperationException("must be rewritten before constructing manager");
}
return new SeededKnnCollectorManager(
super.getKnnCollectorManager(k, searcher),
seedWeight,
k,
leaf -> leaf.getFloatVectorValues(field));
}
}
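Both `rewrite` methods above combine three clauses into the seed query: the seed itself as `MUST` (so it contributes the score), a `FieldExistsQuery` as `FILTER` (so every seed document has a vector), and the optional user filter as `FILTER`. The sketch below models that composition with plain Java collections instead of Lucene classes. Taking the `k` best-scoring survivors is an assumption about what `SeededKnnCollectorManager` does with `seedWeight` and `k`, since that class is not shown here; `selectSeeds` and all its parameters are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class SeedSelectionDemo {
  public static void main(String[] args) {
    Map<Integer, Float> seedScores = Map.of(1, 2.0f, 2, 5.0f, 3, 4.0f, 4, 9.0f);
    Set<Integer> docsWithVector = Set.of(1, 2, 3); // doc 4 has no vector
    Set<Integer> filter = Set.of(2, 3, 4); // doc 1 fails the filter
    System.out.println(selectSeeds(seedScores, docsWithVector, filter, 2)); // [2, 3]
  }

  /**
   * Returns up to k doc ids, best seed score first. A doc qualifies only if it matched the seed
   * (has a score), has a vector, and passes the filter when one is given (null means no filter).
   */
  static List<Integer> selectSeeds(
      Map<Integer, Float> seedScores, Set<Integer> docsWithVector, Set<Integer> filter, int k) {
    return seedScores.entrySet().stream()
        // models the FieldExistsQuery FILTER clause
        .filter(e -> docsWithVector.contains(e.getKey()))
        // models the optional user filter FILTER clause
        .filter(e -> filter == null || filter.contains(e.getKey()))
        // highest score first; ties go to the lower doc id (an illustrative choice)
        .sorted(
            (a, b) -> {
              int byScore = Float.compare(b.getValue(), a.getValue());
              return byScore != 0 ? byScore : Integer.compare(a.getKey(), b.getKey());
            })
        .limit(k)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```

Note that doc 4 has the best seed score but is dropped because it has no vector, which is the case the `FieldExistsQuery` clause guards against.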
Expand Up @@ -45,51 +45,19 @@ public KnnCollector newCollector(int visitedLimit, LeafReaderContext context) th
return new TimeLimitingKnnCollector(collector);
}

- class TimeLimitingKnnCollector implements KnnCollector {
- private final KnnCollector collector;
-
- TimeLimitingKnnCollector(KnnCollector collector) {
- this.collector = collector;
+ class TimeLimitingKnnCollector extends KnnCollector.Decorator {
+ public TimeLimitingKnnCollector(KnnCollector collector) {
+ super(collector);
}

@Override
public boolean earlyTerminated() {
- return queryTimeout.shouldExit() || collector.earlyTerminated();
- }
-
- @Override
- public void incVisitedCount(int count) {
- collector.incVisitedCount(count);
- }
-
- @Override
- public long visitedCount() {
- return collector.visitedCount();
- }
-
- @Override
- public long visitLimit() {
- return collector.visitLimit();
- }
-
- @Override
- public int k() {
- return collector.k();
- }
-
- @Override
- public boolean collect(int docId, float similarity) {
- return collector.collect(docId, similarity);
- }
-
- @Override
- public float minCompetitiveSimilarity() {
- return collector.minCompetitiveSimilarity();
+ return queryTimeout.shouldExit() || super.earlyTerminated();
}

@Override
public TopDocs topDocs() {
- TopDocs docs = collector.topDocs();
+ TopDocs docs = super.topDocs();

// Mark results as partial if timeout is met
TotalHits.Relation relation =
@@ -0,0 +1,28 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.lucene.search.knn;

import org.apache.lucene.search.DocIdSetIterator;

/** Provides entry points for the kNN search */
public interface EntryPointProvider {
/** Iterator of valid entry points for the kNN search */
DocIdSetIterator entryPoints();

/** Number of valid entry points for the kNN search */
int numberOfEntryPoints();
}
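`EntryPointProvider` hands the graph search a set of starting nodes instead of the graph's default entry point. The toy sketch below illustrates why that can help: a best-first search over a small chain graph scores fewer nodes when it starts near the target. This is not Lucene's `HnswGraphSearcher`; the graph, the one-dimensional "vectors", and the names `search` and `chain` are all hypothetical and chosen only to show the intuition.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

class SeededSearchDemo {
  public static void main(String[] args) {
    float[] points = new float[10];
    for (int i = 0; i < points.length; i++) {
      points[i] = i; // node i sits at position i on a line
    }
    int[][] graph = chain(points.length);
    int[] cold = search(points, graph, 8.2f, new int[] {0});
    int[] seeded = search(points, graph, 8.2f, new int[] {7});
    System.out.println("default entry: nearest=" + cold[0] + " scored=" + cold[1]);
    System.out.println("seeded entry:  nearest=" + seeded[0] + " scored=" + seeded[1]);
  }

  /** Builds a chain graph where node i links to i-1 and i+1 (requires n >= 2). */
  static int[][] chain(int n) {
    int[][] neighbors = new int[n][];
    for (int i = 0; i < n; i++) {
      if (i == 0) {
        neighbors[i] = new int[] {1};
      } else if (i == n - 1) {
        neighbors[i] = new int[] {n - 2};
      } else {
        neighbors[i] = new int[] {i - 1, i + 1};
      }
    }
    return neighbors;
  }

  /**
   * Greedy best-first search from the given entry points. Returns a two-element array: the id of
   * the closest node found (-1 if there were no entry points) and the number of nodes scored.
   */
  static int[] search(float[] points, int[][] neighbors, float target, int[] entryPoints) {
    Comparator<Integer> byDistance =
        Comparator.comparingDouble(n -> Math.abs(points[n] - target));
    PriorityQueue<Integer> candidates = new PriorityQueue<>(byDistance);
    Set<Integer> visited = new HashSet<>();
    for (int ep : entryPoints) {
      if (visited.add(ep)) {
        candidates.add(ep);
      }
    }
    int best = -1;
    while (!candidates.isEmpty()) {
      int node = candidates.poll();
      // Stop once the closest remaining candidate no longer improves on the best node.
      if (best != -1 && byDistance.compare(node, best) >= 0) {
        break;
      }
      best = node;
      for (int nb : neighbors[node]) {
        if (visited.add(nb)) {
          candidates.add(nb);
        }
      }
    }
    return new int[] {best, visited.size()};
  }
}
```

In this toy setup both runs find node 8, but the run seeded at node 7 scores 4 nodes while the run starting at node 0 scores all 10. Real HNSW graphs are hierarchical and far better connected, so the gap in practice depends on the index and on how well the seed query correlates with vector similarity.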