
Switch nightly benchy to more realistic Cohere/wikipedia-22-12-en-embeddings vectors #256

Open
mikemccand opened this issue Mar 5, 2024 · 22 comments

Comments

@mikemccand
Owner

#255 added realistic Cohere/wikipedia-22-12-en-embeddings 768 dim vectors to luceneutil -- let's switch over nightlies to use these vectors instead.

@mikemccand
Owner Author

I attempted to follow the README instructions to generate nightly benchy vectors, using this command:

python3 -u src/python/infer_token_vectors_cohere.py ../data/cohere-wikipedia-768.vec 27625038 ../data/cohere-wikipedia-queries-768.vec 10000

(Note that the nightly benchy only does indexing, so I really only need the first file)

But this apparently consumes gobs of RAM, and the Linux OOM killer killed it!

Is this expected? I can run this on a beefier machine if need be (current machine has "only" 256 GB and no swap) for this one-time generation of vectors ...

Maybe datasets.load_dataset can load just the N vectors I need, not everything in the train split?
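Two things that might bound the RAM here (a rough sketch, not what infer_token_vectors_cohere.py does today; the dataset name is the real one, the rest is illustrative): load only a slice of the train split, or stream the rows and append them to the output file as they arrive:

import datasets
import numpy as np

num_docs = 27625038

# Option 1: split slicing -- only materialize the first num_docs rows:
# ds = datasets.load_dataset("Cohere/wikipedia-22-12-en-embeddings",
#                            split=f"train[:{num_docs}]")

# Option 2: streaming -- never hold the whole split in RAM; iterate and append:
ds = datasets.load_dataset("Cohere/wikipedia-22-12-en-embeddings",
                           split="train", streaming=True)
with open("../data/cohere-wikipedia-768.vec", "wb") as out_f:
  for i, row in enumerate(ds):
    if i >= num_docs:
      break
    np.asarray(row["emb"], dtype=np.float32).tofile(out_f)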

@mikemccand
Owner Author

Oooh, this load_dataset method takes a keep_in_memory parameter! I'll poke around.

@mikemccand
Owner Author

OK, well, that keep_in_memory=False parameter seemed to do nothing -- the OOM killer still fired at 256 GB RAM.

With this change to do chunking into 1M blocks of vectors when writing the index-time inferred vectors, I was able to run the tool!

Full output below:

beast3:util.nightly[master]$ python3 -u src/python/infer_token_vectors_cohere.py ../data/cohere-wikipedia-768.vec 27625038 ../data/cohere-wikipedia-queries-768.vec 10000
Resolving data files: 100%|████████████████████████████████████████████| 253/253 [00:01<00:00, 250.70it/s]
Loading dataset shards: 100%|█████████████████████████████████████████| 252/252 [00:00<00:00, 1121.66it/s]
total number of rows: 35167920
embeddings dims: 768
saving docs[0:1000000 of shape: (1000000, 768) to file
saving docs[1000000:2000000 of shape: (1000000, 768) to file
saving docs[2000000:3000000 of shape: (1000000, 768) to file
saving docs[3000000:4000000 of shape: (1000000, 768) to file
saving docs[4000000:5000000 of shape: (1000000, 768) to file
saving docs[5000000:6000000 of shape: (1000000, 768) to file
saving docs[6000000:7000000 of shape: (1000000, 768) to file
saving docs[7000000:8000000 of shape: (1000000, 768) to file
saving docs[8000000:9000000 of shape: (1000000, 768) to file
saving docs[9000000:10000000 of shape: (1000000, 768) to file
saving docs[10000000:11000000 of shape: (1000000, 768) to file
saving docs[11000000:12000000 of shape: (1000000, 768) to file
saving docs[12000000:13000000 of shape: (1000000, 768) to file
saving docs[13000000:14000000 of shape: (1000000, 768) to file
saving docs[14000000:15000000 of shape: (1000000, 768) to file
saving docs[15000000:16000000 of shape: (1000000, 768) to file
saving docs[16000000:17000000 of shape: (1000000, 768) to file
saving docs[17000000:18000000 of shape: (1000000, 768) to file
saving docs[18000000:19000000 of shape: (1000000, 768) to file
saving docs[19000000:20000000 of shape: (1000000, 768) to file
saving docs[20000000:21000000 of shape: (1000000, 768) to file
saving docs[21000000:22000000 of shape: (1000000, 768) to file
saving docs[22000000:23000000 of shape: (1000000, 768) to file
saving docs[23000000:24000000 of shape: (1000000, 768) to file
saving docs[24000000:25000000 of shape: (1000000, 768) to file
saving docs[25000000:26000000 of shape: (1000000, 768) to file
saving docs[26000000:27000000 of shape: (1000000, 768) to file
saving docs[27000000:27625038 of shape: (625038, 768) to file
saving queries of shape: (10000, 768) to file
reading docs of shape: (27625038, 768)
reading queries shape: (10000, 768)

It produced a large .vec file:

beast3:util.nightly[master]$ ls -lh ../data/cohere-wikipedia-768.vec
-rw-r--r-- 1 mike mike 159G Mar 10 22:13 ../data/cohere-wikipedia-768.vec

Next I'll try switching to this source for nightly benchy. I'll also publish this on home.apache.org.

@mikemccand
Owner Author

Hmm, except, that file is too large?

beast3:util.nightly[master]$ python3
Python 3.11.7 (main, Jan 29 2024, 16:03:57) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 27000000 * 768 * 4 / 1024 / 1024 / 1024
77.24761962890625

It's 159 GB but should be ~77 GB?

Maybe my "chunking" is buggy :)

@mikemccand
Owner Author

OK I think these are float64 typed vectors, in which case the file size makes sense. But I think nightly benchy wants float32?
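Redoing the arithmetic with the full 27,625,038 rows, float64 matches what's on disk, and float32 is about half that:

27625038 * 768 * 8 bytes = ~158 GB  (float64 -- matches the 159 GB file)
27625038 * 768 * 4 bytes = ~79 GB   (float32 -- what nightly benchy wants)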

@mikemccand
Owner Author

mikemccand commented Mar 11, 2024

And I think knnPerfTest.py/KnnGraphTester.java also wants float32? I'm confused how they are working now on the generated file ...

@mikemccand
Owner Author

Oooh this Dataset.cast method looks promising! I'll explore...

@mikemccand
Owner Author

OK I made this change and kicked off infer_token_vectors_cohere.py again, and it at least looks to be running:

diff --git a/src/python/infer_token_vectors_cohere.py b/src/python/infer_token_vectors_cohere.py
index 5c350df..5027eb2 100644
--- a/src/python/infer_token_vectors_cohere.py
+++ b/src/python/infer_token_vectors_cohere.py
@@ -28,11 +28,19 @@ for name in (filename, filename_queries):

 ds = datasets.load_dataset("Cohere/wikipedia-22-12-en-embeddings",
                            split="train")
+print(f'features: {ds.features}')
 print(f"total number of rows: {len(ds)}")
 print(f"embeddings dims: {len(ds[0]['emb'])}")

 # ds = ds[:num_docs]

+# we just want the vector embeddings:
+for feature_name in ds.features.keys():
+  if feature_name != 'emb':
+    ds = ds.remove_columns(feature_name)
+
+ds = ds.cast(datasets.Features({'emb': datasets.Sequence(feature=datasets.Value("float32"))}))
+
 # do this in windows, else the RAM usage is crazy (OOME even with 256
 # GB RAM since I think this step makes 2X copy of the dataset?)
 doc_upto = 0

@mikemccand
Owner Author

OK hmm scratch that, I see from the already loaded features that Dataset thinks these emb vectors are already float32:

features: {'id': Value(dtype='int32', id=None), 'title': Value(dtype='string', id=None), 'text': Value(dtype='string', id=None), 'url': Value(dtype='string', id=None), 'wiki_id': Value(dtype='int32', id=None), 'views': Value(dtype='float32', id=None), 'paragraph_id': Value(dtype='int32', id=None), 'langs': Value(dtype='int32', id=None), 'emb': Sequence(feature=Value(dtype='float32', id=None), length=-1, id=None)}

@mikemccand
Copy link
Owner Author

OK! Now I think the issue is in np.array -- I think we have to give it an explicit data type, else it seems to cast the Dataset's float32 up to float64.

So, now I'm testing this:

diff --git a/src/python/infer_token_vectors_cohere.py b/src/python/infer_token_vectors_cohere.py
index 5c350df..4cc305e 100644
--- a/src/python/infer_token_vectors_cohere.py
+++ b/src/python/infer_token_vectors_cohere.py
@@ -28,11 +28,20 @@ for name in (filename, filename_queries):

 ds = datasets.load_dataset("Cohere/wikipedia-22-12-en-embeddings",
                            split="train")
+print(f'features: {ds.features}')
 print(f"total number of rows: {len(ds)}")
 print(f"embeddings dims: {len(ds[0]['emb'])}")

 # ds = ds[:num_docs]

+if False:
+  # we just want the vector embeddings:
+  for feature_name in ds.features.keys():
+    if feature_name != 'emb':
+      ds = ds.remove_columns(feature_name)
+
+  ds = ds.cast(datasets.Features({'emb': datasets.Sequence(feature=datasets.Value("float32"))}))
+
 # do this in windows, else the RAM usage is crazy (OOME even with 256
 # GB RAM since I think this step makes 2X copy of the dataset?)
 doc_upto = 0
@@ -40,7 +49,7 @@ window_num_docs = 1000000
 while doc_upto < num_docs:
   next_doc_upto = min(doc_upto + window_num_docs, num_docs)
   ds_embs = ds[doc_upto:next_doc_upto]['emb']
-  embs = np.array(ds_embs)
+  embs = np.array(ds_embs, dtype=np.single)
   print(f"saving docs[{doc_upto}:{next_doc_upto} of shape: {embs.shape} to file")
   with open(filename, "ab") as out_f:
       embs.tofile(out_f)
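Quick sanity check of that np.array default (a standalone snippet, not part of the script -- numpy upcasts plain Python floats to float64 unless told otherwise):

import numpy as np

print(np.array([[0.1, 0.2]]).dtype)                   # float64 -- numpy's default
print(np.array([[0.1, 0.2]], dtype=np.single).dtype)  # float32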

mikemccand added a commit that referenced this issue Apr 29, 2024
@mikemccand
Owner Author

OK the above change seemed to have worked (I just pushed it)! I now see these vector files:

-rw-r--r-- 1 mike mike  80G Mar 28 12:57 cohere-wikipedia-768.vec
-rw-r--r-- 1 mike mike 586M Mar 28 12:57 cohere-wikipedia-queries-768.vec

Now I will try to confirm their recall seems sane, and then switch nightly to them.

@mikemccand
Owner Author

OK I think the next wrinkle here is ... to fix SearchPerfTest to use the pre-computed Cohere query vectors from cohere-wikipedia-queries-768.vec, instead of attempting to do inference based on the lexical tokens of each incoming query. I guess we could just incrementally pull the vectors from the query vectors file and assign them sequentially to each vector query we see? @msokolov does that sound reasonable?
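Roughly this, on the Python side just to illustrate (the real change would be in SearchPerfTest; the file path and dimension are the ones above, the function is made up):

import numpy as np

DIM = 768
queries = np.memmap('/lucenedata/enwiki/cohere-wikipedia-queries-768.vec',
                    dtype=np.float32, mode='r').reshape(-1, DIM)

query_upto = 0

def next_query_vector():
  # hand out the pre-computed query vectors in file order, wrapping around
  # if the tasks file has more vector queries than we have vectors
  global query_upto
  vec = queries[query_upto % len(queries)]
  query_upto += 1
  return vec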

@msokolov
Collaborator

I think we can modify VectorDictionary to accept a --no-tokenize option and then look up the vector using the full query text? We would need to generate a text file with the queries, one per line, to correspond with the binary vector file.
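Something like this, just to sketch the lookup (the --no-tokenize option and the queries text file don't exist yet; the file names here are hypothetical):

import numpy as np

DIM = 768
# binary query vectors plus a parallel text file, one query per line, same order
vectors = np.memmap('cohere-wikipedia-queries-768.vec',
                    dtype=np.float32, mode='r').reshape(-1, DIM)
with open('cohere-wikipedia-queries.txt', encoding='utf-8') as f:
  texts = [line.rstrip('\n') for line in f]

assert len(texts) == len(vectors)

# with --no-tokenize, resolve the vector from the full (untokenized) query text:
query_to_vector = {text: vectors[i] for i, text in enumerate(texts)}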

@msokolov
Collaborator

Otherwise you could simply select some random vector every time you see a vector-type query task? But I would expect some vectors to behave differently from others? Not sure.

@mikemccand
Owner Author

I was finally able to index/search using these Cohere vectors, and the profiler output is sort of strange:

This is CPU:

PROFILE SUMMARY from 44698 events (total: 44698)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
10.98%        4907          jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
6.05%         2702          jdk.incubator.vector.FloatVector#reduceLanesTemplate()
4.06%         1813          org.apache.lucene.store.MemorySegmentIndexInput#readByte()
3.63%         1622          perf.PKLookupTask#go()
2.89%         1292          org.apache.lucene.store.DataInput#readVInt()
2.84%         1269          org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel()
2.58%         1153          org.apache.lucene.util.fst.FST#findTargetArc()
2.45%         1093          jdk.incubator.vector.FloatVector#fromArray0Template()
2.28%         1018          org.apache.lucene.util.LongHeap#downHeap()
2.25%         1007          org.apache.lucene.util.SparseFixedBitSet#insertLong()
2.06%         921           org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#seekExact()
2.05%         916           jdk.internal.foreign.AbstractMemorySegmentImpl#checkBounds()
1.96%         875           jdk.incubator.vector.FloatVector#lanewiseTemplate()
1.91%         852           jdk.internal.util.ArraysSupport#mismatch()
1.40%         627           org.apache.lucene.util.compress.LZ4#decompress()
1.31%         586           jdk.internal.foreign.MemorySessionImpl#checkValidStateRaw()
1.18%         526           org.apache.lucene.util.SparseFixedBitSet#getAndSet()
1.15%         516           org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsReader$OffHeapHnswGraph#seek()
0.99%         444           org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#doReset()
0.99%         441           org.apache.lucene.util.BytesRef#compareTo()
0.94%         418           org.apache.lucene.util.fst.FST#readArcByDirectAddressing()
0.92%         413           org.apache.lucene.search.TopKnnCollector#topDocs()
0.91%         408           org.apache.lucene.index.VectorSimilarityFunction$2#compare()
0.90%         401           org.apache.lucene.codecs.hnsw.DefaultFlatVectorScorer$FloatVectorScorer#score()
0.80%         359           org.apache.lucene.index.SegmentInfo#maxDoc()
0.79%         353           java.util.Arrays#fill()
0.75%         334           org.apache.lucene.codecs.lucene99.Lucene99PostingsReader#decodeTerm()
0.73%         326           java.util.Arrays#compareUnsigned()
0.72%         324           org.apache.lucene.search.ReferenceManager#acquire()
0.71%         317           org.apache.lucene.store.DataInput#readVLong()

and this is HEAP:

PROFILE SUMMARY from 748 events (total: 38182M)
  tests.profile.mode=heap
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       HEAP SAMPLES  STACK
21.88%        8355M         java.util.concurrent.locks.AbstractQueuedSynchronizer#acquire()
13.50%        5154M         org.apache.lucene.util.ArrayUtil#growNoCopy()
9.31%         3556M         org.apache.lucene.util.SparseFixedBitSet#insertLong()
9.00%         3436M         perf.StatisticsHelper#startStatistics()
9.00%         3436M         java.util.ArrayList#iterator()
5.76%         2199M         org.apache.lucene.util.fst.ByteSequenceOutputs#read()
3.60%         1374M         org.apache.lucene.util.BytesRef#<init>()
3.56%         1357M         org.apache.lucene.codecs.lucene95.OffHeapFloatVectorValues#<init>()
3.52%         1345M         org.apache.lucene.util.ArrayUtil#growExact()
2.65%         1013M         org.apache.lucene.search.TopKnnCollector#topDocs()
2.50%         956M          java.util.concurrent.locks.AbstractQueuedSynchronizer#tryInitializeHead()
2.34%         893M          org.apache.lucene.util.SparseFixedBitSet#insertBlock()
1.98%         755M          org.apache.lucene.util.LongHeap#<init>()
1.51%         578M          java.util.logging.LogManager#reset()
1.51%         578M          java.util.concurrent.FutureTask#runAndReset()
1.51%         578M          jdk.jfr.internal.ShutdownHook#run()
1.21%         463M          jdk.internal.foreign.MappedMemorySegmentImpl#dup()
0.90%         343M          java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#newConditionNode()
0.83%         315M          org.apache.lucene.util.SparseFixedBitSet#<init>()
0.60%         229M          org.apache.lucene.util.hnsw.FloatHeap#<init>()
0.52%         200M          org.apache.lucene.util.hnsw.FloatHeap#getHeap()
0.45%         171M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
0.45%         171M          org.apache.lucene.util.packed.DirectMonotonicReader#getInstance()
0.45%         171M          org.apache.lucene.store.DataInput#readString()
0.22%         85M           org.apache.lucene.search.knn.TopKnnCollectorManager#newCollector()
0.22%         85M           org.apache.lucene.search.knn.MultiLeafKnnCollector#<init>()
0.15%         57M           org.apache.lucene.store.MemorySegmentIndexInput#buildSlice()
0.08%         28M           perf.TaskParser$TaskBuilder#parseVectorQuery()
0.07%         28M           java.util.regex.Pattern#matcher()
0.07%         28M           org.apache.lucene.search.TaskExecutor$TaskGroup#createTask()

Why are we reading individual bytes so intensively? And why is lock acquisition the top HEAP object creator!?

@mikemccand
Owner Author

Here's the perf.py I ran (just A/A):

import sys
sys.path.insert(0, '/l/util/src/python')

import competition

if __name__ == '__main__':
  sourceData = competition.sourceData('wikimediumall')

  sourceData.tasksFile = '/l/util/just-vector-search.tasks'
  comp = competition.Competition(taskRepeatCount=200)
  #comp.addTaskPattern('HighTerm$')                                                                                                                                                                    

  checkout = 'trunk'

  index = comp.newIndex(checkout, sourceData, numThreads=36, addDVFields=True,
                        grouping=False, useCMS=True,
                        #javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp',                              
                        ramBufferMB=256,
                        analyzer = 'StandardAnalyzerNoStopWords',
                        vectorFile = '/lucenedata/enwiki/cohere-wikipedia-768.vec',
                        vectorDimension = 768,
                        hnswThreadsPerMerge = 4,
                        hnswThreadPoolCount = 16,
                        vectorEncoding = 'FLOAT32',
                        verbose = True,
                        name = 'mikes-vector-test',
                        facets = (('taxonomy:Date', 'Date'),
                                  ('taxonomy:Month', 'Month'),
                                  ('taxonomy:DayOfYear', 'DayOfYear'),
                                  ('taxonomy:RandomLabel.taxonomy', 'RandomLabel'),
                                  ('sortedset:Date', 'Date'),
                                  ('sortedset:Month', 'Month'),
                                  ('sortedset:DayOfYear', 'DayOfYear'),
                                  ('sortedset:RandomLabel.sortedset', 'RandomLabel')))

  comp.competitor('base', checkout, index=index, vectorFileName='/lucenedata/enwiki/cohere-wikipedia-queries-768.vec', vectorDimension=768,
                  #javacCommand='/opt/jdk-18-ea-28/bin/javac',                                                                                                                                         
                  #javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp')                                    
                  )
  comp.competitor('comp', checkout, index=index, vectorFileName='/lucenedata/enwiki/cohere-wikipedia-queries-768.vec', vectorDimension=768,
                  #javacCommand='/opt/jdk-18-ea-28/bin/javac',                                                                                                                                         
                  #javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp')                                    
                  )
  comp.benchmark('atoa')

@mikemccand
Owner Author

More thread context for the CPU profiling:

PROFILE SUMMARY from 10264 events (total: 10264)
  tests.profile.mode=cpu
  tests.profile.count=50
  tests.profile.stacksize=8
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
12.59%        1292          jdk.incubator.vector.FloatVector#reduceLanesTemplate()
                              at jdk.incubator.vector.Float256Vector#reduceLanes()
                              at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody()
                              at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProduct()
                              at org.apache.lucene.util.VectorUtil#dotProduct()
                              at org.apache.lucene.index.VectorSimilarityFunction$2#compare()
                              at org.apache.lucene.codecs.hnsw.DefaultFlatVectorScorer$FloatVectorScorer#score()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel()
7.16%         735           org.apache.lucene.store.DataInput#readVInt()
                              at org.apache.lucene.store.MemorySegmentIndexInput#readVInt()
                              at org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsReader$OffHeapHnswGraph#seek()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#graphSeek()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#search()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#search()
                              at org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsReader#search()
4.25%         436           jdk.incubator.vector.FloatVector#lanewiseTemplate()
                              at jdk.incubator.vector.Float256Vector#lanewise()
                              at jdk.incubator.vector.Float256Vector#lanewise()
                              at jdk.incubator.vector.FloatVector#fma()
                              at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#fma()
                              at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody()
                              at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProduct()
                              at org.apache.lucene.util.VectorUtil#dotProduct()
4.21%         432           jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
                              at jdk.internal.misc.ScopedMemoryAccess#getByte()
                              at java.lang.invoke.VarHandleSegmentAsBytes#get()
                              at java.lang.invoke.VarHandleGuards#guard_LJ_I()
                              at java.lang.foreign.MemorySegment#get()
                              at org.apache.lucene.store.MemorySegmentIndexInput#readByte()
                              at org.apache.lucene.store.DataInput#readVInt()
                              at org.apache.lucene.store.MemorySegmentIndexInput#readVInt()
4.08%         419           org.apache.lucene.util.SparseFixedBitSet#insertLong()
                              at org.apache.lucene.util.SparseFixedBitSet#getAndSet()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#search()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#search()
                              at org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsReader#search()
                              at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader#search()
                              at org.apache.lucene.index.CodecReader#searchNearestVectors()
2.84%         292           org.apache.lucene.util.LongHeap#downHeap()
                              at org.apache.lucene.util.LongHeap#pop()
                              at org.apache.lucene.util.hnsw.NeighborQueue#pop()
                              at org.apache.lucene.search.TopKnnCollector#topDocs()
                              at org.apache.lucene.search.knn.MultiLeafKnnCollector#topDocs()
                              at org.apache.lucene.search.KnnFloatVectorQuery#approximateSearch()
                              at org.apache.lucene.search.AbstractKnnVectorQuery#getLeafResults()
                              at org.apache.lucene.search.AbstractKnnVectorQuery#searchLeaf()
2.74%         281           org.apache.lucene.index.VectorSimilarityFunction$2#compare()
                              at org.apache.lucene.codecs.hnsw.DefaultFlatVectorScorer$FloatVectorScorer#score()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#search()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#search()
                              at org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsReader#search()
                              at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader#search()
                              at org.apache.lucene.index.CodecReader#searchNearestVectors()
2.58%         265           org.apache.lucene.util.compress.LZ4#decompress()
                              at org.apache.lucene.codecs.lucene90.LZ4WithPresetDictCompressionMode$LZ4WithPresetDictDecompressor#decompress()
                              at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
                              at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader#serializedDocument()
                              at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader#document()
                              at org.apache.lucene.index.CodecReader$1#document()
                              at org.apache.lucene.index.BaseCompositeReader$2#document()
                              at org.apache.lucene.index.StoredFields#document()
2.48%         255           org.apache.lucene.util.SparseFixedBitSet#getAndSet()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#search()
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#search()
                              at org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsReader#search()
                              at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsReader#search()
                              at org.apache.lucene.index.CodecReader#searchNearestVectors()
                              at org.apache.lucene.search.KnnFloatVectorQuery#approximateSearch()

Curious that readVInt, when seeking to load a vector (?), is the 2nd hotspot?

@msokolov
Collaborator

msokolov commented Jun 10, 2024 via email

@benwtrent
Collaborator

I might be missing it, but where is the similarity defined for using the Cohere vectors? They are designed for max inner product; if we use euclidean, I would expect graph building and indexing to be poor, as we might get stuck in local minima.

@msokolov
Collaborator

The benchmark tools are hard-coded to use DOT_PRODUCT; see https://github.com/mikemccand/luceneutil/blob/main/src/main/perf/LineFileDocs.java#L454

Maybe this is why we get such poor results w/Cohere?

@benwtrent
Collaborator

@msokolov using dot_product likely doesn't work with the 768-dim Cohere vectors unless they are manually normalized. If they aren't normalized, we will get some wacky scores, and we likely lose a bunch of information by snapping scores to be greater than 0.

I could maybe see cosine working.

But I would suggest we switch to max-inner-product for Cohere 768 for a true test with those vectors as they were designed to be used.
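A quick way to check on the doc vectors file Mike generated (if the norms are all ~1.0, dot product and cosine agree; if they vary, max-inner-product is the right setting):

import numpy as np

DIM = 768
docs = np.memmap('/lucenedata/enwiki/cohere-wikipedia-768.vec',
                 dtype=np.float32, mode='r').reshape(-1, DIM)
norms = np.linalg.norm(docs[:10000], axis=1)  # sample the first 10k vectors
print(norms.min(), norms.mean(), norms.max())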

@msokolov
Collaborator

I ran a test comparing mip and angular over Cohere Wikipedia vectors (what KnnGraphTester calls MAXIMUM_INNER_PRODUCT and DOT_PRODUCT) and the results were surprising:

mainline, Cohere, angular

recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  force merge s  num segments  index size (MB)
 0.631         0.496  1500000    10       6       32         50         no   330.94         213.55             1          4436.82
 0.617         0.439  1500000    10       6       32         50     7 bits   352.52         217.64             1          5543.35
 0.408         0.422  1500000    10       6       32         50     4 bits   340.32         151.22             1          5544.56

mainline, Cohere, mip

recall  latency (ms)     nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  force merge s  num segments  index size (MB)
 0.593         0.475  1500000    10       6       32         50         no   325.19         210.78             1          4436.81
 0.601         0.454  1500000    10       6       32         50     7 bits   346.48         218.88             1          5543.35
 0.405         0.307  1500000    10       6       32         50     4 bits   345.31         144.83             1          5544.56
