Can we remove `compress` option for quantized KNN vector indexing? #13768

Comments
I'll see if I can benchmark at least this tradeoff using luceneutil's …
I agree, this is worth digging into. It would be good to benchmark this with and without Panama vectorization enabled, on both Intel and ARM. Note that there has been work to move all quantized comparisons off-heap (#13497), the results of which may or may not affect this decision (bytes are no longer copied onto the heap, so copying fewer bytes no longer helps or hurts performance as much). But in my own benchmarking I kept hitting weird slowdowns that I cannot figure out and nobody else can replicate (@ChrisHegarty has tried and couldn't see why I keep seeing significant slowdowns across different JDK versions).
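Not luceneutil itself, but for anyone who wants to reproduce a small local version of this comparison, here is roughly how the 4-bit quantized HNSW format can be wired into an `IndexWriterConfig` with `compress` toggled. The exact constructor arguments of `Lucene99HnswScalarQuantizedVectorsFormat` (and the codec class name) vary across Lucene versions, so treat this as an assumption-laden sketch rather than a reference:

```java
// Sketch only: the constructor argument order/availability below is assumed
// from memory and may differ in your Lucene version.
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99Codec;
import org.apache.lucene.codecs.lucene99.Lucene99HnswScalarQuantizedVectorsFormat;
import org.apache.lucene.index.IndexWriterConfig;

public class QuantizedCodecSketch {
  static IndexWriterConfig configFor(boolean compress) {
    KnnVectorsFormat format =
        new Lucene99HnswScalarQuantizedVectorsFormat(
            16,       // maxConn (HNSW M)
            100,      // beamWidth
            1,        // numMergeWorkers
            4,        // bits = 4 -> int4 scalar quantization
            compress, // the option this issue proposes to hardwire to true
            null,     // confidenceInterval (null = default)
            null);    // merge executor (null since numMergeWorkers == 1)

    IndexWriterConfig iwc = new IndexWriterConfig();
    iwc.setCodec(
        new Lucene99Codec() {
          @Override
          public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
            return format;
          }
        });
    return iwc;
  }
}
```

Indexing the same vectors twice, once per `compress` value, and then timing the same `KnnFloatVectorQuery` searches against each index should expose the search-side decode overhead.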
Well, I ran … This is with Panama enabled (…). Results: …

Unfortunately the output doesn't state it, but the first row is …
OK, I disabled Panama (via a temporary code change in …).

Indeed there is some performance penalty now (285 usec -> 312 usec, ~9.5%) ... recall also bounced around a bit, but prolly that's acceptable HNSW randomness noise. And wow, look how much slower indexing / force merging got ... those SIMD instructions clearly help ;) But I don't think we should block removing …
I agree. At this point we're just comparing the scalar and SIMD implementations of the distance functions. For vector operations we really need SIMD, and I think we're OK with this approach. I'm +1 to remove compress, if there is no other reason to keep it.
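To make the scalar-vs-SIMD distinction concrete, here is a minimal, hand-written pair of dot products using the Panama Vector API — not Lucene's actual implementation (which lives behind its internal vectorization provider and also covers the quantized byte cases), just an illustration of what the non-Panama path gives up. It needs `--add-modules jdk.incubator.vector` to compile and run:

```java
// Illustration only, not Lucene code. Compile/run with:
//   --add-modules jdk.incubator.vector
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class DotProductSketch {
  private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

  /** Plain scalar loop: roughly what the non-Panama path boils down to. */
  static float dotScalar(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** SIMD loop via the Panama Vector API, with a scalar tail. */
  static float dotSimd(float[] a, float[] b) {
    FloatVector acc = FloatVector.zero(SPECIES);
    int i = 0;
    int upperBound = SPECIES.loopBound(a.length);
    for (; i < upperBound; i += SPECIES.length()) {
      FloatVector va = FloatVector.fromArray(SPECIES, a, i);
      FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
      acc = va.fma(vb, acc); // fused multiply-add across all lanes
    }
    float sum = acc.reduceLanes(VectorOperators.ADD);
    for (; i < a.length; i++) {
      sum += a[i] * b[i]; // leftover elements past the last full vector
    }
    return sum;
  }
}
```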
If we are ok with the perf hit on non-Panama, I am cool with it :). It will definitely simplify the code.
Actually, thinking about this more ... I'm changing my mind. I don't fully understand how poor our Panama/SIMD coverage is across the CPU types/versions "typically" in use by our users, e.g. for ARM CPUs (various versions of NEON instructions). What percentage of our users would hit the non-SIMD (non-Panama) path? It's spooky that the likes of OpenSearch, Elasticsearch and Solr need to pull in their own Panama FMA wrappers around native code to better optimize for certain vectorized instruction cases (see the discussion on #13572). Ideally such optimizations would be in Lucene so we could make decisions like this (remove …).

I'd like to run benchmarks across many more CPUs before rushing to a decision here, and I think for now we should otherwise respect the non-SIMD results? I love our new …
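On the "what percentage of users hit the non-Panama path" question: a rough, necessary-but-not-sufficient check is whether the incubating vector module was even added to the JVM at startup (Lucene's internal vectorization provider applies further checks, e.g. JDK feature version and CPU/architecture). A sketch of my own, not Lucene code:

```java
// Heuristic sketch (not Lucene code): the Panama/SIMD path is only possible if
// jdk.incubator.vector was resolved at startup (e.g. via --add-modules).
public class PanamaAvailabilityCheck {
  public static void main(String[] args) {
    boolean vectorModulePresent =
        ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    System.out.println("Java feature version : " + Runtime.version().feature());
    System.out.println("os.arch              : " + System.getProperty("os.arch"));
    System.out.println("jdk.incubator.vector : " + vectorModulePresent);
  }
}
```

I believe Lucene also logs at startup which vectorization implementation it selected, which is probably the more reliable signal in a real deployment.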
Description
Spinoff from this comment.
This (`compress=true`) is a useful option when quantizing KNN vectors to 4 bits: it packs pairs of dimensions into a single byte, so the "hot working set" of your KNN/HNSW vectors at search time is half the already reduced (from `float32` -> `byte`) size. When `compress` is `false` it's wasteful, using only four bits of every byte.

But it comes with some penalty to decode the "packed" (`compress=true`) form during KNN search, which is why we give this choice to the user.

But then I think there was at least one opto to that path, so maybe the performance penalty isn't so bad now? In which case maybe we can just always hardwire `compress=true` when quantized `bits=4`?

(`compress=true` doesn't apply to 7-bit quantization.)