Improve Rolling Hashes #103
Labels
core
Work on the algorithm design and implementation
help wanted
Extra attention is needed
huge
Large task potentially involving architectural or breaking changes
performance
Performance related discussion or suggestion
In StringZilla a 64-bit rolling hash function is reused for both string hashes and substring hashes, Rabin-style fingerprints. Rolling hashes take the same amount of time to compute hashes with different window sizes, and are fast to update. Those are not however perfect hashes, and collisions are frequent. StringZilla attempts to use SIMD, but the performance is not yet satisfactory. On Intel Sapphire Rapids, the following numbers can be expected for N-way parallel variants.
That's extremely slow. Using SIMD and a better scheme, we should be approaching
memcpy
speeds. Trying alternative rolling hash approaches is the biggest goal for the library in the upcoming releases, so algorithm recommendations would be appreciated!The text was updated successfully, but these errors were encountered: