Add scalar optimizations from CRoaring / arXiv:1709.07821 section 3 #127
Got a benchmark harness running the ops over the sample datasets. The results are consistent and very promising!
That's quite good indeed 🚀 😄 However, I am not sure about the use of the new … Maybe we can make sure that using inline asm doesn't break the build; I had a lot of trouble compiling croaring-rs (the Rust wrapper of the C roaring library) due to many raw SIMD function calls that are unavailable on some platforms. Maybe we can use the soon-to-be-released std::simd crate?
Let's have this discussion in #60. This PR does not include any inline asm or explicit SIMD. The perf gains here come from making inserts and removes branchless and from interleaving the cardinality tracking.
Maintaining the len in the bitmap store has the additional micro-optimization of making …
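To make the idea of branchless inserts and removes with interleaved cardinality tracking concrete, here is a minimal sketch. It is not the code from this PR: `Bitmap8K` below is a hypothetical stand-in (1024 `u64` words plus a stored `len`), and the method names are assumptions. The trick is to compute the new word unconditionally and derive the change in cardinality from `(old ^ new) >> bit`, so no branch depends on whether the bit was already set.

```rust
/// Hypothetical stand-in for the PR's `Bitmap8K`: 1024 x 64 bits = 65536 bits (8 KiB),
/// with the cardinality stored alongside the words.
pub struct Bitmap8K {
    len: u64,
    bits: Box<[u64; 1024]>,
}

impl Bitmap8K {
    pub fn new() -> Self {
        Bitmap8K { len: 0, bits: Box::new([0; 1024]) }
    }

    /// Branchless insert: returns true if `index` was not present before.
    pub fn insert(&mut self, index: u16) -> bool {
        let (key, bit) = (index as usize / 64, index as u64 % 64);
        let old_w = self.bits[key];
        let new_w = old_w | (1u64 << bit);
        // (old_w ^ new_w) >> bit is 1 exactly when the bit was previously clear.
        let inserted = (old_w ^ new_w) >> bit;
        self.len += inserted;
        self.bits[key] = new_w;
        inserted != 0
    }

    /// Branchless remove: returns true if `index` was present before.
    pub fn remove(&mut self, index: u16) -> bool {
        let (key, bit) = (index as usize / 64, index as u64 % 64);
        let old_w = self.bits[key];
        let new_w = old_w & !(1u64 << bit);
        // 1 exactly when the bit was previously set.
        let removed = (old_w ^ new_w) >> bit;
        self.len -= removed;
        self.bits[key] = new_w;
        removed != 0
    }

    pub fn contains(&self, index: u16) -> bool {
        self.bits[index as usize / 64] & (1u64 << (index % 64)) != 0
    }

    pub fn len(&self) -> u64 {
        self.len
    }

    /// In-place union that recomputes the cardinality in the same pass over the words.
    pub fn union_with(&mut self, other: &Bitmap8K) {
        let mut len = 0u64;
        for (a, b) in self.bits.iter_mut().zip(other.bits.iter()) {
            *a |= *b;
            len += u64::from(a.count_ones());
        }
        self.len = len;
    }
}
```

Interleaving the popcount into the union loop means the cardinality is available without a second pass over the 8 KiB of words.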
This PR is huge indeed, but the work is great! It reduces the time most operations take and also reduces the amount of code in the bitmap/store.rs file!
Thank you very much! I don't see any reason not to mark this PR as a candidate for merge!
@Kerollmops I'm checking invariants with debug assertions in 62e2eed. I think that's sufficient.
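The kind of invariant checking mentioned here can be sketched as follows. This is an illustration under assumptions, not the contents of 62e2eed: `SortedU16Vec` is a hypothetical stand-in for the PR's container of the same name, and `debug_check_bitmap_len` is a hypothetical helper for the bitmap side (stored `len` equals the popcount of the words). `debug_assert!` only checks its condition when debug assertions are enabled, so release builds do not pay for the scans.

```rust
/// Hypothetical stand-in for the PR's `SortedU16Vec`: values kept sorted and unique.
pub struct SortedU16Vec {
    vec: Vec<u16>,
}

impl SortedU16Vec {
    pub fn new() -> Self {
        SortedU16Vec { vec: Vec::new() }
    }

    /// Inserts `value` at its sorted position; returns false if it was already present.
    pub fn insert(&mut self, value: u16) -> bool {
        let inserted = match self.vec.binary_search(&value) {
            Ok(_) => false,
            Err(pos) => {
                self.vec.insert(pos, value);
                true
            }
        };
        self.debug_check_invariants();
        inserted
    }

    pub fn as_slice(&self) -> &[u16] {
        &self.vec
    }

    /// Checked only when debug assertions are enabled (e.g. a default `cargo test`);
    /// in release builds the O(n) scan is not evaluated.
    fn debug_check_invariants(&self) {
        debug_assert!(
            self.vec.windows(2).all(|w| w[0] < w[1]),
            "SortedU16Vec must be strictly increasing"
        );
    }
}

/// Hypothetical bitmap-side check: the cached cardinality must equal the popcount of all words.
pub fn debug_check_bitmap_len(bits: &[u64], len: u64) {
    debug_assert_eq!(
        bits.iter().map(|w| u64::from(w.count_ones())).sum::<u64>(),
        len,
        "cached cardinality out of sync with bitmap words"
    );
}
```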
Co-authored-by: Clément Renault <[email protected]>
Re: Naming
We could also name them after what they're for. I don't have strong opinions on this. I'm just trying to think of what would be least surprising to me if I were seeing the codebase for the first time.
Re: Naming
I like the idea, I would just like to keep the …
Re: Unchecked Deserialization
Oh! I understand now, you were trying to use the … Also, the timings seem quite low: 2.5 ms is very low. Which dataset were you using to graph that?
Each point is one of the datasets. The one with the most elements is weather_85, if my memory serves me correctly. In any case, I don't want to hold up this PR with this discussion, as it's out of scope. I will revert deserialize to its prior behavior. We can discuss it elsewhere.
I like what you've done! That said, I would like you to remove the unchecked version of the deserialize method. Maybe you can open an issue to track this?
OK, I'll open a follow-up PR to add data validation to deserialize.
Do you think I can merge this PR while waiting for your follow-up PR on data validation?
Yes. The follow-up PR will depend on the try versions implemented in this one, so this one needs to be merged first.
bors merge
Build succeeded:
127: Add scalar optimizations from CRoaring / arXiv:1709.07821 section 3 r=Kerollmops a=saik0

### Purpose

This PR adds some optimizations from CRoaring as outlined in arXiv:1709.07821 section 3.

### Overview

* All inserts and removes are now branchless (not described in arXiv:1709.07821, but done in CRoaring)
* Section 3.1 was already implemented, except for `BitmapIter`. This is covered in RoaringBitmap#125 (Speed up bitmap iteration)
* Implement Array-Bitset aggregates as outlined in section 3.2
  * Also branchless 😎
* Track bitmap cardinality while performing bitmap-bitmap ops
  * This is a deviation from CRoaring, and will need to be benchmarked further before this Draft PR is ready
  * Curious to hear what you think about this `@lemire`
  * In order to track bitmap cardinality, the len field had to be moved into `Store::Bitmap`
  * This is unfortunately a cross-cutting change
* `Store` was quite large (in LoC) and had many responsibilities. The largest change in this draft is decomposing `Store` so that its variants are two new container types, each responsible for maintaining its invariants and implementing `ops`
  * `Bitmap8K` keeps track of its cardinality
  * `SortedU16Vec` maintains its sorting
  * `Store` now only delegates to these containers
  * My hope is that this will be useful when implementing run containers. 🤞
  * Unfortunately, so much code was moved that this PR is _HUGE_

### Out of scope

* Inline ASM for Array-Bitset aggregates
* Section 4 (explicit SIMD). As noted by the paper authors: The compiler does a decent job of autovectorization, though not as good as hand-tuned

### Notes

* I attempted to emulate the inline ASM Array-Bitset aggregates by using a mix of unsafe pointer arithmetic and x86-64 intrinsics, hoping to compile to the same instructions. I was unable to get it under 13 instructions per iteration (compared to the paper's 5). While it was an improvement, I abandoned the effort in favor of waiting for the `asm!` macro to stabilize: rust-lang/rust#72016 (Tracking Issue for inline assembly (`asm!`))

Co-authored-by: saik0 <[email protected]>
Co-authored-by: Joel Pedraza <[email protected]>
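Section 3.2 of the paper covers array-bitset aggregates. Below is a minimal, hedged sketch of two scalar kernels in that spirit, not the PR's code: an intersection that writes every array value unconditionally and advances the output cursor by the value's membership bit, and a union that sets bits while counting only newly set ones. The function names and the plain `[u64; 1024]` layout are assumptions for illustration.

```rust
/// Branchless array ∩ bitset: every candidate is written, but the output
/// cursor only advances when the candidate's bit is set in `bits`.
pub fn intersect_array_bitset(array: &[u16], bits: &[u64; 1024]) -> Vec<u16> {
    let mut out = vec![0u16; array.len()];
    let mut pos = 0usize;
    for &v in array {
        // pos never exceeds the number of values processed so far, so this index is in bounds.
        out[pos] = v;
        pos += ((bits[v as usize / 64] >> (v % 64)) & 1) as usize;
    }
    out.truncate(pos);
    out
}

/// Branchless array ∪ bitset that returns the updated cardinality.
/// `card` is the bitset's cardinality before the union.
pub fn union_array_into_bitset(array: &[u16], bits: &mut [u64; 1024], mut card: u64) -> u64 {
    for &v in array {
        let (key, bit) = (v as usize / 64, u64::from(v % 64));
        let old_w = bits[key];
        let new_w = old_w | (1u64 << bit);
        // Adds 1 only if the bit was previously clear.
        card += (old_w ^ new_w) >> bit;
        bits[key] = new_w;
    }
    card
}
```

The bounds check on `out[pos]` remains but never fails and is well predicted; what disappears is the data-dependent branch on whether each array value is present in the bitset.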