Faster BoolReader #124
That said, lossless WebP is already plenty fast, specifically due to optimizations to the transforms. We actually beat …
Regarding bit reading: libwebp has a dedicated code path for reading with probability 128 that is distinct from the general-purpose one. Is that something you've explored? If you haven't attempted it, it doesn't have to be part of this PR. I just wanted to know whether it has been attempted. I would expect this not to matter if the hot variant of …
Huh, are you sure? I only mentioned it because … Not denying that it's already plenty fast, just that I'm certain it showed up in my call graphs inside …
Yes, that's the … But indeed, that's something that can be revisited in a separate PR.
It might be worth renaming "bool reader" to "arithmetic decoder" or something to that effect, because it is doing boolean arithmetic coding rather than simply reading bits.
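To make the "arithmetic decoder" point concrete, here is a minimal sketch of a VP8-style boolean decoder following the reference algorithm in RFC 6386. It is not image-webp's `BoolReader`: the names `BoolDecoder`, `read_bool`, `read_flag`, `decode_with_split` and `read_literal` are hypothetical, it reads one bit at a time, and it treats bytes past the end of the input as zeros instead of reporting an error. Each call narrows an interval according to a probability rather than extracting a raw bit from the stream.

```rust
/// Minimal VP8-style boolean arithmetic decoder after RFC 6386.
/// Hypothetical sketch: names are illustrative, not image-webp's API.
struct BoolDecoder<'a> {
    input: &'a [u8],
    pos: usize,
    /// Width of the current interval; in 128..=255 between calls.
    range: u32,
    /// Encoded value, aligned so it can be compared against `split << 8`.
    value: u32,
    /// Bits shifted out of `value` since the last byte was loaded (0..=7).
    bit_count: u32,
}

impl<'a> BoolDecoder<'a> {
    fn new(input: &'a [u8]) -> Self {
        let mut d = BoolDecoder { input, pos: 0, range: 255, value: 0, bit_count: 0 };
        // Preload two bytes, as the RFC's reference decoder does.
        let hi = d.next_byte();
        let lo = d.next_byte();
        d.value = (hi << 8) | lo;
        d
    }

    /// Next input byte; past the end this sketch returns 0 instead of an error.
    fn next_byte(&mut self) -> u32 {
        let b = self.input.get(self.pos).copied().unwrap_or(0);
        self.pos += 1;
        u32::from(b)
    }

    /// Decodes one bool; `prob` is the probability (out of 256) that it is `false`.
    fn read_bool(&mut self, prob: u8) -> bool {
        let split = 1 + (((self.range - 1) * u32::from(prob)) >> 8);
        self.decode_with_split(split)
    }

    /// Same result as `read_bool(128)`: for prob = 128 the split is (range + 1) / 2.
    fn read_flag(&mut self) -> bool {
        self.decode_with_split((self.range + 1) >> 1)
    }

    fn decode_with_split(&mut self, split: u32) -> bool {
        let big_split = split << 8;
        let bit = if self.value >= big_split {
            // Upper subinterval: a `true` was coded.
            self.range -= split;
            self.value -= big_split;
            true
        } else {
            // Lower subinterval: a `false` was coded.
            self.range = split;
            false
        };
        // Renormalize so range is back in 128..=255, loading a new byte every 8 shifts.
        while self.range < 128 {
            self.value <<= 1;
            self.range <<= 1;
            self.bit_count += 1;
            if self.bit_count == 8 {
                self.bit_count = 0;
                self.value |= self.next_byte();
            }
        }
        bit
    }

    /// Reads an `n`-bit unsigned literal, most significant bit first, each bit at prob 128.
    fn read_literal(&mut self, n: u32) -> u32 {
        let mut v = 0;
        for _ in 0..n {
            v = (v << 1) | u32::from(self.read_flag());
        }
        v
    }
}
```

For example, a decoder fed `[0x80, 0x00]` returns `true` for its first `read_bool(128)` and `false` after that, and `read_flag` always agrees with `read_bool(128)` because both compute the same split.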
FWIW, there is no change in end-to-end benchmarks for the large image on my machine from the `FastReader::read_flag` optimization. It's possible that it helps on other machines, just not mine.
I can confirm this didn't break anything 🎉 There are no behavioral changes before and after on my corpus of 7,500 images scraped from the web.
I've made clippy happy. Please rebase.
And I'd like to get this merged before any further merge conflicts arise. I don't think we can ship a 1.80 MSRV just yet. It is very recent, and I see two viable options:
Thoughts?
I vote for using …
Honestly, maybe we should just bump the MSRV to 1.80. We've got other changes that have been waiting on that same version bump for a while, and it has been nearly 6 months since the 1.80 release.
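The PR description notes that the change relies on `as_flattened_mut()`, stabilized in Rust 1.80, which is what ties it to this MSRV bump. Below is a minimal sketch of why that method is convenient, assuming (hypothetically) that the input is kept as `Vec<[u8; 4]>` chunks: `as_flattened_mut()` lets a byte-oriented API such as `Read::read_exact` fill the chunk buffer in place. The second function is one safe pre-1.80 alternative that pays for an intermediate buffer and an extra copy; it is not necessarily what the maintainers had in mind. Both function names are hypothetical.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `len` bytes into 4-byte chunks, zero-padding the last one.
/// `as_flattened_mut()` (stable since Rust 1.80) views `&mut [[u8; 4]]` as `&mut [u8]`,
/// so `read_exact` can fill the chunk buffer directly, without `unsafe` or extra crates.
fn read_chunks<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<[u8; 4]>> {
    let mut chunks = vec![[0u8; 4]; len.div_ceil(4)];
    reader.read_exact(&mut chunks.as_flattened_mut()[..len])?;
    Ok(chunks)
}

/// A safe pre-1.80 alternative: read into a byte buffer first, then copy into chunks.
fn read_chunks_pre_1_80<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<[u8; 4]>> {
    let mut bytes = vec![0u8; len];
    reader.read_exact(&mut bytes)?;
    Ok(bytes
        .chunks(4)
        .map(|c| {
            // The final chunk may be shorter than 4 bytes; pad it with zeros.
            let mut chunk = [0u8; 4];
            chunk[..c.len()].copy_from_slice(c);
            chunk
        })
        .collect())
}
```

Both produce the same chunks; reading 5 bytes `[1, 2, 3, 4, 5]` yields `[[1, 2, 3, 4], [5, 0, 0, 0]]`.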
In that case, this only needs a rebase against the latest main, and it's good to go! @SLiV9 can you handle that? I'll push the merge button immediately after so that it doesn't diverge again.
I just rebased. Let me know if you still want it to use bytemuck and I'll do it later today.
I did take a look at inlining, but …
@SLiV9 thanks again for the PR! These are really impressive performance gains, and I don't think we would've been able to optimize this part ourselves.
Happy to help! Thanks for the challenge and the awesome library.
With this optimization (and other optimizations that went into the 0.2.1 release), https://github.com/Shnatsel/wondermagick backed by image-webp is now faster than ImageMagick at decoding and thumbnailing a WebP image!
- Moved `BoolReader` to its own file.
- Sped up `read_bool` and `read_with_tree` by assuming none of the reads reach the end of the buffer and returning a transparent `BitResult`, then validating at the end.
- Sped up `read_bool` and `read_with_tree` by assuming each bit can be read from the 4-byte chunks (in `FastReader`), and retrying with the slow approach if this fails.

Final performance results are a 1.3x speedup compared to image-rs 0.2.0 (`--use-reference`), although it is still 1.3x slower than libwebp.

(I ran `dwebp` as the first and the last candidate to negate any effects from my poor laptop's CPU overheating.)

This uses `as_flattened_mut()`, which was stabilized in Rust 1.80.0, so merging this probably requires raising the MSRV. I don't know your policy on that, but the alternative was adding `unsafe` or adding another dependency (that itself uses `unsafe`), so I left it as is.
PS:
- `read_literal` has some obvious optimizations, but it doesn't seem to be part of the latency-critical path.
- `read_flag`'s `1 + (((range - 1) * 128) >> 8)` … but it seems hard to measure.
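For reference, the split point in RFC 6386's boolean decoder is `1 + (((range - 1) * prob) >> 8)`. With `prob = 128`, multiplying by 128 and shifting right by 8 is a single right shift by 1, so the split reduces to `(range + 1) >> 1`, the kind of shortcut a dedicated probability-128 path can take. The sketch below (function names hypothetical) checks that the two forms agree for every range the decoder holds between reads (128 to 255). It also shows why the inner parentheses matter: in Rust, `>>` binds more loosely than `+`, so the expression without them shifts the whole sum.

```rust
/// General split point from RFC 6386's boolean decoder: strictly between 0 and `range`.
fn split_general(range: u32, prob: u32) -> u32 {
    1 + (((range - 1) * prob) >> 8)
}

/// Hypothetical probability-128 specialization: (x * 128) >> 8 == x >> 1,
/// and 1 + ((range - 1) >> 1) == (range + 1) >> 1 for every range >= 1.
fn split_flag(range: u32) -> u32 {
    (range + 1) >> 1
}

/// The same expression without the inner parentheses: `>>` binds more loosely
/// than `+`, so this shifts the whole sum and is exactly one smaller than the split.
#[allow(clippy::precedence)]
fn split_unparenthesized(range: u32) -> u32 {
    1 + ((range - 1) * 128) >> 8
}

/// True if the specialized form matches the general one over every valid range.
fn forms_agree() -> bool {
    (128..=255).all(|range| split_general(range, 128) == split_flag(range))
}
```

For example, with `range = 255` both `split_general(255, 128)` and `split_flag(255)` give 128, while the unparenthesized form gives 127.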