Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Avx2 engine #1

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AndersTrier
Copy link

Hi!

First of all, thank you very much for writing this well structured library!

With inspiration from https://github.com/catid/leopard/ I wrote a new engine that leverages AVX2 instructions.
On an AMD Ryzen 5 3600, encoding is speed up by up to 1700%, and decoding up to about 1100%. Ex:

main/ReedSolomonEncoder/1000:1000
                        time:   [462.20 µs 462.81 µs 463.49 µs]
                        thrpt:  [4.1152 GiB/s 4.1212 GiB/s 4.1267 GiB/s]
                 change:
                        time:   [-94.288% -94.173% -94.088%] (p = 0.00 < 0.05)
                        thrpt:  [+1591.5% +1616.2% +1650.6%]
                        Performance has improved.

main/ReedSolomonDecoder/1000:1000 (1%)
                        time:   [1.5372 ms 1.5431 ms 1.5492 ms]
                        thrpt:  [1.2312 GiB/s 1.2361 GiB/s 1.2408 GiB/s]
                 change:
                        time:   [-91.384% -91.308% -91.237%] (p = 0.00 < 0.05)
                        thrpt:  [+1041.1% +1050.4% +1060.6%]
                        Performance has improved.

main/ReedSolomonDecoder/1000:1000 (100%)
                        time:   [1.5545 ms 1.5630 ms 1.5724 ms]
                        thrpt:  [1.2130 GiB/s 1.2203 GiB/s 1.2270 GiB/s]
                 change:
                        time:   [-91.580% -91.525% -91.467%] (p = 0.00 < 0.05)
                        thrpt:  [+1071.9% +1079.9% +1087.6%]
                        Performance has improved.

Full run: https://pastebin.com/m8AcP8T5

@AndersTrier
Copy link
Author

AndersTrier commented Nov 4, 2023

Some relevant documentation: https://doc.rust-lang.org/core/arch/index.html

Right now, this PR completely breaks this library for x86 CPUs without AVX2. Here's my suggestion to how choosing this engine should behave:
By default, at runtime we should detect if the CPU supports AVX2 (this is what is_x86_feature_detected!("avx2") does), and fallback to using the NoSimd engine.

And since the Avx2 engine uses unsafe, I think there should be a way to turn it off.
This is currently implemented as a feature called avx2 which is enabled by default. I.e Cargo.toml contains:

[features]
default = ["avx2"]
avx2 = []

An alternative would be to have no_unsafe as an opt-in feature. Which one do you prefer?

[features]
no_unsafe = []

Assuming we go with no_unsafe, essentially what I would like is something like this in engine.rs:

#[cfg(all(not(feature = "no_unsafe"), any(target_arch = "x86", target_arch = "x86_64")))]
pub type DefaultEngine = if is_x86_feature_detected!("avx2") { Avx2 } else { NoSimd };

#[cfg(not(all(not(feature = "no_unsafe"), any(target_arch = "x86", target_arch = "x86_64"))))]
pub type DefaultEngine = NoSimd;

But that doesn't work for a type alias.

I tried with something like

pub enum DefaultEngine {}

impl DefaultEngine {
    pub fn new() -> impl Engine {
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            if is_x86_feature_detected!("avx2") {
                return Avx2::new();
            }
        }
        NoSimd::new()
    }
}

But that seems to require a modification of HighRate and LowRate and before I attempted that, I figured it would be good with some feedback.

@AndersTrier
Copy link
Author

We could totally avoid unsafe by using https://docs.rs/safe_arch/latest/safe_arch/
but as far as I understand, that would require every user of the reed-solomon-16 library to compile their code with -C target-feature=+avx2.

Note: here's a footgun. By default, rustc only targets SSE but will happily compile the AVX2 intrinsics without you specifying -C target-feature=+avx2. The AVX2 intrinsics will then be "emulated" using SSE (also on a CPU with support for AVX2!). This makes Avx2 much slower than NoSimd.

That is why most of the methods are annotated with #[target_feature(enable = "avx2")] which tells the compiler to assume that AVX2 instructions are available, but that in turn requires the method to be marked as unsafe, even though it doesn't actually need unsafe privileges.

We could drop the privileges again by using another layer of abstraction. I.e:

impl mul() -> mul_unsafe_avx2enabled() -> mul_safe()

It adds quite a few "dummy" functions, but would move a lot of the code out of unsafe. Would you like to see an example of what that would look like?

@malaire
Copy link
Owner

malaire commented Nov 4, 2023

Nice work! Unfortunately I've read that SIMD code of Leopard may have patent restrictions.

For example this comment at FastECC:

Moreover, FastECC is free from patent restrictions that has any fast RS codec employing PSHUFB (i.e. SSSE3) including Leopard.

I havn't been able to verify this (nor do I have enough knowledge to verify patent claims), but I've seen some evidence suggesting that this may be true. I can't take any risks concerning patents and so I don't think I can include any SIMD code for now.

ps. I don't have any experience on Rust SIMD so currently I also can't review code like this. I've been meaning to learn that, but because of the patent problem I havn't even started on this.

@AndersTrier
Copy link
Author

Nice work!

Thanks!

I've read that SIMD code of Leopard may have patent restrictions. I havn't been able to verify this (nor do I have enough knowledge to verify patent claims), but I've seen some evidence suggesting that this may be true. I can't take any risks concerning patents and so I don't think I can include any SIMD code for now.

Think I found the patent you are referring to:
http://www.google.com/patents/US8683296

A break down of the patent can be found here: https://erasurecodepatents.wordpress.com/

It looks like StreamScale is having at least some success enforcing it: https://www.datanami.com/2023/10/16/cloudera-hit-with-240-million-judgement-over-erasure-coding/

But it sounds like Cloudera is going to appeal though:

We disagree with and are extremely disappointed by the decision. We remain steadfast in our belief in our defenses in the case, and we intend to challenge the judgment.

(Btw, the US District Court for the Western District of Texas is know for beeing very patent friendly: https://www.eff.org/deeplinks/2014/07/why-do-patent-trolls-go-texas-its-not-bbq )

Anyways, as that patent is only a risk for companies doing business in the US, how about making this code opt-in, and including a warning in the readme?

There are many other open source projects that include variations of this code:
@klauspost's Go library: https://github.com/klauspost/reedsolomon/
Red Hat Ceph: https://github.com/ceph/ceph/tree/main/src/erasure-code/jerasure
The Linux Kernel: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/raid6/avx2.c
Intel's Intelligent Storage Acceleration Library: https://github.com/intel/isa-l

@malaire
Copy link
Owner

malaire commented Nov 11, 2023

So they have successfully enforced it against a company and got an open-source project (GF-Complete & Jerasure) to quit.

I can't take any risks with patent like that. Also my interest in this algorithm is longterm and I have no problem waiting until 2032 when that patent expires before adding SIMD.

p.s. It's always possible to implement AVX 2 engine in a separate crate, if you or someone else thinks this patent isn't a problem.

@AndersTrier
Copy link
Author

That's totally fair. I'll fork and submit a separate crate

@AndersTrier
Copy link
Author

For anyone interested, I've submitted a fork under the name reed-solomon-simd: https://crates.io/crates/reed-solomon-simd

On AArch64 it uses the Neon SIMD instructions, and on x86(-64) either AVX2 or SSSE3 is used. The best implementation is chosen at runtime with fallback to plain Rust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants