Introduce Avx2 engine #1

AndersTrier · 2023-11-04T14:05:14Z

Hi!

First of all, thank you very much for writing this well structured library!

With inspiration from https://github.com/catid/leopard/ I wrote a new engine that leverages AVX2 instructions.
On an AMD Ryzen 5 3600, encoding is speed up by up to 1700%, and decoding up to about 1100%. Ex:

main/ReedSolomonEncoder/1000:1000
                        time:   [462.20 µs 462.81 µs 463.49 µs]
                        thrpt:  [4.1152 GiB/s 4.1212 GiB/s 4.1267 GiB/s]
                 change:
                        time:   [-94.288% -94.173% -94.088%] (p = 0.00 < 0.05)
                        thrpt:  [+1591.5% +1616.2% +1650.6%]
                        Performance has improved.

main/ReedSolomonDecoder/1000:1000 (1%)
                        time:   [1.5372 ms 1.5431 ms 1.5492 ms]
                        thrpt:  [1.2312 GiB/s 1.2361 GiB/s 1.2408 GiB/s]
                 change:
                        time:   [-91.384% -91.308% -91.237%] (p = 0.00 < 0.05)
                        thrpt:  [+1041.1% +1050.4% +1060.6%]
                        Performance has improved.

main/ReedSolomonDecoder/1000:1000 (100%)
                        time:   [1.5545 ms 1.5630 ms 1.5724 ms]
                        thrpt:  [1.2130 GiB/s 1.2203 GiB/s 1.2270 GiB/s]
                 change:
                        time:   [-91.580% -91.525% -91.467%] (p = 0.00 < 0.05)
                        thrpt:  [+1071.9% +1079.9% +1087.6%]
                        Performance has improved.

Full run: https://pastebin.com/m8AcP8T5

AndersTrier · 2023-11-04T14:51:16Z

Some relevant documentation: https://doc.rust-lang.org/core/arch/index.html

Right now, this PR completely breaks this library for x86 CPUs without AVX2. Here's my suggestion to how choosing this engine should behave:
By default, at runtime we should detect if the CPU supports AVX2 (this is what is_x86_feature_detected!("avx2") does), and fallback to using the NoSimd engine.

And since the Avx2 engine uses unsafe, I think there should be a way to turn it off.
This is currently implemented as a feature called avx2 which is enabled by default. I.e Cargo.toml contains:

[features]
default = ["avx2"]
avx2 = []

An alternative would be to have no_unsafe as an opt-in feature. Which one do you prefer?

[features]
no_unsafe = []

Assuming we go with no_unsafe, essentially what I would like is something like this in engine.rs:

#[cfg(all(not(feature = "no_unsafe"), any(target_arch = "x86", target_arch = "x86_64")))]
pub type DefaultEngine = if is_x86_feature_detected!("avx2") { Avx2 } else { NoSimd };

#[cfg(not(all(not(feature = "no_unsafe"), any(target_arch = "x86", target_arch = "x86_64"))))]
pub type DefaultEngine = NoSimd;

But that doesn't work for a type alias.

I tried with something like

pub enum DefaultEngine {}

impl DefaultEngine {
    pub fn new() -> impl Engine {
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            if is_x86_feature_detected!("avx2") {
                return Avx2::new();
            }
        }
        NoSimd::new()
    }
}

But that seems to require a modification of HighRate and LowRate and before I attempted that, I figured it would be good with some feedback.

AndersTrier · 2023-11-04T15:16:55Z

We could totally avoid unsafe by using https://docs.rs/safe_arch/latest/safe_arch/
but as far as I understand, that would require every user of the reed-solomon-16 library to compile their code with -C target-feature=+avx2.

Note: here's a footgun. By default, rustc only targets SSE but will happily compile the AVX2 intrinsics without you specifying -C target-feature=+avx2. The AVX2 intrinsics will then be "emulated" using SSE (also on a CPU with support for AVX2!). This makes Avx2 much slower than NoSimd.

That is why most of the methods are annotated with #[target_feature(enable = "avx2")] which tells the compiler to assume that AVX2 instructions are available, but that in turn requires the method to be marked as unsafe, even though it doesn't actually need unsafe privileges.

We could drop the privileges again by using another layer of abstraction. I.e:

impl mul() -> mul_unsafe_avx2enabled() -> mul_safe()

It adds quite a few "dummy" functions, but would move a lot of the code out of unsafe. Would you like to see an example of what that would look like?

malaire · 2023-11-04T15:58:02Z

Nice work! Unfortunately I've read that SIMD code of Leopard may have patent restrictions.

For example this comment at FastECC:

Moreover, FastECC is free from patent restrictions that has any fast RS codec employing PSHUFB (i.e. SSSE3) including Leopard.

I havn't been able to verify this (nor do I have enough knowledge to verify patent claims), but I've seen some evidence suggesting that this may be true. I can't take any risks concerning patents and so I don't think I can include any SIMD code for now.

ps. I don't have any experience on Rust SIMD so currently I also can't review code like this. I've been meaning to learn that, but because of the patent problem I havn't even started on this.

AndersTrier · 2023-11-06T09:13:33Z

Nice work!

Thanks!

I've read that SIMD code of Leopard may have patent restrictions. I havn't been able to verify this (nor do I have enough knowledge to verify patent claims), but I've seen some evidence suggesting that this may be true. I can't take any risks concerning patents and so I don't think I can include any SIMD code for now.

Think I found the patent you are referring to:
http://www.google.com/patents/US8683296

A break down of the patent can be found here: https://erasurecodepatents.wordpress.com/

It looks like StreamScale is having at least some success enforcing it: https://www.datanami.com/2023/10/16/cloudera-hit-with-240-million-judgement-over-erasure-coding/

But it sounds like Cloudera is going to appeal though:

We disagree with and are extremely disappointed by the decision. We remain steadfast in our belief in our defenses in the case, and we intend to challenge the judgment.

(Btw, the US District Court for the Western District of Texas is know for beeing very patent friendly: https://www.eff.org/deeplinks/2014/07/why-do-patent-trolls-go-texas-its-not-bbq )

Anyways, as that patent is only a risk for companies doing business in the US, how about making this code opt-in, and including a warning in the readme?

There are many other open source projects that include variations of this code:
@klauspost's Go library: https://github.com/klauspost/reedsolomon/
Red Hat Ceph: https://github.com/ceph/ceph/tree/main/src/erasure-code/jerasure
The Linux Kernel: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/raid6/avx2.c
Intel's Intelligent Storage Acceleration Library: https://github.com/intel/isa-l

malaire · 2023-11-11T08:16:19Z

So they have successfully enforced it against a company and got an open-source project (GF-Complete & Jerasure) to quit.

I can't take any risks with patent like that. Also my interest in this algorithm is longterm and I have no problem waiting until 2032 when that patent expires before adding SIMD.

p.s. It's always possible to implement AVX 2 engine in a separate crate, if you or someone else thinks this patent isn't a problem.

AndersTrier · 2023-11-12T13:52:50Z

That's totally fair. I'll fork and submit a separate crate

AndersTrier · 2023-11-25T17:41:32Z

For anyone interested, I've submitted a fork under the name reed-solomon-simd: https://crates.io/crates/reed-solomon-simd

On AArch64 it uses the Neon SIMD instructions, and on x86(-64) either AVX2 or SSSE3 is used. The best implementation is chosen at runtime with fallback to plain Rust.

Introduce Avx2 engine

720166e

AndersTrier mentioned this pull request Feb 12, 2024

Explain why this "Reed-Solomon' differs so much from storage ones AndersTrier/reed-solomon-simd#17

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce Avx2 engine #1

Introduce Avx2 engine #1

AndersTrier commented Nov 4, 2023

AndersTrier commented Nov 4, 2023 •

edited

Loading

AndersTrier commented Nov 4, 2023

malaire commented Nov 4, 2023

AndersTrier commented Nov 6, 2023

malaire commented Nov 11, 2023

AndersTrier commented Nov 12, 2023

AndersTrier commented Nov 25, 2023

Introduce Avx2 engine #1

Are you sure you want to change the base?

Introduce Avx2 engine #1

Conversation

AndersTrier commented Nov 4, 2023

AndersTrier commented Nov 4, 2023 • edited Loading

AndersTrier commented Nov 4, 2023

malaire commented Nov 4, 2023

AndersTrier commented Nov 6, 2023

malaire commented Nov 11, 2023

AndersTrier commented Nov 12, 2023

AndersTrier commented Nov 25, 2023

AndersTrier commented Nov 4, 2023 •

edited

Loading