sha2: wasm32 simd128 backends #562

Merged: 7 commits merged into RustCrypto:master on Nov 3, 2024

Contributor

@max-te max-te commented Feb 10, 2024

This PR ports the AVX implementation of SHA-512 to wasm32 simd128. It also ports the corresponding AVX implementation of SHA-256 from https://github.com/aws-samples/sha2-with-c-intrinsic/blob/master/src/sha256_compress_x86_64_avx.c to simd128.
It also adds wasm32 testing to CI using wasmtime. Since wasm has no runtime feature detection, this backend is only used if the crate is compiled with the -C target-feature=+simd128 flag.
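
As an illustration of what compile-time-only selection means here, below is a minimal sketch. It is not code from this PR: the helper name `selected_backend` and the exact precedence between the two checks are assumptions. Only the `simd128` target feature and the `sha2_backend="soft"` cfg value appear in this thread.

```rust
/// Hypothetical helper (not from the PR): reports which backend a build
/// would pick, using only compile-time `cfg` checks.
#[allow(unexpected_cfgs)] // `sha2_backend` is a custom cfg passed via `--cfg`
pub fn selected_backend() -> &'static str {
    // True only when built with RUSTFLAGS='--cfg sha2_backend="soft"'.
    let forced_soft = cfg!(sha2_backend = "soft");
    // True only on wasm32 when built with `-C target-feature=+simd128`.
    let has_simd128 = cfg!(all(target_arch = "wasm32", target_feature = "simd128"));

    // Assumed precedence: an explicit "soft" request wins over simd128.
    if has_simd128 && !forced_soft {
        "wasm32-simd128"
    } else {
        "soft"
    }
}
```

Both checks are resolved while compiling, so a binary built without the flag never contains the simd128 code path and cannot switch to it at run time.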

Benchmarks on AMD Ryzen 9 7950X3D, running with wasmtime 26.0.0 (c92317bcc 2024-10-22) on rustc 1.84.0-nightly (b3f75cc87 2024-11-02):

+ RUSTFLAGS='-C target-feature=+simd128'
+ cargo +nightly bench -q --bench mod --target wasm32-wasip1

running 8 tests
test sha256_10    ... bench:          18.71 ns/iter (+/- 1.62) = 555 MB/s
test sha256_100   ... bench:         167.94 ns/iter (+/- 0.62) = 598 MB/s
test sha256_1000  ... bench:       1,656.93 ns/iter (+/- 142.75) = 603 MB/s
test sha256_10000 ... bench:      15,601.30 ns/iter (+/- 1,268.65) = 640 MB/s
test sha512_10    ... bench:          14.35 ns/iter (+/- 0.09) = 714 MB/s
test sha512_100   ... bench:         137.37 ns/iter (+/- 0.87) = 729 MB/s
test sha512_1000  ... bench:       1,261.63 ns/iter (+/- 105.65) = 793 MB/s
test sha512_10000 ... bench:      12,434.24 ns/iter (+/- 24.46) = 804 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 4.40s

+ RUSTFLAGS='-C target-feature=-simd128'
+ cargo +nightly bench -q --bench mod --target wasm32-wasip1

running 8 tests
test sha256_10    ... bench:         155.59 ns/iter (+/- 1.08) = 64 MB/s
test sha256_100   ... bench:       1,539.48 ns/iter (+/- 9.18) = 64 MB/s
test sha256_1000  ... bench:      15,207.34 ns/iter (+/- 81.67) = 65 MB/s
test sha256_10000 ... bench:     151,547.98 ns/iter (+/- 1,170.30) = 65 MB/s
test sha512_10    ... bench:          98.59 ns/iter (+/- 0.45) = 102 MB/s
test sha512_100   ... bench:         980.99 ns/iter (+/- 3.43) = 102 MB/s
test sha512_1000  ... bench:       9,622.94 ns/iter (+/- 29.97) = 103 MB/s
test sha512_10000 ... bench:      95,977.25 ns/iter (+/- 310.30) = 104 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 6.55s

+ RUSTFLAGS='--cfg sha2_backend="soft" -C target-feature=+simd128'
+ cargo +nightly bench -q --bench mod --target wasm32-wasip1

running 8 tests
test sha256_10    ... bench:         142.07 ns/iter (+/- 13.71) = 70 MB/s
test sha256_100   ... bench:       1,404.58 ns/iter (+/- 10.83) = 71 MB/s
test sha256_1000  ... bench:      14,823.81 ns/iter (+/- 1,346.05) = 67 MB/s
test sha256_10000 ... bench:     139,001.94 ns/iter (+/- 978.58) = 71 MB/s
test sha512_10    ... bench:          90.39 ns/iter (+/- 7.82) = 111 MB/s
test sha512_100   ... bench:         893.20 ns/iter (+/- 72.22) = 111 MB/s
test sha512_1000  ... bench:       8,812.46 ns/iter (+/- 878.60) = 113 MB/s
test sha512_10000 ... bench:      87,887.02 ns/iter (+/- 394.70) = 113 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 8.62s
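
To compare the three runs above, the ns/iter figures can be turned into throughput and speedup ratios. The following is a small sketch with hypothetical helper names (`throughput_mb_s`, `speedup`); the constants are copied from the 10000-byte rows of the benchmark output above.

```rust
/// Throughput in MB/s (10^6 bytes per second) for `bytes` hashed in `ns_per_iter`.
pub fn throughput_mb_s(bytes: u64, ns_per_iter: f64) -> f64 {
    // bytes per nanosecond equals GB/s, so scale by 1000 to get MB/s.
    bytes as f64 / ns_per_iter * 1000.0
}

/// How many times faster `candidate_ns` is than `baseline_ns`.
pub fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}

// ns/iter for the 10000-byte benchmarks, taken from the three runs above.
pub const SHA256_SIMD: f64 = 15_601.30; // simd128 backend
pub const SHA256_NO_SIMD: f64 = 151_547.98; // -simd128
pub const SHA256_SOFT_SIMD: f64 = 139_001.94; // soft backend, +simd128
pub const SHA512_SIMD: f64 = 12_434.24;
pub const SHA512_NO_SIMD: f64 = 95_977.25;
pub const SHA512_SOFT_SIMD: f64 = 87_887.02;

/// Speedups of the simd128 backend over the other two configurations.
pub fn summary() -> [(&'static str, f64); 4] {
    [
        ("sha256 vs -simd128", speedup(SHA256_NO_SIMD, SHA256_SIMD)),
        ("sha256 vs soft +simd128", speedup(SHA256_SOFT_SIMD, SHA256_SIMD)),
        ("sha512 vs -simd128", speedup(SHA512_NO_SIMD, SHA512_SIMD)),
        ("sha512 vs soft +simd128", speedup(SHA512_SOFT_SIMD, SHA512_SIMD)),
    ]
}
```

On these numbers the simd128 backend comes out roughly 9.7x (SHA-256) and 7.7x (SHA-512) faster than the build without simd128, and roughly 8.9x and 7.1x faster than the soft backend compiled with simd128 enabled.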

@max-te
Contributor Author

max-te commented Feb 11, 2024

I also ended up porting the SHA-256 algorithm from https://github.com/aws-samples/sha2-with-c-intrinsic/blob/master/src/sha256_compress_x86_64_avx.c and updated this PR. Here are updated benchmarks with simd:

test sha256_10    ... bench:          22 ns/iter (+/- 0) = 454 MB/s
test sha256_100   ... bench:         215 ns/iter (+/- 2) = 465 MB/s
test sha256_1000  ... bench:       1,959 ns/iter (+/- 8) = 510 MB/s
test sha256_10000 ... bench:      19,401 ns/iter (+/- 22) = 515 MB/s
test sha512_10    ... bench:          17 ns/iter (+/- 0) = 588 MB/s
test sha512_100   ... bench:         164 ns/iter (+/- 0) = 609 MB/s
test sha512_1000  ... bench:       1,476 ns/iter (+/- 2) = 677 MB/s
test sha512_10000 ... bench:      14,513 ns/iter (+/- 18) = 689 MB/s
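
For readers unfamiliar with simd128, here is a rough sketch of the kind of operation such a port vectorizes: the SHA-256 message-schedule function σ0 applied to four words at once. This is an illustration only, not the code from this PR or from the AWS sample. The function names are hypothetical, and the wasm32 path assumes the crate is built with `-C target-feature=+simd128`.

```rust
/// Scalar SHA-256 small sigma0: rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3).
pub fn small_sigma0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}

/// Four-lane version using wasm32 simd128 intrinsics.
/// Only compiled when targeting wasm32 with the simd128 feature enabled.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn small_sigma0_x4(w: [u32; 4]) -> [u32; 4] {
    use core::arch::wasm32::*;

    // simd128 has no rotate instruction for 32-bit lanes, so a rotate is
    // built from two shifts and an OR.
    let rotr = |x: v128, n: u32| v128_or(u32x4_shr(x, n), u32x4_shl(x, 32 - n));

    let v = u32x4(w[0], w[1], w[2], w[3]);
    let r = v128_xor(v128_xor(rotr(v, 7), rotr(v, 18)), u32x4_shr(v, 3));
    [
        u32x4_extract_lane::<0>(r),
        u32x4_extract_lane::<1>(r),
        u32x4_extract_lane::<2>(r),
        u32x4_extract_lane::<3>(r),
    ]
}

/// Portable fallback with the same signature for every other build.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn small_sigma0_x4(w: [u32; 4]) -> [u32; 4] {
    [
        small_sigma0(w[0]),
        small_sigma0(w[1]),
        small_sigma0(w[2]),
        small_sigma0(w[3]),
    ]
}
```

Keeping a scalar function next to the vector one makes it easy to check that both paths agree on the same inputs.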

@max-te max-te changed the title from "sha2: wasm32 simd128 backend for SHA-512" to "sha2: wasm32 simd128 backends" on Feb 11, 2024
@CryZe

CryZe commented Oct 27, 2024

What's the status of this?

@max-te max-te force-pushed the feat/wasm32-simd-sha512 branch 2 times, most recently from 3ee2583 to 6caccad on October 31, 2024 10:04
@max-te
Contributor Author

max-te commented Oct 31, 2024

This is awaiting review.

@newpavlov Do you mind taking a look at this?

Member

@newpavlov newpavlov left a comment

Sorry for the late review! The changes look mostly good.

What are the advantages of the explicit SIMD backend in the SHA-256 case? It has the same performance as the soft backend. Maybe it's smaller in size when compiled?

Review comment on sha2/src/sha512/wasm32.rs (outdated, resolved)
Review comment on .github/workflows/sha2.yml (outdated, resolved)
@CryZe

CryZe commented Nov 1, 2024

What are the advantages of the explicit SIMD backend in the SHA-256 case? It has the same performance as the soft backend.

Did you look at the second comment in this PR? It seems the initial version of this PR had no SIMD algorithm for SHA-256, but that changed the next day, as the second comment indicates.

Unless, of course, you did some more benchmarking and it's indeed not faster anymore.

@newpavlov
Member

newpavlov commented Nov 1, 2024

Ah, I indeed missed the second comment. I think it's worth updating the OP, since its text will be included in the merge commit message.

@newpavlov
Member

@max-te
I think we are good to merge. But could you measure the performance of the software backend with the simd128 target feature enabled, on the same hardware? You can do it with this command:

RUSTFLAGS='--cfg sha2_backend="soft" -C target-feature=+simd128' cargo +nightly bench --target wasm32-wasi

I would like to add these results to the merge commit message.

@max-te
Contributor Author

max-te commented Nov 3, 2024

Sure, I added that benchmark to the PR description and updated the other ones.

@newpavlov
Member

Thank you!

@newpavlov newpavlov merged commit a68c77e into RustCrypto:master Nov 3, 2024
28 checks passed