salsa20: performance optimizations (e.g. SIMD) #50

tarcieri · 2019-08-19T20:03:05Z

There are two big optimizations we could do on both the chacha20 and salsa20 crates.

Avoid recomputing initial state

EDIT: both crates now have a new method to compute the initial state, and separate `apply_keystream` / `generate` methods to compute a block.

chacha20 crate
salsa20 crate

RFC 8439 Section 3 describes caching the initial block state once computed as a performance optimization:

   Each block of ChaCha20 involves 16 move operations and one increment
   operation for loading the state, 80 each of XOR, addition and roll
   operations for the rounds, 16 more add operations and 16 XOR
   operations for protecting the plaintext.  Section 2.3 describes the
   ChaCha block function as "adding the original input words".  This
   implies that before starting the rounds on the ChaCha state, we copy
   it aside, only to add it in later.  This is correct, but we can save
   a few operations if we instead copy the state and do the work on the
   copy.  This way, for the next block you don't need to recreate the
   state, but only to increment the block counter.  This saves
   approximately 5.5% of the cycles.
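
A minimal sketch of the optimization the RFC describes, for illustration only. The type `ChaChaCore` and its `new` / `generate` methods are hypothetical names chosen to echo the EDIT note above; this is not the code from either crate. The initial state is built once, each block runs the rounds on a copy, and only the block counter (word 12) changes between blocks.

```rust
/// "expand 32-byte k" as four little-endian words.
const CONSTANTS: [u32; 4] = [0x6170_7865, 0x3320_646e, 0x7962_2d32, 0x6b20_6574];

/// Hypothetical cipher core that keeps the initial state between blocks.
pub struct ChaChaCore {
    state: [u32; 16],
}

impl ChaChaCore {
    /// Build the initial state once: constants, key, counter, nonce.
    pub fn new(key: &[u8; 32], nonce: &[u8; 12], counter: u32) -> Self {
        let mut state = [0u32; 16];
        state[..4].copy_from_slice(&CONSTANTS);
        for (w, c) in state[4..12].iter_mut().zip(key.chunks_exact(4)) {
            *w = u32::from_le_bytes([c[0], c[1], c[2], c[3]]);
        }
        state[12] = counter;
        for (w, c) in state[13..16].iter_mut().zip(nonce.chunks_exact(4)) {
            *w = u32::from_le_bytes([c[0], c[1], c[2], c[3]]);
        }
        Self { state }
    }

    /// Produce one 64-byte keystream block and advance the counter.
    pub fn generate(&mut self) -> [u8; 64] {
        // Work on a copy so the cached initial state stays intact.
        let mut x = self.state;
        for _ in 0..10 {
            // Column round.
            quarter_round(&mut x, 0, 4, 8, 12);
            quarter_round(&mut x, 1, 5, 9, 13);
            quarter_round(&mut x, 2, 6, 10, 14);
            quarter_round(&mut x, 3, 7, 11, 15);
            // Diagonal round.
            quarter_round(&mut x, 0, 5, 10, 15);
            quarter_round(&mut x, 1, 6, 11, 12);
            quarter_round(&mut x, 2, 7, 8, 13);
            quarter_round(&mut x, 3, 4, 9, 14);
        }
        // Add the original input words and serialize little-endian.
        let mut out = [0u8; 64];
        for i in 0..16 {
            let word = x[i].wrapping_add(self.state[i]);
            out[4 * i..4 * i + 4].copy_from_slice(&word.to_le_bytes());
        }
        // The next block only needs a counter increment, not a rebuilt state.
        self.state[12] = self.state[12].wrapping_add(1);
        out
    }
}

fn quarter_round(x: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    x[a] = x[a].wrapping_add(x[b]);
    x[d] = (x[d] ^ x[a]).rotate_left(16);
    x[c] = x[c].wrapping_add(x[d]);
    x[b] = (x[b] ^ x[c]).rotate_left(12);
    x[a] = x[a].wrapping_add(x[b]);
    x[d] = (x[d] ^ x[a]).rotate_left(8);
    x[c] = x[c].wrapping_add(x[d]);
    x[b] = (x[b] ^ x[c]).rotate_left(7);
}
```

Calling `generate` repeatedly on one `ChaChaCore` yields consecutive keystream blocks; a caller would XOR each block into the message. Counter overflow handling is left out of this sketch.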

SIMD support

Both ChaCha20 and Salsa20 are amenable to SIMD optimizations. We should add SIMD optimizations on x86/x86_64 at the very least.

`x86`/`x86_64`

chacha20
- SSE2 (chacha20: add SSE2 accelerated variant #61)
- AVX2 (chacha20: AVX2 backend #83, chacha20: Parallelize AVX2 backend #87)
salsa20
- SSE2
- AVX2
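
To illustrate why the ChaCha round maps well onto 128-bit SIMD, here is a portable sketch of the row-wise layout commonly used by SSE2-style implementations. It assumes nothing about the actual backends in #61 or #83: plain `[u32; 4]` arrays stand in for vector registers, and all function names are hypothetical. Each of the four state rows sits in one "register", so a single vector add, XOR, or rotate performs four quarter rounds at once; the diagonal round is reached by rotating the lanes of rows 1 to 3. Salsa20 needs a different word arrangement and is not covered here.

```rust
/// Stand-in for a 128-bit vector register holding four 32-bit lanes.
type Row = [u32; 4];

/// Lane-wise wrapping addition (what a packed 32-bit add would do).
fn add(a: Row, b: Row) -> Row {
    [
        a[0].wrapping_add(b[0]),
        a[1].wrapping_add(b[1]),
        a[2].wrapping_add(b[2]),
        a[3].wrapping_add(b[3]),
    ]
}

/// Lane-wise XOR followed by a left rotation of each lane by `n` bits.
fn xor_rotl(a: Row, b: Row, n: u32) -> Row {
    [
        (a[0] ^ b[0]).rotate_left(n),
        (a[1] ^ b[1]).rotate_left(n),
        (a[2] ^ b[2]).rotate_left(n),
        (a[3] ^ b[3]).rotate_left(n),
    ]
}

/// Rotate whole lanes left by `n` positions (a shuffle on real hardware).
fn rotate_lanes(r: Row, n: usize) -> Row {
    [r[n % 4], r[(n + 1) % 4], r[(n + 2) % 4], r[(n + 3) % 4]]
}

/// Four quarter rounds in parallel, one per lane.
fn quarter_rounds(rows: &mut [Row; 4]) {
    let [mut a, mut b, mut c, mut d] = *rows;
    a = add(a, b);
    d = xor_rotl(d, a, 16);
    c = add(c, d);
    b = xor_rotl(b, c, 12);
    a = add(a, b);
    d = xor_rotl(d, a, 8);
    c = add(c, d);
    b = xor_rotl(b, c, 7);
    *rows = [a, b, c, d];
}

/// One ChaCha double round (column round + diagonal round) on rows.
pub fn double_round_rows(state: &mut [u32; 16]) {
    let mut rows = [[0u32; 4]; 4];
    for i in 0..4 {
        rows[i].copy_from_slice(&state[4 * i..4 * i + 4]);
    }
    quarter_rounds(&mut rows); // columns
    // Diagonalize: line the diagonals up as columns.
    rows[1] = rotate_lanes(rows[1], 1);
    rows[2] = rotate_lanes(rows[2], 2);
    rows[3] = rotate_lanes(rows[3], 3);
    quarter_rounds(&mut rows); // diagonals
    // Undo the lane rotation.
    rows[1] = rotate_lanes(rows[1], 3);
    rows[2] = rotate_lanes(rows[2], 2);
    rows[3] = rotate_lanes(rows[3], 1);
    for i in 0..4 {
        state[4 * i..4 * i + 4].copy_from_slice(&rows[i]);
    }
}

/// Scalar reference quarter round, used only to cross-check the row version.
fn quarter_round_scalar(x: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    x[a] = x[a].wrapping_add(x[b]);
    x[d] = (x[d] ^ x[a]).rotate_left(16);
    x[c] = x[c].wrapping_add(x[d]);
    x[b] = (x[b] ^ x[c]).rotate_left(12);
    x[a] = x[a].wrapping_add(x[b]);
    x[d] = (x[d] ^ x[a]).rotate_left(8);
    x[c] = x[c].wrapping_add(x[d]);
    x[b] = (x[b] ^ x[c]).rotate_left(7);
}

/// Scalar reference double round.
pub fn double_round_scalar(x: &mut [u32; 16]) {
    for i in 0..4 {
        quarter_round_scalar(x, i, 4 + i, 8 + i, 12 + i);
    }
    for i in 0..4 {
        quarter_round_scalar(x, i, 4 + (i + 1) % 4, 8 + (i + 2) % 4, 12 + (i + 3) % 4);
    }
}
```

A real backend would replace `Row` and the three helper functions with the corresponding vector intrinsics; the control flow of `double_round_rows` would stay essentially the same.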

Other CPU architectures

ARM?


tarcieri · 2020-01-17T17:52:11Z

Changed topic to salsa20 as chacha20 is now optimized on x86.

chacha20 could still use e.g. NEON acceleration.

tarcieri mentioned this issue Aug 21, 2019

ChaCha20Poly1305 AEAD RustCrypto/AEADs#3

Merged

tarcieri added enhancement help wanted labels Aug 21, 2019

tarcieri changed the title ~~chacha20/salsa20: performance optimizations (e.g. SIMD)~~ salsa20: performance optimizations (e.g. SIMD) Jan 17, 2020

This was referenced Jun 6, 2020

scrypt: use salsa20 crate? RustCrypto/password-hashes#29

Closed

Accelerate scrypt using SIMD RustCrypto/password-hashes#16

Closed

oxarbitrage mentioned this issue Sep 3, 2023

Salsa20 SSE2 version #328

Merged
