Help out for batched random key generation? #705

stevefan1999-personal · 2024-09-23T14:34:27Z

I'm trying to generate vanity wallet addresses, which are originally in ed25519, so I start by generating random bytes from a PRNG enough to cast into a secret key and then generate key pairs, but I ran into performance issue so I want to debug it:

for _ in 0..cli.threads {
    let thr = {
        let counter = counter.clone();
        move || {
            let mut csprng = fastrand::Rng::new();
            let mut pair = SecretKey::default();

            loop {
                let key: SigningKey = SigningKey::from_bytes(&pair);
                counter.fetch_add(1, Ordering::Relaxed);
            }
        }
    };

    thread::spawn(thr);
}

On my 3700X PC, and running in full 16 threads, it only runs 300K keypairs/second with a precise monotonic clock. However, the C vanity address generator on my same PC, with same parameters can generate 30M keypairs/second with random data, so I think there could be something wrong here

So I started digging, which means we start from here:

curve25519-dalek/ed25519-dalek/src/signing.rs

Lines 102 to 108 in d5ef57a

    
           pub fn from_bytes(secret_key: &SecretKey) -> Self { 
        
               let verifying_key = VerifyingKey::from(&ExpandedSecretKey::from(secret_key)); 
        
               Self { 
        
                   secret_key: *secret_key, 
        
                   verifying_key, 
        
               } 
        
           }

And then here:

curve25519-dalek/ed25519-dalek/src/signing.rs

Lines 788 to 794 in d5ef57a

    
           impl From<&SecretKey> for ExpandedSecretKey { 
        
               #[allow(clippy::unwrap_used)] 
        
               fn from(secret_key: &SecretKey) -> ExpandedSecretKey { 
        
                   let hash = Sha512::default().chain_update(secret_key).finalize(); 
        
                   ExpandedSecretKey::from_bytes(hash.as_ref()) 
        
               } 
        
           }

And then here:

curve25519-dalek/ed25519-dalek/src/hazmat.rs

Lines 57 to 76 in d5ef57a

    
           impl ExpandedSecretKey { 
        
               /// Construct an `ExpandedSecretKey` from an array of 64 bytes. In the spec, the bytes are the 
        
               /// output of a SHA-512 hash. This clamps the first 32 bytes and uses it as a scalar, and uses 
        
               /// the second 32 bytes as a domain separator for hashing. 
        
               pub fn from_bytes(bytes: &[u8; 64]) -> Self { 
        
                   // TODO: Use bytes.split_array_ref once it’s in MSRV. 
        
                   let mut scalar_bytes: [u8; 32] = [0u8; 32]; 
        
                   let mut hash_prefix: [u8; 32] = [0u8; 32]; 
        
                   scalar_bytes.copy_from_slice(&bytes[00..32]); 
        
                   hash_prefix.copy_from_slice(&bytes[32..64]); 
        
                   // For signing, we'll need the integer, clamped, and converted to a Scalar. See 
        
                   // PureEdDSA.keygen in RFC 8032 Appendix A. 
        
                   let scalar = Scalar::from_bytes_mod_order(clamp_integer(scalar_bytes)); 
        
                   ExpandedSecretKey { 
        
                       scalar, 
        
                       hash_prefix, 
        
                   } 
        
               }

And then here:

curve25519-dalek/curve25519-dalek/src/scalar.rs

Lines 237 to 246 in d5ef57a

    
           pub fn from_bytes_mod_order(bytes: [u8; 32]) -> Scalar { 
        
               // Temporarily allow s_unreduced.bytes > 2^255 ... 
        
               let s_unreduced = Scalar { bytes }; 
        
               // Then reduce mod the group order and return the reduced representative. 
        
               let s = s_unreduced.reduce(); 
        
               debug_assert_eq!(0u8, s[31] >> 7); 
        
               s 
        
           }

And eventually here:

curve25519-dalek/curve25519-dalek/src/scalar.rs

Lines 1127 to 1128 in d5ef57a

    
           let xR = UnpackedScalar::mul_internal(&x, &constants::R); 
        
           let x_mod_l = UnpackedScalar::montgomery_reduce(&xR);

I noticed that despite I'm using AVX512, the flame graph shows that those UnpackedScalar::mul_internal and UnpackedScalar::montgomery_reduce are still using generic serial implementation and took a large group of execution time, I'm not sure if this can be sped-up and improved? Or maybe I should look for something else for address generation?

I noticed that the wallet vanity address generator in C uses SUPERCOP's version, and their ed25519 key generation is much faster as well.

The text was updated successfully, but these errors were encountered:

stevefan1999-personal · 2024-09-23T15:40:28Z

I made a minimally viable benchmark program here in https://github.com/stevefan1999-personal/ed25519-gen-speed-test. I used the fastest random number generator possible to get rid of its effect.

On my 3700X PC, this generates:

7 keys per second on average
268020 keys per second on average
284770 keys per second on average
264184 keys per second on average
252750 keys per second on average
245968 keys per second on average
241423 keys per second on average
237701 keys per second on average
234541 keys per second on average
232072 keys per second on average
229985 keys per second on average
228280 keys per second on average

stevefan1999-personal · 2024-09-23T16:00:45Z

I reran the key generation benchmark with "fast" featured enabled back on for ed25519-dalek, and performance improved 3 times, and this is the hot path tree diagram that I got

tarcieri · 2024-09-23T16:27:59Z

Take a look at the MultiscalarMul trait: https://docs.rs/curve25519-dalek/4.1.3/curve25519_dalek/edwards/struct.EdwardsPoint.html#impl-MultiscalarMul-for-EdwardsPoint

stevefan1999-personal · 2024-09-23T16:31:07Z

Take a look at the MultiscalarMul trait: https://docs.rs/curve25519-dalek/4.1.3/curve25519_dalek/edwards/struct.EdwardsPoint.html#impl-MultiscalarMul-for-EdwardsPoint

May I get some papers or lecture notes on how to use that?

tarcieri · 2024-09-23T16:32:35Z

It takes an iterator over Scalars and an iterator over EdwardsPoints and batch multiplies the scalars by the points, so you can generate a batch of candidate scalars and multiply several of them by the base point at once

stevefan1999-personal · 2024-09-23T17:03:35Z

It takes an iterator over Scalars and an iterator over EdwardsPoints and batch multiplies the scalars by the points, so you can generate a batch of candidate scalars and multiply several of them by the base point at once

If I read that correctly, I can use multiscalar_mul to do something that means scalar * constants::ED25519_BASEPOINT_TABLE but similar to a dot product operation instead, right?

stevefan1999-personal · 2024-09-23T17:27:48Z

Okay, so I came up with this:

    let mut rng = Rng::new();

    let scalars: Vec<ExpandedSecretKey> = (0..16)
        .map(|_| {
            let mut secret_key: SecretKey = SecretKey::default();
            rng.fill(&mut secret_key);
            ExpandedSecretKey::from(&secret_key)
        })
        .collect();

    EdwardsPoint::multiscalar_mul(
        scalars.iter().map(|x| x.scalar),
        scalars.iter().map(|_| constants::ED25519_BASEPOINT_POINT),
    );

However, it returns an Edwards point, so how do I separate the multiplied (scalar * ed25519 base point) to their own respective points?

tarcieri · 2024-09-23T17:35:02Z

Aah sorry, that's the wrong API shape for your problem

stevefan1999-personal · 2024-09-23T17:47:57Z

Aah sorry, that's the wrong API shape for your problem

Ahh I think it is close. I just need to generate N scalars from entropy, and get the [for some scalar in scalars (scalar * ed25519 base point)] without having it dot product summed, but do so in an optimized, data-parallel way. This way I don't have to blow up the CPU cache while doing SIMD in lockstep.

Also, I tried to simplify the public key generation, is this correct?

                    let mut secret_key = SecretKey::default();
                    csprng.fill_bytes(&mut secret_key);
                    let pub_key =
                        EdwardsPoint::mul_base(&Scalar::from_bytes_mod_order(clamp_integer(
                            Sha512::default().chain_update(secret_key).finalize()[..32]
                                .try_into()
                                .unwrap(),
                        )))
                        .compress()
                        .to_bytes();

stevefan1999-personal mentioned this issue Sep 24, 2024

Document Cofactor multiplied base points? #706

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Help out for batched random key generation? #705

Help out for batched random key generation? #705

stevefan1999-personal commented Sep 23, 2024 •

edited

Loading

stevefan1999-personal commented Sep 23, 2024

stevefan1999-personal commented Sep 23, 2024 •

edited

Loading

tarcieri commented Sep 23, 2024

stevefan1999-personal commented Sep 23, 2024

tarcieri commented Sep 23, 2024

stevefan1999-personal commented Sep 23, 2024

stevefan1999-personal commented Sep 23, 2024

tarcieri commented Sep 23, 2024

stevefan1999-personal commented Sep 23, 2024 •

edited

Loading

Help out for batched random key generation? #705

Help out for batched random key generation? #705

Comments

stevefan1999-personal commented Sep 23, 2024 • edited Loading

stevefan1999-personal commented Sep 23, 2024

stevefan1999-personal commented Sep 23, 2024 • edited Loading

tarcieri commented Sep 23, 2024

stevefan1999-personal commented Sep 23, 2024

tarcieri commented Sep 23, 2024

stevefan1999-personal commented Sep 23, 2024

stevefan1999-personal commented Sep 23, 2024

tarcieri commented Sep 23, 2024

stevefan1999-personal commented Sep 23, 2024 • edited Loading

stevefan1999-personal commented Sep 23, 2024 •

edited

Loading

stevefan1999-personal commented Sep 23, 2024 •

edited

Loading

stevefan1999-personal commented Sep 23, 2024 •

edited

Loading