Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Consider something like pow2k #23

Open
Yawning opened this issue Aug 8, 2021 · 3 comments
Open

perf: Consider something like pow2k #23

Yawning opened this issue Aug 8, 2021 · 3 comments

Comments

@Yawning
Copy link

Yawning commented Aug 8, 2021

Both Invert and Pow22523 repeatedly square in a loop. The overhead of repeatedly calling Square (and having to shuffling data in/out of registers) adds up to a decent chunk of execution time.

Doing something like func (v *Element) pow2k(x *Element, k uint) with the precondition that k >= 1, dramatically improves performance of the two operations like thus:

Invert-4    11.3µs ± 0%   7.1µs ± 0%  -37.33%
Pow22523-4  11.1µs ± 0%   7.0µs ± 0%  -37.32%

Numbers taken with purego, but the amd64 assembly implementation will also benefit (and can be written without having to spill to the stack at all).

@FiloSottile
Copy link
Owner

I tried out implementing it as a simple loop of the Square body, manually inlining carryPropagate, and I don't see any improvement on high-level group functions. Are we missing some useful benchmark, or is this for an application that relies specifically on Invert and Pow22523?

@Yawning
Copy link
Author

Yawning commented Aug 9, 2021

I tried out implementing it as a simple loop of the Square body, manually inlining carryPropagate, and I don't see any improvement on high-level group functions. Are we missing some useful benchmark, or is this for an application that relies specifically on Invert and Pow22523?

Point compression/decompression involve one Invert, and 1 SqrtRatio respectively. While it's true that those things aren't repeatedly called, a lot of the actual uses do involve doing both (eg: ref10-style Ed25519 signature verification).

@Yawning
Copy link
Author

Yawning commented Aug 19, 2021

As another concrete example of where this would be nice would be ECVRF_prove from the VRF draft.

Benchmarking my (unpublished) edwards25519 based version, Invert + Pow22523 account for ~23.5% of the function's runtime (I can reduce this by shaving off 2 decompressions by checking point canonicity on the encoded buffer, but it still will be a decent fraction).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants