-
-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Consider something like pow2k
#23
Comments
I tried out implementing it as a simple loop of the Square body, manually inlining |
Point compression/decompression involve one Invert, and 1 SqrtRatio respectively. While it's true that those things aren't repeatedly called, a lot of the actual uses do involve doing both (eg: ref10-style Ed25519 signature verification). |
As another concrete example of where this would be nice would be Benchmarking my (unpublished) edwards25519 based version, Invert + Pow22523 account for ~23.5% of the function's runtime (I can reduce this by shaving off 2 decompressions by checking point canonicity on the encoded buffer, but it still will be a decent fraction). |
Both
Invert
andPow22523
repeatedly square in a loop. The overhead of repeatedly calling Square (and having to shuffling data in/out of registers) adds up to a decent chunk of execution time.Doing something like
func (v *Element) pow2k(x *Element, k uint)
with the precondition thatk >= 1
, dramatically improves performance of the two operations like thus:Numbers taken with
purego
, but the amd64 assembly implementation will also benefit (and can be written without having to spill to the stack at all).The text was updated successfully, but these errors were encountered: