Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dilithium/ML-DSA Stack Optimizations #340

Merged
merged 32 commits into from
Apr 16, 2024

Conversation

dop-amin
Copy link
Contributor

@dop-amin dop-amin commented Apr 8, 2024

This PR adds stack optimizations for ML-DSA based on the ideas from the paper "Dilithium for Memory Constrained Devices" and code already written by @mkannwischer in a separate branch. It also superseeds the changes in #222.

I attach numbers on the stack utilization below.

Some remaining questions:


Dilithium2
keypair stack usage:
4408
sign stack usage:
5072
verify stack usage:
2704

Dilithium3
keypair stack usage:
4408
sign stack usage:
6608
verify stack usage:
2704

Dilithium5
keypair stack usage:
4408
sign stack usage:
8136
verify stack usage:
2712

@JunhaoHuang
Copy link
Contributor

JunhaoHuang commented Apr 9, 2024

Hi @dop-amin, since you have already included our NTT code in this PR, I think we could just close PR #338 and just keep this PR. That would make it easier to merge your work.

Best,
Junhao

@dop-amin
Copy link
Contributor Author

dop-amin commented Apr 9, 2024

Hi @dop-amin, since you have already included our NTT code in this PR, I think we could just close PR #338 and just keep this PR. That would make it easier to merge your work.

Best, Junhao

Hi Junhao, thanks for reaching out.
For the stack optimized version (m4fstack), I removed the asymmetric multiplication, so the code will be different from your PR which concerns the speed-variant.

I will try to make the files including your work into the stack-version be similar-looking to the ones from your PR, so it will not be confusing but we still can have separate versions. Then both PRs should be merge-able. What do you think?

@JunhaoHuang
Copy link
Contributor

Hi @dop-amin, since you have already included our NTT code in this PR, I think we could just close PR #338 and just keep this PR. That would make it easier to merge your work.
Best, Junhao

Hi Junhao, thanks for reaching out. For the stack optimized version (m4fstack), I removed the asymmetric multiplication, so the code will be different from your PR which concerns the speed-variant.

I will try to make the files including your work into the stack-version be similar-looking to the ones from your PR, so it will not be confusing but we still can have separate versions. Then both PRs should be merge-able. What do you think?

Hi @dop-amin , thnaks for your work! Yes, I think then these two PRs could be merged into m4f and m4fstack, respectively.

@mkannwischer
Copy link
Contributor

Thank you @dop-amin. That all looks very good to me.
I've just completed the benchmarks. Can you please have a look and let me know if everything looks the way you expect it?

@mkannwischer mkannwischer merged commit 149bfc7 into mupq:master Apr 16, 2024
6 checks passed
@dop-amin
Copy link
Contributor Author

Thank you @dop-amin. That all looks very good to me. I've just completed the benchmarks. Can you please have a look and let me know if everything looks the way you expect it?

The stack usage is identical to my measurements and the cycle counts also match my expectation.

@mkannwischer
Copy link
Contributor

Thanks!

@marco-palumbi
Copy link
Contributor

I see that all the files under dilithium3/m4fstack are actual files and not symlinks.
may be some of them should be symlink to files in dilithium2/m4f

@dop-amin
Copy link
Contributor Author

I see that all the files under dilithium3/m4fstack are actual files and not symlinks. may be some of them should be symlink to files in dilithium2/m4f

Hi, indeed this slipped through. The issue is addressed in #342.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants