One-sided communications in MPICH are considerably slower than those in Aurora MPICH #7263
For reference, the relevant code in Fortran:
I ran a performance test of one-sided communication between two GPU pointers on a single node (12 MPI ranks per node) using a single-file reproducer. With the default Aurora MPICH in the lustre_scaling queue, the test completes in 2.5 seconds. With mpich 4.3.0rc2 in the alcf_kmd_val queue, it completes in 2.9 seconds, a 16% slowdown. The data file (data.txt) used by the reproducer is attached to this message.
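For clarity, the 16% figure is the relative increase in wall time over the Aurora MPICH baseline. A minimal sketch of that arithmetic, using the two timings reported above (the function name `relative_slowdown` is only illustrative and is not part of the reproducer):

```python
def relative_slowdown(baseline_s: float, candidate_s: float) -> float:
    """Return the fractional increase of candidate_s over baseline_s."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return (candidate_s - baseline_s) / baseline_s


# Timings from the single-node GPU-to-GPU test described above.
aurora_mpich_s = 2.5   # default Aurora MPICH, lustre_scaling queue
mpich_430rc2_s = 2.9   # mpich 4.3.0rc2, alcf_kmd_val queue

slowdown = relative_slowdown(aurora_mpich_s, mpich_430rc2_s)
print(f"slowdown: {slowdown * 100:.0f}%")
```

Note that the two timings come from different queues, so the comparison assumes the nodes in both queues are otherwise equivalent.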
A performance test of host-to-host one-sided communication on a single node shows a 3x slowdown. The attached reproducer can run on either the host or the device.
In 144-node runs in the alcf_kmd_val queue, one-sided communication in mpich 4.3.0rc2 is 18% slower than in the default Aurora MPICH. A single-file reproducer, /tmp/reproducer7-alcf_kmd_val.tgz, is available for download from aurora-uan-0010.