GROMACS Crambin test fails at larger task counts #67

Open
casparvl opened this issue Jun 30, 2023 · 1 comment

@casparvl
Collaborator

Command line:
  gmx_mpi mdrun -nb cpu -s benchmark.tpr -dlb yes -npme -1 -ntomp 1

Reading file benchmark.tpr, VERSION 5.1.4 (single precision)
Note: file tpx version 103, software tpx version 119
Changing nstlist from 10 to 80, rlist from 1.2 to 1.321


-------------------------------------------------------
Program:     gmx mdrun, version 2020.1-EasyBuild-4.5.0
Source file: src/gromacs/domdec/domdec.cpp (line 2277)
MPI rank:    0 (out of 2048)

Fatal error:
There is no domain decomposition for 1536 ranks that is compatible with the
given box and a minimum cell size of 0.6725 nm
Change the number of ranks or mdrun option -rdd or -dds
Look in the log file for details on the domain decomposition

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
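To illustrate why a small system can run out of valid decompositions, here is a minimal sketch of the constraint the error message describes. It is a deliberately simplified model and not GROMACS's actual algorithm: it only looks for a grid `nx * ny * nz` equal to the rank count in which no cell is narrower than the minimum cell size, assuming a rectangular box. The function name and the box dimensions in the usage note are hypothetical; the real Crambin box size is not given in this issue.

```python
def find_decomposition(n_ranks, box_nm, min_cell_nm):
    """Return a grid (nx, ny, nz) with nx * ny * nz == n_ranks whose cells
    are all at least min_cell_nm wide, or None if no such grid exists.

    Simplified model: rectangular box, uniform cells, no PME ranks.
    """
    # Largest number of cells that fits along each box dimension.
    max_cells = [max(1, int(length // min_cell_nm)) for length in box_nm]

    for nx in range(1, max_cells[0] + 1):
        if n_ranks % nx:
            continue
        rest = n_ranks // nx
        for ny in range(1, max_cells[1] + 1):
            if rest % ny:
                continue
            nz = rest // ny
            if nz <= max_cells[2]:
                return (nx, ny, nz)
    return None
```

With a hypothetical 5 nm cubic box and the 0.6725 nm minimum cell size from the log, `find_decomposition(64, (5.0, 5.0, 5.0), 0.6725)` finds a grid, while `find_decomposition(1536, (5.0, 5.0, 5.0), 0.6725)` returns `None` because at most 7 cells fit along each dimension.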

This Crambin test is the one tagged with the CI ReFrame tag.

One solution would be to make the tagging slightly more elaborate and use the Crambin test only for the single-node case (and possibly two nodes). A more intricate solution would be to also implement a maximum task count for this test: let it auto-select a task count, then check whether that exceeds 1536 and, if so, cap it. That does mean testing on more nodes is then pointless, so if those tests are still generated they would waste resources.

For now, I'd prefer just tagging a large test case for larger node counts.
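For reference, the capping option described above could be sketched roughly as follows. This is plain Python rather than ReFrame code; the names `MAX_TASKS`, `capped_task_count` and `is_redundant` are hypothetical, and the cap value is only a placeholder taken from the number mentioned in this issue (a usable limit for Crambin would have to be determined separately).

```python
# Hypothetical cap; placeholder value taken from the discussion above.
MAX_TASKS = 1536


def capped_task_count(auto_tasks, max_tasks=MAX_TASKS):
    """Clamp an auto-selected task count to the cap."""
    if auto_tasks < 1:
        raise ValueError("task count must be positive")
    return min(auto_tasks, max_tasks)


def is_redundant(nodes, tasks_per_node, max_tasks=MAX_TASKS):
    """Return True if one node fewer already reaches the cap,
    so this node count would only repeat the same capped run."""
    return (nodes - 1) * tasks_per_node >= max_tasks
```

With 128 tasks per node, a 16-node variant would be capped from 2048 to 1536 tasks, and `is_redundant(16, 128)` flags it as a repeat of the 12-node run. A check like this could be used to skip generating such variants and avoid the wasted resources mentioned above.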

@ocaisa
Member

ocaisa commented Jun 30, 2023

I have noticed that some tests in the EESSI demo also fail when task counts get high (even at as few as 32 cores). It would be good to have appropriate test cases for different sizes.
