# Specifying mpi executable/binary doesn't carry over into a batch script? #287
## Comments
Note that somehow, when @JRChreim issues the run command, he gets the following in his generated script (truncated as pasted):

```bash
for binpath in '/ocean/projects/phy230019p/jrchreim/MFCMerge-DeleteAfterMerging/MFC-GPU/build/instal$
    echo -e ":) Running $binpath:"
    mpirun \
        -np 16 \
        "$binpath"
done
code=$?
```

which is notably different from my case.
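To compare what different users' generated scripts will actually invoke, a quick check (the file name `shb-test.sh` comes from the job name used in the report below; substitute your own):

```bash
# List every line of the generated submission script that calls an MPI
# launcher, with line numbers, to see which one will actually run.
grep -nwE 'srun|mpirun|mpiexec|jsrun' shb-test.sh
```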
When I use batch mode on Phoenix, which also requires `mpirun`, I always have to modify the batch script. There's an if statement in the default batch script, but it doesn't seem to work on Phoenix. Also relevant is #240, because I have to manually change the `#SBATCH` directives at the top for it to work on Phoenix.
Good to know. The relevant function: MFC/toolchain/mfc/run/engines.py, line 304 at commit 1899a6c.
@sbryngelson I too had to manually change the slurm submission file / change it inside the MFC code. It was failing before, and `-b mpirun` seems to work only for interactive sessions.
@sbryngelson This is related to #240. The current expectation is that one modifies the template file so that it works on their system. It is somewhat unreasonable to assume this should just work everywhere given how eccentric many systems are. I agree that it isn't obvious that:

```console
henryleberre:~/dev/MFC $ ./mfc.sh run -h | grep binary
  -b {jsrun,srun,mpirun,mpiexec,N/A}, --binary {jsrun,srun,mpirun,mpiexec,N/A}
                        (Interactive) Override MPI execution binary (default:
```

The reason for the ...
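For illustration, one way the `-b` choice could be made to carry over into batch mode is to have the generator write the requested launcher into the script and only auto-detect one when none was given. A minimal sketch under that assumption; `MFC_MPI_BIN`, `ntasks`, and `binpath` are hypothetical names, and this is not the actual MFC template:

```bash
#!/usr/bin/env bash
# Hypothetical fragment of a generated batch script (not the actual MFC
# template): honor an explicitly requested launcher first, and only
# auto-detect one when nothing was requested.
binpath="./build/install/bin/simulation"   # illustrative placeholder
ntasks="${SLURM_NTASKS:-16}"
launcher="${MFC_MPI_BIN:-}"                # e.g. "mpirun", written in by `-b mpirun`

if [ -z "$launcher" ]; then
    # Fallback: roughly the availability-based detection behind the reported behavior.
    if command -v srun > /dev/null 2>&1; then launcher=srun; else launcher=mpirun; fi
fi

case "$launcher" in
    srun)            srun -n "$ntasks" "$binpath" ;;
    mpirun|mpiexec)  "$launcher" -np "$ntasks" "$binpath" ;;
    N/A)             "$binpath" ;;                 # no launcher: run directly
    *)               "$launcher" "$binpath" ;;     # e.g. jsrun (flags omitted)
esac
```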
Yeah, I've come to understand the current state better now. I think #240 is a good idea, especially because it makes it obvious to the user that a template exists at all. Right now, no one seems to really know this, and it would obviously become annoying to re-create the same one every time you clone or fork MFC. We presumably could just have a small set of such scripts sitting in a directory that people can add to (just like ...)
I agree this is probably the best path forward. I have had to make quite a few changes for benchmarking, so once that is done it should be rather straightforward to get this working. I imagine something like:

```console
$ ./mfc.sh run case.py -e batch -c bridges2
  .. template is toolchain/templates/bridges2.sh
$ ./mfc.sh run case.py -e batch -c ~/my_system.sh
  .. template is ~/my_system.sh
```
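A rough sketch of the resolution rule the example above implies, written here as a shell helper for illustration (the real logic would presumably live in the Python toolchain; `resolve_template` is a made-up name):

```bash
# Hypothetical helper: map a -c argument to a template path.
# Accepts either a path to a user-provided script or the name of a
# bundled system template under toolchain/templates/.
resolve_template() {
    local arg="$1"
    if [ -f "$arg" ]; then
        printf '%s\n' "$arg"                          # user-supplied file
    elif [ -f "toolchain/templates/$arg.sh" ]; then
        printf '%s\n' "toolchain/templates/$arg.sh"   # bundled, named system template
    else
        echo "error: no template named '$arg' and no such file" >&2
        return 1
    fi
}

# Example: resolve_template bridges2       -> toolchain/templates/bridges2.sh
#          resolve_template ~/my_system.sh -> ~/my_system.sh
```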
Sounds good to me. I will leave this open as a tracking item for this improvement.
## Original issue

Running the following on Bridges2:

```console
[I]br012: bryngel-startup-proj/MFC $ ./mfc.sh run examples/1D_sodshocktube/case.py -t pre_process simulation -e batch -N 1 -n 1 -w "00:10:00" -p RM-small -# "shb-test" -b mpirun -a phy210041p
```

gives in stdout:
This seems "fine." But when one checks the generated slurm submission file, we find that it ends up calling `srun` (since it is available), even though we specified `mpirun` via `-b mpirun`. This job fails. If I delete the `srun` lines from the generated slurm submission file `shb-test.sh` and keep the `mpirun` lines, it completes fine.

I do notice that the `--help` entry for `-b` (quoted in a comment above) reads "(Interactive) Override MPI execution binary", which seems to suggest that `-b` only does something in interactive mode (maybe?). It certainly seems true.

If the above is true, this seems like something we need to change (not saying who, just noting it here so we are aware at least).
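The behavior described here is what one would expect if the generated script selects a launcher by availability, trying `srun` first. A guess at the shape of that logic (illustrative only; this is not copied from the actual MFC template, and `binpath` is a placeholder):

```bash
# Guessed shape of the generated launcher selection (illustrative only):
# srun is tried first simply because it exists on Bridges2, so the
# mpirun branch, which is what `-b mpirun` asked for, is never reached.
binpath="./build/install/bin/simulation"   # placeholder path

if command -v srun > /dev/null 2>&1; then
    srun -n 16 "$binpath"
elif command -v mpirun > /dev/null 2>&1; then
    mpirun -np 16 "$binpath"
fi
```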
@henryleberre can you confirm that the above "makes sense"?
Tagging the relevant folks using Bridges2: @anshgupta1234 @wilfonba @JRChreim