[RFC] Gather and maintain a list of "batch file" templates for various computers #240
Comments
I agree that something useful can and should be done here. Options 2 and 3 do sound hard to maintain and more complicated than necessary; option 1 seems best. With option 1, it would be useful to keep templates there, but also build in a way to automatically grab those templates via
The documentation for Bridges2 with > 8 GPUs per job contradicts what works on their system. The relevant distinction is the need to use So, I'll post a working script here for a future template (> 1 GPU node, in this case, 2 nodes of 8 GPUs):
#!/usr/bin/env bash
#SBATCH --job-name="shb-test"
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --gpu-bind=verbose,closest
#SBATCH --gres=gpu:v100-16:8
#SBATCH --time=00:10:00
#SBATCH --partition="GPU"
#SBATCH --output="shb-test.out"
#SBATCH --account="phy210041p"
#SBATCH --error="shb-test.err"
#SBATCH --export=ALL
#SBATCH --mail-type="BEGIN, END, FAIL"
TABLE_FORMAT_LINE="| - %-14s %-35s - %-14s %-35s |\n"
TABLE_HEADER="+-----------------------------------------------------------------------------------------------------------+ \n"
TABLE_FOOTER="+-----------------------------------------------------------------------------------------------------------+ \n"
TABLE_TITLE_FORMAT="| %8s %-96s |\n"
TABLE_CONTENT=$(cat <<-END
$(printf "$TABLE_FORMAT_LINE" "Start-time:" "$(date +%T)" "Start-date:" "$(date +%F)")
$(printf "$TABLE_FORMAT_LINE" "Partition:" "GPU" "Walltime:" "00:10:00")
$(printf "$TABLE_FORMAT_LINE" "Account:" "phy210041p" "Nodes:" "2")
$(printf "$TABLE_FORMAT_LINE" "Job Name:" "shb-test" "Engine:" "batch")
$(printf "$TABLE_FORMAT_LINE" "Queue System:" "SLURM" "Email:" "")
END
)
printf "$TABLE_HEADER"
printf "$TABLE_TITLE_FORMAT" "Starting" "shb-test from /ocean/projects/phy210041p/bryngel/MFC/examples/1D_sodshocktube/case.py:"
printf "$TABLE_CONTENT\n"
printf "$TABLE_FOOTER\n"
printf ":) Loading modules...\n"
module purge
module load openmpi/4.0.5-nvhpc22.9 nvhpc/22.9 cuda/11.7.1 python/3.8.6
cd "/ocean/projects/phy210041p/bryngel/MFC/examples/1D_sodshocktube"
t_start=$(date +%s)
for binpath in '/ocean/projects/phy210041p/bryngel/MFC/build/install/no-debug_gpu_mpi/bin/syscheck' '/ocean/projects/phy210041p/bryngel/MFC/build/install/no-debug_gpu_mpi/bin/pre_process' '/ocean/projects/phy210041p/bryngel/MFC/build/install/no-debug_gpu_mpi/bin/simulation'; do
echo -e ":) Running $binpath:"
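mpirun -np 16 "$binpath"
code=$?
# Stop at the first failing binary so its exit code is the one reported.
if [ "$code" -ne 0 ]; then break; fi
done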
t_stop="$(date +%s)"
printf "\n$TABLE_HEADER"
printf "$TABLE_TITLE_FORMAT" "Finished" "shb-test:"
printf "$TABLE_FORMAT_LINE" "Total-time:" "$(expr $t_stop - $t_start)s" "Exit Code:" "$code"
printf "$TABLE_FORMAT_LINE" "End-time:" "$(date +%T)" "End-date:" "$(date +%F)"
printf "$TABLE_FOOTER"
exit $code
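For a future template, the hard-coded rank count (16 = 2 nodes x 8 tasks per node) could be derived from the job's own settings instead. The following is only a sketch: `total_ranks` is a hypothetical helper, not something in MFC, and it assumes Slurm exports `SLURM_JOB_NUM_NODES` and `SLURM_NTASKS_PER_NODE` (the latter is only set when `--ntasks-per-node` is given). The fallback values are the ones from the script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: total MPI ranks = nodes * tasks per node.
total_ranks() {
    local nodes="$1" tasks_per_node="$2"
    echo $(( nodes * tasks_per_node ))
}

# Inside a Slurm job these variables come from the scheduler; outside a job
# we fall back to the values used in the script above (2 nodes, 8 tasks each).
nodes="${SLURM_JOB_NUM_NODES:-2}"
tasks_per_node="${SLURM_NTASKS_PER_NODE:-8}"
np="$(total_ranks "$nodes" "$tasks_per_node")"

# Print the launch command instead of running it, so the sketch is safe to try.
echo "mpirun -np $np <binary>"
```

With this, changing `--nodes` or `--ntasks-per-node` in the header would not require editing the `mpirun` line as well.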
@sbryngelson I implemented the logic for this in a branch I am about to make a PR for. I just need everyone's batch files for the systems we wish to have first-party support for.
You can ask on the Slack.
Currently, we use "template" batch submission scripts to generate and submit job scripts when running ./mfc.sh run <...> -e batch. This adds a lot of convenience and modularity, but some tuning is still required to get some jobs (especially GPU ones) running on different systems. Since many of us use the same systems, it would be nice if these modifications were tracked in source control, both for added visibility and to make running batch jobs easier for new users. There are two options I can think of:
misc/ folder, the templates we use for different systems.
What are your thoughts?
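To make the idea of per-system templates concrete, here is a minimal sketch of how a tracked template could be filled in before submission. Everything in it is an assumption for illustration: the `{{name}}` placeholder syntax and the `render_template` function are hypothetical and are not how `mfc.sh` actually processes its templates.

```shell
#!/usr/bin/env bash

# Hypothetical: read a template on stdin and replace each {{key}} placeholder
# with the value given as a key=value argument.
render_template() {
    local text pair key value placeholder
    text="$(cat)"
    for pair in "$@"; do
        key="${pair%%=*}"      # text before the first '='
        value="${pair#*=}"     # text after the first '='
        placeholder="{{${key}}}"
        # Quote both sides so they are treated as literal strings.
        text=${text//"$placeholder"/"$value"}
    done
    printf '%s\n' "$text"
}

# Example: a tiny stand-in for a per-system template file.
render_template nodes=2 gpus=8 partition=GPU <<'EOF'
#SBATCH --nodes={{nodes}}
#SBATCH --ntasks-per-node={{gpus}}
#SBATCH --partition="{{partition}}"
EOF
```

A real implementation would read the template from a file chosen by system name rather than from an inline heredoc; the substitution step would stay the same.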