[RFC] Gather and maintain a list of "batch file" templates for various computers #240

Closed
henryleberre opened this issue Nov 26, 2023 · 4 comments · Fixed by #307
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed question Further information is requested

Comments

@henryleberre
Member

Currently, we use "template" batch submission scripts to generate and submit job scripts when running ./mfc.sh run <...> -e batch. This adds a lot of convenience and modularity, but some tuning is still required to get some jobs (especially GPU ones) running on different systems. Since many of us use the same systems, it would be nice if these modifications were tracked in source control, both for visibility and to make running batch jobs easier for new users. There are three options I can think of:

  1. We maintain, perhaps in the misc/ folder, the templates we use for different systems.
  2. We add everything to the current template files in the form of comments one can just remove and edit.
  3. We build toolchain support for various systems and run configurations. However, this sounds neither easy, nor friendly, nor maintainable.

What are your thoughts?

@henryleberre henryleberre added enhancement New feature or request help wanted Extra attention is needed question Further information is requested labels Nov 26, 2023
@sbryngelson
Member

I agree that something useful can and should be done here. Options 2 and 3 do sound hard to maintain and more complicated than necessary. Option 1 seems best.

With option 1, it would be useful to keep the templates there, but also to build in a way to grab them automatically via mfc.sh calls. Most new folks don't even know that the templates exist, how they work, or where they end up. So, something like mfc.sh run <...> -e batch -c bridges2 (c for computer, I suppose) that automatically grabs the appropriate bridges2 template would be quite useful (and maintainable).
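
A minimal sketch of what such a lookup could look like, assuming templates are stored one per computer under a hypothetical misc/templates/ directory. The directory layout, the "<name>.sh" naming scheme, and the resolve_template helper are all made up for illustration and are not part of MFC; the -c flag itself is only the proposal above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a computer name (e.g. "bridges2") to a template file.
# TEMPLATE_DIR and the "<name>.sh" naming scheme are assumptions, not MFC's layout.
TEMPLATE_DIR="${TEMPLATE_DIR:-misc/templates}"

resolve_template() {
    local computer="$1"
    local path="$TEMPLATE_DIR/$computer.sh"

    # Reject an empty name before touching the filesystem.
    if [ -z "$computer" ]; then
        echo "error: no computer name given" >&2
        return 2
    fi

    # Report the known templates when the requested one is missing.
    if [ ! -f "$path" ]; then
        echo "error: no template for '$computer' in $TEMPLATE_DIR" >&2
        echo "available: $(ls "$TEMPLATE_DIR" 2>/dev/null | sed 's/\.sh$//' | tr '\n' ' ')" >&2
        return 1
    fi

    # Print the resolved path so the caller can copy or fill in the template.
    echo "$path"
}
```

Calling resolve_template bridges2 would print the path of the bridges2 template if it exists, and otherwise list the names that are available, which also helps new users discover that templates exist at all.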

@henryleberre henryleberre added the good first issue Good for newcomers label Dec 25, 2023
@sbryngelson sbryngelson linked a pull request Jan 3, 2024 that will close this issue
henryleberre added 11 commits to henryleberre/ChemMFC that referenced this issue Jan 6 to Jan 7, 2024
@sbryngelson
Member

sbryngelson commented Jan 8, 2024

The Bridges2 documentation for jobs with more than 8 GPUs contradicts what actually works on their system. The relevant distinction is that one needs to use --gres with a per-node count of :8, even though the job wants, in this case, 16 GPUs in total across 2 nodes.

So, I'll post a working script here for a future template (more than one GPU node; in this case, 2 nodes of 8 GPUs each):

#!/usr/bin/env bash
#SBATCH --job-name="shb-test"
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --gpu-bind=verbose,closest
#SBATCH --gres=gpu:v100-16:8
#SBATCH --time=00:10:00
#SBATCH --partition="GPU"
#SBATCH --output="shb-test.out"
#SBATCH --account="phy210041p"
#SBATCH --error="shb-test.err"
#SBATCH --export=ALL
#SBATCH --mail-type="BEGIN, END, FAIL"

TABLE_FORMAT_LINE="| - %-14s %-35s - %-14s %-35s |\n"
TABLE_HEADER="+-----------------------------------------------------------------------------------------------------------+ \n"
TABLE_FOOTER="+-----------------------------------------------------------------------------------------------------------+ \n"
TABLE_TITLE_FORMAT="| %8s %-96s |\n"
TABLE_CONTENT=$(cat <<-END
$(printf "$TABLE_FORMAT_LINE" "Start-time:"    "$(date +%T)"                       "Start-date:"    "$(date +%F)")
$(printf "$TABLE_FORMAT_LINE" "Partition:"     "GPU"      "Walltime:"      "00:10:00")
$(printf "$TABLE_FORMAT_LINE" "Account:"       "phy210041p"        "Nodes:"         "2")
$(printf "$TABLE_FORMAT_LINE" "Job Name:"      "shb-test"           "Engine:"        "batch")
$(printf "$TABLE_FORMAT_LINE" "Queue System:"  "SLURM"                     "Email:"         "")
END
)

printf "$TABLE_HEADER"
printf "$TABLE_TITLE_FORMAT" "Starting" "shb-test from /ocean/projects/phy210041p/bryngel/MFC/examples/1D_sodshocktube/case.py:"
printf "$TABLE_CONTENT\n"
printf "$TABLE_FOOTER\n"

printf ":) Loading modules...\n"

module purge
module load openmpi/4.0.5-nvhpc22.9 nvhpc/22.9 cuda/11.7.1 python/3.8.6

cd "/ocean/projects/phy210041p/bryngel/MFC/examples/1D_sodshocktube"

t_start=$(date +%s)

for binpath in '/ocean/projects/phy210041p/bryngel/MFC/build/install/no-debug_gpu_mpi/bin/syscheck' '/ocean/projects/phy210041p/bryngel/MFC/build/install/no-debug_gpu_mpi/bin/pre_process' '/ocean/projects/phy210041p/bryngel/MFC/build/install/no-debug_gpu_mpi/bin/simulation'; do

echo -e ":) Running $binpath:"

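mpirun -np 16 "$binpath"
code=$?

# Stop at the first failure so the reported exit code is the failing binary's.
if [ "$code" -ne 0 ]; then break; fi

done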

t_stop="$(date +%s)"

printf "\n$TABLE_HEADER"
printf "$TABLE_TITLE_FORMAT" "Finished" "shb-test:"
printf "$TABLE_FORMAT_LINE" "Total-time:"  "$((t_stop - t_start))s"       "Exit Code:" "$code"
printf "$TABLE_FORMAT_LINE" "End-time:"    "$(date +%T)"                  "End-date:"  "$(date +%F)"
printf "$TABLE_FOOTER"

exit $code
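
The relationship between the directives in the script above can be sketched as follows. This is only an illustration, assuming one MPI rank per GPU as in that script; the variable names and the total_ranks helper are invented for this sketch and are not part of MFC or SLURM.

```shell
#!/usr/bin/env bash
# Values mirroring the working script: 2 nodes, 8 GPUs requested per node.
nodes=2
gpus_per_node=8      # the ":8" in --gres=gpu:v100-16:8 is a per-node count
gpu_type="v100-16"

# Hypothetical helper: total MPI ranks when running one rank per GPU.
total_ranks() {
    echo $(( $1 * $2 ))
}

np="$(total_ranks "$nodes" "$gpus_per_node")"

# Print the directives and launch line that would go into the batch script.
printf '#SBATCH --nodes=%d\n'            "$nodes"
printf '#SBATCH --ntasks-per-node=%d\n'  "$gpus_per_node"
printf '#SBATCH --gres=gpu:%s:%d\n'      "$gpu_type" "$gpus_per_node"
printf 'mpirun -np %d "$binpath"\n'      "$np"
```

The point is that the --gres count stays at 8 regardless of the number of nodes, while the rank count passed to mpirun is the product of the two.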

@henryleberre
Member Author

@sbryngelson I implemented the logic for this in a branch I am about to make a PR for. I just need everyone’s batch files for the systems we wish to have first-party support for.

@sbryngelson
Member

> @sbryngelson I implemented the logic for this in a branch I am about to make a PR for. I just need everyone’s batch files for the systems we wish to have first-party support for.

You can ask on the Slack.

henryleberre added 19 commits to henryleberre/ChemMFC that referenced this issue Jan 9 to Jan 14, 2024
@henryleberre henryleberre linked a pull request Jan 14, 2024 that will close this issue
henryleberre added a commit to henryleberre/ChemMFC that referenced this issue Jan 14, 2024
2 participants