Add additional SLURM options
* Add the ability to pass environment variables to jobs
* Let user request multiple nodes, which will execute job in srun
PerilousApricot authored and BenGalewsky committed Jun 6, 2024
1 parent 7a12df1 commit 16a0749
Showing 2 changed files with 14 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -25,10 +25,12 @@ properties in this file are:
|-------------------|----------------------------------------------------------------------------------------------------------------|
|partition | Which Slurm partition should the job run in? |
|account | What account name to run under |
| environment | List of additional environment variables to add to the job |
| gpus_per_node | On GPU partitions how many GPUs to allocate per node |
| gres | SLURM Generic RESources requests |
| mem | Amount of memory to allocate to CPU jobs |
| modules | List of modules to load before starting job |
| nodes | Number of nodes to request from SLURM |
| time | Max CPU time job may run |
| sbatch-script-file | Name of batch file to be produced. Leave blank to have service generate a script file name based on the run ID |
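
For illustration, here is a minimal sketch of a backend config that uses the two new properties. It assumes the config is stored as a JSON file, which this diff does not show; the partition, account, module, and variable values are hypothetical and depend on the cluster. Each `environment` entry is written as `NAME=value` because the template passes the entry straight to `export`.

```python
import json

# Hypothetical values: partition, account, and module names depend on the cluster.
slurm_config = {
    "partition": "gpu",
    "account": "my_account",
    # Each entry is emitted by the template as `export <entry>`.
    "environment": ["OMP_NUM_THREADS=4", "DATA_DIR=/tmp/data"],
    "modules": ["python"],
    # Setting `nodes` adds `#SBATCH --nodes=2` and runs the command through srun.
    "nodes": 2,
    "time": "01:00:00",
}

# Serialize the config; the text could then be written to the backend config file.
config_text = json.dumps(slurm_config, indent=2)
print(config_text)
```

Leaving `nodes` out keeps the previous behavior, where the command runs directly in the batch script.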

12 changes: 12 additions & 0 deletions mlflow_slurm/templates/sbatch_template.sh
@@ -19,13 +19,25 @@
{% if config.time %}
#SBATCH --time={{ config.time }}
{% endif %}
{% if config.nodes %}
#SBATCH --nodes={{ config.nodes }}
{% endif %}
module reset # drop modules and explicitly load the ones needed
# (good job metadata and reproducibility)
# $WORK and $SCRATCH are now set
{% for module in config.modules %}
module load {{ module }}
{% endfor %}
module list # job documentation and metadata

{% for env in config.environment %}
export {{ env }}
{% endfor %}

export MLFLOW_RUN_ID={{ run_id }}
echo "job is starting on `hostname`"
{% if config.nodes %}
srun --export=ALL /bin/bash -c '{{ command }}'
{% else %}
{{ command }}
{% endif %}
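
To make the effect of the added template lines concrete, the sketch below approximates in plain Python what the tail of the template emits. It is a hand-written stand-in, not the project's Jinja2 rendering: `render_job_tail` is a hypothetical helper, the config is assumed to be a dict, and blank lines from the template are omitted.

```python
def render_job_tail(config, run_id, command):
    """Approximate the lines the template emits after `module list`."""
    lines = []
    # Mirrors: {% for env in config.environment %} export {{ env }} {% endfor %}
    for env in config.get("environment", []):
        lines.append(f"export {env}")
    lines.append(f"export MLFLOW_RUN_ID={run_id}")
    lines.append('echo "job is starting on `hostname`"')
    # Mirrors: {% if config.nodes %} ... {% else %} ... {% endif %}
    if config.get("nodes"):
        lines.append(f"srun --export=ALL /bin/bash -c '{command}'")
    else:
        lines.append(command)
    return "\n".join(lines)


# Hypothetical run ID and command, for illustration only.
multi_node = render_job_tail(
    {"environment": ["FOO=bar"], "nodes": 2}, "abc123", "python train.py"
)
single_node = render_job_tail({}, "abc123", "python train.py")
print(multi_node)
```

Because the `srun` branch wraps the command in single quotes, a command that itself contains a single quote would end the quoting early, so such commands may need escaping before they reach the template.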
