fix: Multi-node GPU submissions for greatlakes and picotte. #702
Conversation
Fixes logic where `--ntasks-per-node` would not normalize based on the number of nodes for GPU submissions, where the number of tasks is often the number of GPUs.
Codecov Report
@@ Coverage Diff @@
## master #702 +/- ##
=======================================
Coverage 68.57% 68.57%
=======================================
Files 41 41
Lines 4162 4162
Branches 1025 1025
=======================================
Hits 2854 2854
Misses 1097 1097
Partials 211 211
Looks good, just one quick question.
```diff
@@ -12,7 +12,7 @@
 {% set nn = nn|default((nn_cpu, nn_gpu)|max, true) %}
 {% if partition == 'gpu' %}
 #SBATCH --nodes={{ nn|default(1, true) }}
-#SBATCH --ntasks-per-node={{ (gpu_tasks, cpu_tasks)|max }}
+#SBATCH --ntasks-per-node={{ ((gpu_tasks, cpu_tasks)|max / nn)|int }}
```
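For reference, a minimal sketch of the arithmetic the changed line performs, using the nranks=4, ngpu=4 case discussed later in the thread; the two-GPUs-per-node value is an assumption for illustration only:

```python
# Illustrative values only: 4 ranks, 4 GPUs, assumed 2 GPUs per node -> nn = 2.
gpu_tasks, cpu_tasks, nn = 4, 4, 2

old_per_node = max(gpu_tasks, cpu_tasks)            # 4 per node -> 8 ranks total on 2 nodes
new_per_node = int(max(gpu_tasks, cpu_tasks) / nn)  # 2 per node -> 4 ranks total, as requested

print(old_per_node, new_per_node)  # 4 2
```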
This will only be correct if the number of tasks is evenly divisible by the number of nodes, right? Would this be an issue? If yes, should we protect against that?
If the tasks cannot be evenly distributed across an integer number of nodes (e.g. `nranks=13`), then there is no way to request this with `--ntasks-per-node`; if you rounded up, you would be providing the user with more ranks than they requested. Use `--ntasks={nranks}` instead in these cases.
Preserve the `--nodes=`, `--ntasks-per-node=` request when tasks can be evenly distributed across nodes. It will provide a more efficient communication pattern.
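A minimal Python sketch of the selection rule described above; the helper name is hypothetical and not part of signac-flow:

```python
def slurm_task_directives(nranks, nodes):
    """Pick SLURM directives for nranks tasks on a given node count (sketch only)."""
    if nranks % nodes == 0:
        # Evenly divisible: keep the more efficient explicit layout.
        return [f"--nodes={nodes}", f"--ntasks-per-node={nranks // nodes}"]
    # Not evenly divisible: let SLURM place the tasks.
    return [f"--ntasks={nranks}"]


print(slurm_task_directives(12, 4))  # ['--nodes=4', '--ntasks-per-node=3']
print(slurm_task_directives(13, 4))  # ['--ntasks=13']
```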
Yes, so I think we should raise an error in case the requested configuration cannot be provisioned.
We currently round in other templates for GPU partitions, e.g. `expanse`. We could change that. Generally that won't matter, since we take the ceiling and charges for GPU nodes are usually just for the GPUs, if I understand correctly.
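For concreteness, a small worked example of how taking the ceiling can over-provision ranks; the 4-GPUs-per-node figure mirrors the Delta partition mentioned below, and the rank count is illustrative:

```python
import math

nranks = 5         # ranks requested by the user (illustrative)
gpus_per_node = 4  # e.g. a 4-GPU-per-node partition

nodes = math.ceil(nranks / gpus_per_node)  # 2 nodes
per_node = math.ceil(nranks / nodes)       # ceil(5 / 2) = 3
print(nodes * per_node)                    # 6 ranks provisioned, one more than requested
```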
Yes, they would likely be charged correctly. However, if the user launches their app with `srun` or `mpiexec` without arguments, then the number of tasks is autodetected from the job configuration. If this is rounded up from what the user requested, then the user's script may fail (e.g. when it is coded to work with a specific number of ranks).
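As an illustration of that failure mode, a hypothetical user script (not part of signac-flow) that assumes the exact rank count it asked for:

```python
# user_app.py -- hypothetical script launched with `srun python user_app.py`.
from mpi4py import MPI

EXPECTED_RANKS = 4  # the number of ranks the user requested via signac-flow

size = MPI.COMM_WORLD.Get_size()
# If srun inherits a rounded-up task count from the job configuration,
# this check (or any hard-coded domain decomposition) will fail.
if size != EXPECTED_RANKS:
    raise RuntimeError(f"expected {EXPECTED_RANKS} ranks, got {size}")
```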
Okay, in the case of GPU jobs, does it make sense to have the number of CPU tasks not be a multiple of the number of GPUs? I feel that is something we should error on then.
It is possible for systems to support a different number of CPU tasks and GPUs. For example, NCSA Delta does:
```
$ srun --account=bbgw-delta-gpu --partition=gpuA40x4 --tasks=7 --mem=48g --gpus=5 --pty zsh
$ echo $SLURM_NTASKS
7
$ echo $SLURM_TASKS_PER_NODE
4,3
```
In this test, it assigned all 4 GPUs on the first node and 1 GPU on the 2nd.
Just because it is possible doesn't mean that signac-flow needs to support it. I can't think of any reasonable workflows that would need this. Also, you would need to check each system separately to see whether it has configured SLURM to allow this uneven task distribution.
I think there are more problems with this template. I ask for 4 ranks and 4 GPUs, but get 8 ranks spread across 4 GPUs:

```python
import flow


class Project(flow.FlowProject):
    pass


@Project.operation(directives=dict(nranks=4, ngpu=4))
def job(job):
    pass


if __name__ == "__main__":
    Project.get_project().main()
```

```
python project.py submit --pretend --partition gpu
Using environment configuration: GreatLakesEnvironment
Querying scheduler...
Submitting cluster job 'Project/99914b932bd37a50b983c5e7c90ae93b/job/86cd9023f6b91d9e9107a4de7a492fbe':
- Group: job(99914b932bd37a50b983c5e7c90ae93b)
# Submit command: sbatch
#!/bin/bash
#SBATCH --job-name="Project/99914b932bd37a50b983c5e7c90ae93b/job/86cd9023f6b91d9e9107a4de7a492fbe"
#SBATCH --partition=gpu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gpus=4
```

I get the correct output (nodes=1, ntasks-per-node=N, gpus=N) for N=1, 2. I get incorrect output (nodes=2) for N=3, 4. I also tested N=8 and got:
Thanks @joaander, I will get working on this after the holidays.
Given the sheer complexity involved and the problems most templates have with specific (corner-case) resource requests, I am planning on moving much of this logic into Python in the environment classes. Much of this can be default behavior in the base class, so it doesn't add much complexity for users creating new environments.
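For what it's worth, a rough sketch of what such shared base-class logic could look like; the class and method names and the per-node GPU count are hypothetical, not the eventual implementation in #708/#722:

```python
import math


class BaseSlurmEnvironment:
    """Hypothetical base class holding shared submission logic (sketch only)."""

    gpus_per_node = 4  # each cluster subclass would override this

    @classmethod
    def _resource_directives(cls, nranks, ngpu=0):
        """Compute SLURM directives for one submission (illustrative, not signac-flow API)."""
        nodes = max(math.ceil(ngpu / cls.gpus_per_node), 1) if ngpu else 1
        directives = [f"--gpus={ngpu}"] if ngpu else []
        if nranks % nodes == 0:
            directives += [f"--nodes={nodes}", f"--ntasks-per-node={nranks // nodes}"]
        else:
            # Uneven split: fall back to a plain task count.
            directives += [f"--ntasks={nranks}"]
        return directives


# e.g. nranks=4, ngpu=4 -> ['--gpus=4', '--nodes=1', '--ntasks-per-node=4']
print(BaseSlurmEnvironment._resource_directives(4, ngpu=4))
```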
You used this approach on #708, right? That is very clear and appears easy to maintain.
@b-butler Can we push this through or close it?
I believe #722 fixes this. |
Description
Fixes logic where `--ntasks-per-node` would not normalize based on the number of nodes for GPU submissions, where the number of tasks is often the number of GPUs.
Motivation and Context
Resolved: #566
Checklist: