
fix: Multi-node GPU submissions for greatlakes and picotte. #702

Closed
b-butler wants to merge 4 commits into master from fix/multinode-gpu-submission

Conversation

b-butler (Member)

Description

Fixes logic where --ntasks-per-node was not normalized by the number of
nodes for GPU submissions, where the number of tasks is often the number
of GPUs.
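
For reference, a minimal Python sketch of the normalization this change performs; the names mirror the template variables, and the helper itself is illustrative rather than part of signac-flow:

def tasks_per_node(gpu_tasks, cpu_tasks, nn):
    # Mirrors the template expression ((gpu_tasks, cpu_tasks)|max / nn)|int.
    return max(gpu_tasks, cpu_tasks) // nn

# Example: 4 GPU tasks spread over 2 nodes -> --ntasks-per-node=2 instead of 4.
print(tasks_per_node(gpu_tasks=4, cpu_tasks=4, nn=2))  # 2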

Motivation and Context

Resolved: #566

b-butler requested review from a team as code owners December 12, 2022 21:05
b-butler requested review from csadorf and Charlottez112 and removed request for a team December 12, 2022 21:05
codecov bot commented Dec 12, 2022

Codecov Report

Merging #702 (71966d0) into master (d543982) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #702   +/-   ##
=======================================
  Coverage   68.57%   68.57%           
=======================================
  Files          41       41           
  Lines        4162     4162           
  Branches     1025     1025           
=======================================
  Hits         2854     2854           
  Misses       1097     1097           
  Partials      211      211           


csadorf left a comment (Contributor)

Looks good, just one quick question.

@@ -12,7 +12,7 @@
 {% set nn = nn|default((nn_cpu, nn_gpu)|max, true) %}
 {% if partition == 'gpu' %}
 #SBATCH --nodes={{ nn|default(1, true) }}
-#SBATCH --ntasks-per-node={{ (gpu_tasks, cpu_tasks)|max }}
+#SBATCH --ntasks-per-node={{ ((gpu_tasks, cpu_tasks)|max / nn)|int }}
csadorf (Contributor)

This will only be correct if the number of tasks is evenly divisible by the number of nodes, right? Would this be an issue? If yes, should we protect against that?

joaander (Member) commented Dec 23, 2022

If the tasks cannot be evenly distributed across an integer number of nodes (e.g. nranks=13), then there is no way to request this with --ntasks-per-node; if you rounded up, you would be providing the user with more ranks than they requested. Use --ntasks={nranks} instead in these cases.

Preserve the --nodes= / --ntasks-per-node= request when tasks can be evenly distributed across nodes. It provides a more efficient communication pattern.
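
A minimal sketch of this rule, assuming a hypothetical helper that is not part of the signac-flow API:

def slurm_task_directives(nranks, nn):
    # Prefer an even --nodes/--ntasks-per-node split; it yields a more
    # efficient communication pattern.
    if nranks % nn == 0:
        return [f"#SBATCH --nodes={nn}",
                f"#SBATCH --ntasks-per-node={nranks // nn}"]
    # Uneven distribution (e.g. nranks=13): request the exact rank count so
    # the user never receives more ranks than they asked for.
    return [f"#SBATCH --ntasks={nranks}"]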

csadorf (Contributor)

Yes, so I think we should raise an error in case the requested configuration cannot be provisioned.

b-butler (Member Author)

We currently round in other templates for GPU partitions, e.g. expanse. We could change that. Generally it won't matter, since we take the ceiling and charges for GPU nodes are usually just for the GPUs, if I understand correctly.
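
For reference, the ceiling rounding mentioned here amounts to the following (values are illustrative only):

import math

gpus_per_node = 2   # hypothetical per-node GPU count for a partition
ngpu = 5            # GPUs requested by the operation

nn = math.ceil(ngpu / gpus_per_node)   # 3 nodes
# Integer-only equivalent: nn = -(-ngpu // gpus_per_node)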

joaander (Member)

Yes, they would likely be charged correctly. However, if the user launches their app with srun or mpiexec without arguments, then the number of tasks is autodetected from the job configuration. If that is rounded up from what the user requested, the user's script may fail (e.g. when it is coded to work with a specific number of ranks).

b-butler (Member Author)

Okay, in the case of GPU jobs, does it make sense for the number of CPU tasks not to be a multiple of the number of GPUs? I feel that is something we should raise an error for, then.

joaander (Member)

It is possible for systems to support a different number of CPU tasks and GPUs. For example, NCSA Delta does:

$ srun --account=bbgw-delta-gpu --partition=gpuA40x4 --tasks=7 --mem=48g --gpus=5 --pty zsh
$ echo $SLURM_NTASKS                                                                                                       
7
$ echo $SLURM_TASKS_PER_NODE
4,3

In this test, it assigned all 4 GPUs on the first node and 1 GPU on the 2nd.

Just because it is possible doesn't mean that signac-flow needs to support it. I can't think of any reasonable workflows that would need this. Also, you would need to check each system separately to see whether its SLURM configuration allows this uneven task distribution.

joaander (Member) commented Dec 23, 2022

I think there are more problems with this template. I ask for 4 ranks and 4 GPUs, but get 8 ranks spread across 4 GPUs:

import flow

class Project(flow.FlowProject):
    pass

@Project.operation(directives=dict(nranks=4, ngpu=4))
def job(job):
    pass

if __name__ == "__main__":
    Project.get_project().main()
python project.py submit --pretend --partition gpu
Using environment configuration: GreatLakesEnvironment
Querying scheduler...
Submitting cluster job 'Project/99914b932bd37a50b983c5e7c90ae93b/job/86cd9023f6b91d9e9107a4de7a492fbe':
 - Group: job(99914b932bd37a50b983c5e7c90ae93b)
# Submit command: sbatch
#!/bin/bash
#SBATCH --job-name="Project/99914b932bd37a50b983c5e7c90ae93b/job/86cd9023f6b91d9e9107a4de7a492fbe"
#SBATCH --partition=gpu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gpus=4

I get the correct output (nodes=1, ntasks-per-node=N, gpus=N) for N=1, 2; for N=3, 4 I get the incorrect output nodes=2.

I also tested N=8 and got:

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --gpus=8

!

Something is seriously wrong with the nodes calculation, not just the ntasks-per-node.
Edit: Never mind, I was assuming that Great Lakes had 4 GPUs per node, when it has 2. The problem is in the number of tasks per node.

b-butler (Member Author)

Thanks @joaander, I will get to work on this after the holidays.

b-butler (Member Author)

Given the sheer complexity involved and the problems most templates have with specific (corner-case) resource requests, I am planning to move much of this logic into Python in the environment classes. Much of it can be default behavior in the base class, so it doesn't add much complexity for users creating new environments.
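
A rough sketch of the direction described here; every name below is hypothetical and does not reflect the final design:

import math


class BaseEnvironment:
    # Defaults shared by all environments; subclasses override only the
    # values that differ on their cluster.
    gpus_per_node = 1

    @classmethod
    def _calculate_num_nodes(cls, ngpu):
        """Default node calculation usable by every subclass."""
        return max(math.ceil(ngpu / cls.gpus_per_node), 1)


class ExampleGreatLakesEnvironment(BaseEnvironment):
    gpus_per_node = 2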

joaander (Member)

> Given the sheer complexity involved and the problems most templates have with specific (corner-case) resource requests, I am planning to move much of this logic into Python in the environment classes. Much of it can be default behavior in the base class, so it doesn't add much complexity for users creating new environments.

You used this approach in #708, right? That is very clear and appears easy to maintain.

b-butler added this to the v1.0 milestone Feb 22, 2023
b-butler mentioned this pull request Feb 27, 2023
bdice (Member) commented Jul 12, 2023

@b-butler Can we push this through or close it?

b-butler (Member Author)

I believe #722 fixes this.

b-butler closed this Oct 23, 2023
joaander deleted the fix/multinode-gpu-submission branch February 2, 2024 16:55
Successfully merging this pull request may close these issues.

Template Multinode GPU Error