
Use salloc if you want to srun while on a node #1

Open
suranap opened this issue Sep 6, 2024 · 2 comments

Comments

@suranap

suranap commented Sep 6, 2024

I used srun to hop into a bash shell on a GPU machine. Then I wanted to use srun to launch 4 processes on this same machine. It just hangs. It looks like the first srun reserves the whole node, so further calls to srun are stuck waiting. This is a use case for salloc, and that's how I do things on Frontier/Perlmutter. On those systems, salloc also drops you into a shell on the allocated machine, which is convenient.

sapling-guide/README.md, lines 63 to 78 in 4a2dc09:

2. Allocate compute nodes through SLURM. Do NOT directly SSH to a
compute node:
* Do this: `srun -N 1 -n 1 -c 40 -p gpu --pty bash --login`
* Don't do this: `ssh g0001`
If for some reason you need SSH, then allocate the node through
`salloc` before you SSH to it:
```
salloc -n 1 -N 1 -c 40 -p gpu --exclusive
ssh $SLURM_NODELIST
```
Be sure to close out your session when you are done with it so
that the nodes are returned to the queue.

@elliottslaughter
Contributor

You're right that you'd need to salloc and then srun inside of that if you want to do multiple jobs inside of an allocation.

Is there a specific request you're making or improvement you suggest? srun is the shortest one-line command, so it's what I generally recommend, and doing multiple jobs is generally a special case.
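A minimal sketch of the salloc-then-srun workflow for the 4-process case from the issue, assuming Sapling's slurm.conf does not set `use_interactive_step` (so the salloc shell stays on the login node). The `gpu` partition comes from the README excerpt; the task count and `hostname` are placeholders for your own program.

```shell
# Request one node in the gpu partition with room for 4 tasks.
# salloc starts a new shell, but on the login node, not on the compute node.
salloc -N 1 -n 4 -p gpu

# Inside that shell, each srun launches a job step on the allocated node.
srun -n 4 hostname    # 4 processes on the same node; replace hostname with your program
srun -n 4 hostname    # a second step can run once the first one has finished

# Leave the salloc shell to return the node to the queue.
exit
```

Unlike `srun --pty bash`, the salloc shell is not itself a job step, so nothing holds the node's resources between commands, which is why the later srun calls do not hang.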

@suranap
Author

suranap commented Sep 11, 2024

Sapling could match the salloc behavior of other HPC systems by adding this to /etc/slurm.conf:

```
LaunchParameters=use_interactive_step
```

See here for more info. This is now the recommended way to do things.
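With that setting in place, the session from the first comment could look roughly like the sketch below. `scontrol show config` is a standard way to print the active Slurm settings; the partition and task count are carried over from the README excerpt and the issue, and are assumptions rather than Sapling's documented setup. Per the Slurm documentation, salloc then launches its shell through an interactive step on the first allocated node (using `InteractiveStepOptions`), so, as we understand it, that shell does not block later srun calls the way `srun --pty bash` does.

```shell
# Check whether the option is active: look for use_interactive_step
# in the LaunchParameters line.
scontrol show config | grep -i LaunchParameters

# With use_interactive_step set, salloc's shell opens on the compute node.
salloc -N 1 -n 4 -p gpu

# Now on the compute node: launch 4 processes in the same allocation.
srun -n 4 hostname

# Exiting the shell ends the allocation.
exit
```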
