Skip to content

Commit

Permalink
power saving SLURM script
Browse files Browse the repository at this point in the history
  • Loading branch information
mirjunaid26 committed Aug 30, 2023
1 parent 4d865eb commit cb86a48
Showing 1 changed file with 37 additions and 0 deletions.
37 changes: 37 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -169,6 +169,43 @@ It may be useful to remember the sinfo codes:
\^ The node reboot was issued.
\- The node is planned by the backfill scheduler for a higher priority job.

#### slurm.conf script
```
#!/bin/bash
# PowerSaving Segment
# How long shall we wait for a node to be idle before shutting it down?
SuspendTime=300 # 5 minutes
# How many nodes will your utility let you stop at once?
SuspendRate=40
# How many nodes will your utility let you start at once?
ResumeRate=40
# Where is the script to shut the nodes down?
SuspendProgram=/opt/system/slurm/etc/suspend.sh
# Where is the script to start the nodes up?
ResumeProgram=/opt/system/slurm/etc/resume.sh
# Time how long does it take for your slowest node to shutdown? Add 60 to that and put the answer below.
SuspendTimeout=300
# Time how long it takes for your slowest node to start up? Add 60 to that and put the answer below.
ResumeTimeout=240
# If you want to exclude certain nodes or partitions, enable the below and fill them in.
#SuspendExcNodes=compute-3
#SuspendExcParts=main
# How long to wait between a node being up, and when a job runs. This can take awhile, FYI.
BatchStartTimeout=360 # Default is 10.
# Time for messages to go from the management node and a compute node AND back. This can take awhile.
MessageTimeout=100 # Default is 10.

# I kept the partition and node information below the above.
...

# At the very bottom I changed/added the below. The first is to prevent the nodes from being 'down' when they suspend.
# Which requires some scontrol commands to return them to service.
SlurmctldParameters=enable_configless,idle_on_node_suspend
# This one returns a node to service when it registers itself as up.
ReturnToService=2
```
Expand Down

0 comments on commit cb86a48

Please sign in to comment.