Skip to content

Latest commit

 

History

History
 
 

schedmd-slurm-gcp-v6-partition

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 

Description

This module creates a compute partition that can be used as input to the schedmd-slurm-gcp-v6-controller.

The partition module is designed to work alongside the schedmd-slurm-gcp-v6-nodeset module. A partition can be made up of one or more nodesets, provided either through use (preferred) or defined manually in the nodeset variable.

Example

The following code snippet creates a partition module with:

  • 2 nodesets added via use.
    • The first nodeset is made up of machines of type c2-standard-30.
    • The second nodeset is made up of machines of type c2-standard-60.
    • Both nodesets have a maximum count of 200 dynamically created nodes.
  • partition name of "compute".
  • connected to the network module via use.
  • nodes mounted to homefs via use.
- id: nodeset_1
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    name: c30
    node_count_dynamic_max: 200
    machine_type: c2-standard-30

- id: nodeset_2
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    name: c60
    node_count_dynamic_max: 200
    machine_type: c2-standard-60

- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v6-partition
  use:
  - homefs
  - nodeset_1
  - nodeset_2
  settings:
    partition_name: compute

Support

The Cluster Toolkit team maintains the wrapper around the slurm-on-gcp terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.

Requirements

Name Version
terraform >= 1.3

Providers

No providers.

Modules

No modules.

Resources

No resources.

Inputs

Name Description Type Default Required
exclusive Exclusive job access to nodes. bool true no
is_default Sets this partition as the default partition by updating the partition_conf.
If "Default" is already set in partition_conf, this variable will have no effect.
bool false no
network_storage DEPRECATED
list(object({
server_ip = string,
remote_mount = string,
local_mount = string,
fs_type = string,
mount_options = string,
client_install_runner = map(string)
mount_runner = map(string)
}))
[] no
nodeset A list of nodesets.
For type definition see community/modules/scheduler/schedmd-slurm-gcp-v6-controller/variables.tf::nodeset
list(any) [] no
nodeset_dyn Defines dynamic nodesets, as a list.
list(object({
nodeset_name = string
nodeset_feature = string
}))
[] no
nodeset_tpu Define TPU nodesets, as a list.
list(object({
node_count_static = optional(number, 0)
node_count_dynamic_max = optional(number, 5)
nodeset_name = string
enable_public_ip = optional(bool, false)
node_type = string
accelerator_config = optional(object({
topology = string
version = string
}), {
topology = ""
version = ""
})
tf_version = string
preemptible = optional(bool, false)
preserve_tpu = optional(bool, false)
zone = string
data_disks = optional(list(string), [])
docker_image = optional(string, "")
network_storage = optional(list(object({
server_ip = string
remote_mount = string
local_mount = string
fs_type = string
mount_options = string
})), [])
subnetwork = string
service_account = optional(object({
email = optional(string)
scopes = optional(list(string), ["https://www.googleapis.com/auth/cloud-platform"])
}))
project_id = string
reserved = optional(string, false)
}))
[] no
partition_conf Slurm partition configuration as a map.
See https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION
map(string) {} no
partition_name The name of the slurm partition. string n/a yes
resume_timeout Maximum time permitted (in seconds) between when a node resume request is issued and when the node is actually available for use.
If null is given, then a smart default will be chosen depending on nodesets in partition.
This sets 'ResumeTimeout' in partition_conf.
See https://slurm.schedmd.com/slurm.conf.html#OPT_ResumeTimeout_1 for details.
number 300 no
suspend_time Nodes which remain idle or down for this number of seconds will be placed into power save mode by SuspendProgram.
This sets 'SuspendTime' in partition_conf.
See https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendTime_1 for details.
NOTE: use value -1 to exclude partition from suspend.
number 300 no
suspend_timeout Maximum time permitted (in seconds) between when a node suspend request is issued and when the node is shutdown.
If null is given, then a smart default will be chosen depending on nodesets in partition.
This sets 'SuspendTimeout' in partition_conf.
See https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendTimeout_1 for details.
number null no

Outputs

Name Description
nodeset Details of a nodesets in this partition
nodeset_dyn Details of a dynamic nodesets in this partition
nodeset_tpu Details of a TPU nodesets in this partition
partitions Details of a slurm partition