This module creates an instance template to be used by dynamic nodes. It also creates a nodeset data structure intended as input to the schedmd-slurm-gcp-v6-partition module.
The following code snippet creates an instance template to be used by a managed instance group (MIG).
  - id: dynamic_ns
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-dynamic
    use: [network, controller]
    settings:
      machine_type: n2-standard-2

  - id: dynamic_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [dynamic_ns]
    settings:
      partition_name: mp
      is_default: true

  - id: controller
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
    use: [network, dynamic_partition]

  - id: mig
    source: community/modules/compute/mig
    settings:
      versions:
      - name: highlander # there can be only one
        instance_template: $(dynamic_ns.instance_template_self_link)
      base_instance_name: $(dynamic_ns.node_name_prefix)

For more information on creating valid custom images for the node group VM instances or for custom instance templates, see our vm-images.md documentation page.
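
As a minimal sketch of the custom-image case, the fragment below points the nodeset at an image family through the instance_image and instance_image_custom inputs described in the inputs table. The family and project values are hypothetical placeholders, not real images; substitute your own.

```yaml
- id: dynamic_ns
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-dynamic
  use: [network, controller]
  settings:
    machine_type: n2-standard-2
    instance_image:
      family: my-slurm-image-family  # hypothetical; mutually exclusive with `name`
      project: my-image-project      # hypothetical project that hosts the image
    # Acknowledges that the image is custom, so no compatibility checks are done.
    instance_image_custom: true
```

With instance_image_custom left at its default of false, only the compatible families and projects are accepted and the deployment fails for any other image.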
More information on GPU support in Slurm on GCP and other Cluster Toolkit modules can be found at docs/gpu-support.md.
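
For illustration, a GPU-equipped dynamic nodeset might look like the sketch below. It assumes that each guest_accelerator entry takes `type` and `count` fields (inferred from the input description, since the full type is not shown here), and the module id, machine type, and accelerator type are example choices rather than recommendations.

```yaml
- id: gpu_dynamic_ns  # hypothetical module id
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-dynamic
  use: [network, controller]
  settings:
    machine_type: n1-standard-8  # assumed: a shape that accepts attached GPUs
    guest_accelerator:
    - type: nvidia-tesla-t4      # assumed accelerator type; check zone availability
      count: 1                   # assumed field name, per the input description
```

The on_host_maintenance input already defaults to "TERMINATE", which VMs with attached GPUs require, so it is not set here.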
The Cluster Toolkit team maintains the wrapper around the slurm-on-gcp terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.
Name | Version |
---|---|
terraform | >= 1.3 |
google | >= 5.11 |
Name | Version |
---|---|
google | >= 5.11 |
Name | Source | Version |
---|---|---|
slurm_nodeset_template | github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_instance_template | 6.6.1 |
Name | Type |
---|---|
google_compute_default_service_account.default | data source |
google_compute_image.slurm | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
access_config | Access configurations, i.e. IPs via which the VM instance can be accessed via the Internet. | list(object({ | [] | no |
additional_disks | Configurations of additional disks to be included on the partition nodes. (do not use "disk_type: local-ssd"; known issue being addressed) | list(object({ | [] | no |
additional_networks | Additional network interface details for GCE, if any. | list(object({ | [] | no |
allow_automatic_updates | If false, disables automatic system package updates on the created instances. This feature is only available on supported images (or images derived from them). For more details, see https://cloud.google.com/compute/docs/instances/create-hpc-vm#disable_automatic_updates | bool | true | no |
bandwidth_tier | Configures the network interface card and the maximum egress bandwidth for VMs. - Setting platform_default respects the Google Cloud Platform API default values for networking. - Setting virtio_enabled explicitly selects the VirtioNet network adapter. - Setting gvnic_enabled selects the gVNIC network adapter (without Tier 1 high bandwidth). - Setting tier_1_enabled selects both the gVNIC adapter and Tier 1 high bandwidth networking. - Note: both gVNIC and Tier 1 networking require a VM image with gVNIC support as well as specific VM families and shapes. - See official docs for more details. | string | "platform_default" | no |
can_ip_forward | Enable IP forwarding, for NAT instances for example. | bool | false | no |
disk_auto_delete | Whether or not the boot disk should be auto-deleted. | bool | true | no |
disk_labels | Labels specific to the boot disk. These will be merged with var.labels. | map(string) | {} | no |
disk_size_gb | Size of boot disk to create for the partition compute nodes. | number | 50 | no |
disk_type | Boot disk type. Must be one of hyperdisk-balanced, pd-ssd, pd-standard, pd-balanced, or pd-extreme. | string | "pd-standard" | no |
enable_confidential_vm | Enable the Confidential VM configuration. Note: the instance image must support this option. | bool | false | no |
enable_oslogin | Enables Google Cloud os-login for user login and authentication for VMs. See https://cloud.google.com/compute/docs/oslogin | bool | true | no |
enable_public_ips | If set to true, each node group VM is assigned a random public IP. Ignored if access_config is set. | bool | false | no |
enable_shielded_vm | Enable the Shielded VM configuration. Note: the instance image must support this option. | bool | false | no |
enable_smt | Enables Simultaneous Multi-Threading (SMT) on instance. | bool | false | no |
enable_spot_vm | Enable the partition to use spot VMs (https://cloud.google.com/spot-vms). | bool | false | no |
feature | The node feature, used to bind nodes to the nodeset. If not set, the nodeset name will be used. | string | null | no |
guest_accelerator | List of the type and count of accelerator cards attached to the instance. | list(object({ | [] | no |
instance_image | Defines the image that will be used in the Slurm node group VM instances. Expected fields: - name: The name of the image. Mutually exclusive with family. - family: The image family to use. Mutually exclusive with name. - project: The project where the image is hosted. For more information on creating custom images that comply with Slurm on GCP see the "Slurm on GCP Custom Images" section in docs/vm-images.md. | map(string) | { | no |
instance_image_custom | A flag that designates that the user is aware that they are requesting to use a custom and potentially incompatible image for this Slurm on GCP module. If the field is set to false, only the compatible families and project names will be accepted. The deployment will fail with any other image family or name. If set to true, no checks will be done. See: https://goo.gle/hpc-slurm-images | bool | false | no |
labels | Labels to add to partition compute instances. Key-value pairs. | map(string) | {} | no |
machine_type | Compute Platform machine type to use for this partition's compute nodes. | string | "c2-standard-60" | no |
metadata | Metadata, provided as a map. | map(string) | {} | no |
min_cpu_platform | The name of the minimum CPU platform that you want the instance to use. | string | null | no |
name | Name of the nodeset. Automatically populated by the module id if not set. If setting manually, ensure a unique value across all nodesets. | string | n/a | yes |
on_host_maintenance | Instance availability policy. Note: Placement groups are not supported when on_host_maintenance is set to "MIGRATE" and will be deactivated regardless of the value of enable_placement. To support enable_placement, ensure on_host_maintenance is set to "TERMINATE". | string | "TERMINATE" | no |
preemptible | Should use preemptibles to burst. | bool | false | no |
project_id | Project ID to create resources in. | string | n/a | yes |
region | The default region for Cloud resources. | string | n/a | yes |
service_account_email | Service account e-mail address to attach to the compute instances. | string | null | no |
service_account_scopes | Scopes to attach to the compute instances. | set(string) | [ | no |
shielded_instance_config | Shielded VM configuration for the instance. Note: not used unless enable_shielded_vm is 'true'. - enable_integrity_monitoring: Compare the most recent boot measurements to the integrity policy baseline and return a pair of pass/fail results depending on whether they match or not. - enable_secure_boot: Verify the digital signature of all boot components, and halt the boot process if signature verification fails. - enable_vtpm: Use a virtualized trusted platform module, which is a specialized computer chip you can use to encrypt objects like keys and certificates. | object({ | { | no |
slurm_bucket_path | Path to the Slurm bucket. | string | n/a | yes |
slurm_cluster_name | Name of the Slurm cluster. | string | n/a | yes |
spot_instance_config | Configuration for spot VMs. | object({ | null | no |
subnetwork_self_link | Subnet to deploy to. | string | n/a | yes |
tags | Network tag list. | list(string) | [] | no |
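
To show how several of these inputs combine, here is a hedged sketch of a nodeset that uses spot VMs, the gVNIC adapter, a larger balanced boot disk, and Shielded VM. The specific values are illustrative choices, and the fragment assumes the selected image supports both gVNIC and Shielded VM, as the input descriptions require.

```yaml
- id: dynamic_ns
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-dynamic
  use: [network, controller]
  settings:
    machine_type: c2-standard-60   # the module default, written out explicitly
    bandwidth_tier: gvnic_enabled  # gVNIC without Tier 1 high bandwidth
    disk_type: pd-balanced
    disk_size_gb: 100              # illustrative; the default is 50
    enable_spot_vm: true
    enable_shielded_vm: true       # shielded_instance_config is ignored unless this is true
    shielded_instance_config:
      enable_integrity_monitoring: true
      enable_secure_boot: true
      enable_vtpm: true
```

Inputs that are marked as required and not set here, such as project_id, region, and subnetwork_self_link, are presumably supplied by the deployment variables and by the modules listed under `use`, as in the snippet at the top of this page.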
Name | Description |
---|---|
instance_template_self_link | The URI of the template. |
node_name_prefix | The prefix to be used for the node names. Make sure that nodes are named <node_name_prefix>-<any_suffix>. This is temporarily required for the nodes to function properly: the Slurm scheduler uses "features" to bind a node to its nodeset, but SlurmGCP currently relies on node names for this (to be switched to features as well). |
nodeset_dyn | Details of the nodeset. Typically used as input to schedmd-slurm-gcp-v6-partition. |