Description

This module creates a TPU nodeset for use in a Slurm partition. TPUs are Google's custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads.

Example

The following code snippet creates a TPU partition with the following attributes:

  • The TPU nodeset module is connected to the network module.
  • The TPU nodeset is of type v2-8 and TensorFlow version 2.10.0; see https://cloud.google.com/tpu/docs/supported-tpu-configurations for other supported configurations.
  • The TPU VMs are preemptible.
  • preserve_tpu is set to false, so suspended VMs are deleted rather than stopped.
  • The partition module uses the tpu_nodeset module defined here, and the resulting partition is accessible as the tpu partition.

  - id: tpu_nodeset
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      enable_public_ips: true
      preemptible: true
      preserve_tpu: false

  - id: tpu_partition
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [tpu_nodeset]
    settings:
      partition_name: tpu
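
As a further illustration of the settings listed under Inputs, the following is a minimal sketch of a variant nodeset that keeps one node statically provisioned and stops, rather than deletes, TPU VMs on suspend. The module id tpu_nodeset_static and the node counts are hypothetical values chosen for illustration; only the setting names come from this module's inputs.

```yaml
  # Hypothetical variant of the tpu_nodeset module; values are illustrative.
  - id: tpu_nodeset_static
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      enable_public_ips: false   # non-deprecated counterpart of disable_public_ips
      preserve_tpu: true         # stop, rather than delete, the VM on suspend
      node_count_static: 1       # assumption: one statically created node
      node_count_dynamic_max: 2  # assumption: up to two auto-scaling nodes
```

A partition module would consume this nodeset the same way, by listing tpu_nodeset_static in its use block.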

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.3 |
| google | >= 3.83 |

Providers

| Name | Version |
|------|---------|
| google | >= 3.83 |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| google_compute_default_service_account.default | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| accelerator_config | Nodeset accelerator config, see https://cloud.google.com/tpu/docs/supported-tpu-configurations for details. | `object({ topology = string, version = string })` | `{ "topology": "", "version": "" }` | no |
| data_disks | The data disks to include in the TPU node. | `list(string)` | `[]` | no |
| disable_public_ips | DEPRECATED: Use `enable_public_ips` instead. | `bool` | `null` | no |
| docker_image | The GCP container registry ID of the Docker image to use in the TPU VMs. Defaults to `gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-6-tf-<var.tf_version>`. | `string` | `null` | no |
| enable_public_ips | If set to true, the node group VMs will each have a random public IP assigned. Ignored if access_config is set. | `bool` | `false` | no |
| name | Name of the nodeset. Automatically populated by the module id if not set. If setting manually, ensure a unique value across all nodesets. | `string` | n/a | yes |
| network_storage | An array of network attached storage mounts to be configured on nodes. | `list(object({ server_ip = string, remote_mount = string, local_mount = string, fs_type = string, mount_options = string }))` | `[]` | no |
| node_count_dynamic_max | Maximum number of auto-scaling nodes allowed in this partition. | `number` | `5` | no |
| node_count_static | Number of nodes to be statically created. | `number` | `0` | no |
| node_type | Node type on which to base the VM configuration. | `string` | n/a | yes |
| preemptible | Whether to use preemptible VMs for bursting. | `bool` | `false` | no |
| preserve_tpu | Whether TPU VMs are preserved on suspend. If true, the VM is stopped on suspend; if false, it is deleted. | `bool` | `false` | no |
| project_id | Project ID to create resources in. | `string` | n/a | yes |
| reserved | Specify whether TPU VMs in this nodeset are created under a reservation. | `bool` | `false` | no |
| service_account | DEPRECATED: Use `service_account_email` and `service_account_scopes` instead. | `object({ email = string, scopes = set(string) })` | `null` | no |
| service_account_email | Service account e-mail address to attach to the TPU VM. | `string` | `null` | no |
| service_account_scopes | Scopes to attach to the TPU VM. | `set(string)` | `[ "https://www.googleapis.com/auth/cloud-platform" ]` | no |
| subnetwork_self_link | The name of the subnetwork to attach the TPU VM of this nodeset to. | `string` | n/a | yes |
| tf_version | Nodeset TensorFlow version, see https://cloud.google.com/tpu/docs/supported-tpu-configurations#tpu_vm for details. | `string` | `"2.14.0"` | no |
| zone | Zone in which to create compute VMs. TPU partitions can only specify a single zone. | `string` | n/a | yes |
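
Because service_account is deprecated in favor of service_account_email and service_account_scopes, a nodeset that attaches a specific service account might be sketched as follows. The e-mail address is a hypothetical placeholder, and the scope shown is simply the documented default written out explicitly; as in the example near the top of this page, the other required inputs are not set here.

```yaml
  - id: tpu_nodeset
    source: ./community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      # Hypothetical service account; replace with one from your own project.
      service_account_email: tpu-nodes@my-project.iam.gserviceaccount.com
      # Same value as the documented default, written out for clarity.
      service_account_scopes:
      - https://www.googleapis.com/auth/cloud-platform
      reserved: false  # documented default; set to true only when using a reservation
```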

Outputs

| Name | Description |
|------|-------------|
| nodeset_tpu | Details of the nodeset tpu. Typically used as input to `schedmd-slurm-gcp-v6-partition`. |