> **Note:** This repository was archived by the owner on Jul 15, 2021. It is now read-only.
# Enable GPU for Kubeflow Pipelines on Azure Kubernetes Service (AKS)

Graphical processing units (GPUs) are often used for compute-intensive workloads such as graphics and visualization. AKS supports the creation of GPU-enabled node pools to run these compute-intensive workloads on Kubernetes.

## Prerequisites

1. **GPU-enabled node pool on AKS**

   To enable GPUs on your Kubeflow cluster, follow the AKS documentation on using GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS).

   For an AKS cluster with an existing CPU node pool, you can add another node pool with GPUs:

   ```azurecli
   az aks nodepool add \
       --resource-group myResourceGroup \
       --cluster-name myAKSCluster \
       --node-vm-size Standard_NC6 \
       --name gpu \
       --node-count 3
   ```
2. **Install NVIDIA drivers**

   Create a Kubernetes namespace and deploy the DaemonSet for the NVIDIA device plugin. This DaemonSet runs a pod on each node to provide the required drivers for the GPUs.

   ```bash
   kubectl create namespace gpu-resources
   kubectl apply -f nvidia-device-plugin-ds.yaml
   ```
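For orientation, here is a trimmed sketch of what an NVIDIA device plugin DaemonSet manifest such as `nvidia-device-plugin-ds.yaml` typically contains; the image tag and exact fields vary by plugin version, so treat this as illustrative rather than a drop-in replacement for the file referenced above:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: gpu-resources
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      # Allow scheduling onto GPU nodes that carry the nvidia.com/gpu taint
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - name: nvidia-device-plugin-ctr
        image: nvcr.io/nvidia/k8s-device-plugin:v0.14.1  # example tag; pin per your cluster
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
```

Once the DaemonSet is running, GPU nodes advertise an `nvidia.com/gpu` allocatable resource that pods can request.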

## Configure ContainerOp to consume GPUs

Set a node selector constraint on the `ContainerOp`; this updates the component's pod specification so that the pod is scheduled only on nodes carrying the matching key-value label.

```python
import kfp.dsl as dsl

gpu_op = dsl.ContainerOp(name='gpu-op', ...)
gpu_op.add_node_selector_constraint('accelerator', 'nvidia')
```
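Conceptually, the constraint becomes a Kubernetes `nodeSelector` on the generated pod. The following is a minimal pure-Python sketch of that mapping — the helper name and dict layout are hypothetical, not the actual kfp implementation:

```python
# Hypothetical helper illustrating what a node selector constraint does:
# the key/value pair ends up as the pod spec's nodeSelector, so the
# scheduler only places the pod on nodes with a matching label.
def add_node_selector_constraint(pod_spec, label_name, value):
    pod_spec.setdefault("nodeSelector", {})[label_name] = value
    return pod_spec

spec = {"containers": [{"name": "gpu-op", "image": "pytorch/pytorch"}]}
add_node_selector_constraint(spec, "accelerator", "nvidia")
print(spec["nodeSelector"])  # {'accelerator': 'nvidia'}
```

For this to take effect on AKS, the GPU nodes must actually carry an `accelerator=nvidia` label (applied when creating the node pool or with `kubectl label nodes`).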

## Set GPU limit

A GPU limit can be set with `set_gpu_limit()` on the `ContainerOp`.

```python
import kfp.dsl as dsl

gpu_op = dsl.ContainerOp(name='gpu-op', ...).set_gpu_limit(2)
```
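The GPU limit surfaces in the container's `resources.limits` as an `nvidia.com/gpu` entry, which is what the device plugin installed earlier makes schedulable. A pure-Python sketch of that mapping (helper name and dict layout hypothetical, not the actual kfp code):

```python
# Hypothetical helper mirroring what a GPU limit of 2 produces in the
# generated container spec: a vendor-specific resource limit entry.
def set_gpu_limit(container, gpu_count, vendor="nvidia"):
    limits = container.setdefault("resources", {}).setdefault("limits", {})
    limits[f"{vendor}.com/gpu"] = str(gpu_count)
    return container

container = {"name": "gpu-op", "image": "pytorch/pytorch"}
set_gpu_limit(container, 2)
print(container["resources"]["limits"])  # {'nvidia.com/gpu': '2'}
```

Kubernetes schedules the pod only on a node with at least that many unallocated GPUs; if no such node exists, the pod stays `Pending`.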

## Sample component that runs on GPU

This repository has a sample component (`gpu-op`) that runs a PyTorch container which checks and prints the GPU (CUDA) device specification.
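Outside of a pipeline, an equivalent standalone pod can be useful for verifying the cluster setup end to end. The sketch below ties together the node selector and GPU limit from the sections above; the pod name and image tag are illustrative, not taken from the sample component:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-check   # illustrative name
spec:
  restartPolicy: Never
  nodeSelector:
    accelerator: nvidia          # assumes GPU nodes carry this label
  containers:
  - name: cuda-check
    image: pytorch/pytorch:latest  # example tag; pin a specific version in practice
    command:
    - python
    - -c
    - "import torch; print(torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'no CUDA device')"
    resources:
      limits:
        nvidia.com/gpu: 1
```

After `kubectl apply`, `kubectl logs cuda-check` should print the CUDA device properties if the GPU node pool, drivers, and device plugin are all working.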