
Specifying Specific GPU Models for Pods in Nodes with Multiple GPU Types #656

Open
anencore94 opened this issue Jan 18, 2024 · 5 comments

@anencore94

Issue or feature description

I am currently working with a Kubernetes cluster where some nodes are equipped with multiple types of NVIDIA GPUs. For example, Node A has one A100 GPU and one V100 GPU. In such a setup, I am looking for a way to specify which GPU model should be allocated when a user creates a GPU-allocated pod.

From my understanding, in such cases, we would typically request a GPU in our pod specifications using resources.limits with nvidia.com/gpu: 1. However, this approach doesn't seem to provide a way to distinguish between different GPU models.
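
For context, the kind of spec I mean is a plain GPU request like this (a minimal sketch, with nothing model-specific in it):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # requests any GPU on the node; no way to say "A100" vs "V100" here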

Is there a feature or method within the NVIDIA GPU Operator or Kubernetes ecosystem that allows for such specific GPU model selection during pod creation? If not, are there any best practices or recommended approaches to ensure a pod is scheduled with a specific type of GPU when multiple models are present in the same node?

Thank you for your time and assistance.

@cdesiniotis
Contributor

@anencore94 there is unfortunately no supported way of accomplishing this today with the device plugin API.

Dynamic Resource Allocation, a new API for requesting and allocating resources in Kubernetes, would allow us to naturally support such configurations, but it is currently an alpha feature.
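
(For anyone finding this later: a very rough sketch of what a model-specific request might look like with DRA structured parameters. The resource.k8s.io/v1beta1 API version, the gpu.nvidia.com device class, the productName attribute, and its value format are assumptions based on the NVIDIA DRA driver rather than anything confirmed in this thread, so check the driver documentation for the actual schema.)

# ResourceClaim asking specifically for an A100, selected by a device attribute
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-a100
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com
      selectors:
      - cel:
          # attribute key and value string depend on the NVIDIA DRA driver
          expression: device.attributes["gpu.nvidia.com"].productName == "NVIDIA A100-PCIE-40GB"
---
# Pod consuming the claim instead of requesting nvidia.com/gpu limits
apiVersion: v1
kind: Pod
metadata:
  name: dra-example
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: single-a100
  containers:
  - name: cuda
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    command: ["nvidia-smi", "-L"]
    resources:
      claims:
      - name: gpu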

@anencore94
Author

@cdesiniotis Thanks for sharing :). So I guess implementing this feature on top of the Dynamic Resource Allocation API will take quite a long time..

@laszlocph

I was able to pick the GPU by setting the NVIDIA_VISIBLE_DEVICES environment variable:

apiVersion: v1
kind: Pod
metadata:
  name: vllm-openai
  namespace: training
spec:
  runtimeClassName: nvidia
  containers:
  - name: vllm-openai
    image: "vllm/vllm-openai:latest"
    args: ["--model", "Qwen/Qwen1.5-14B-Chat"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "0"
    resources:
      limits:
        nvidia.com/gpu: 1

The value is the zero-based index of the GPU on the node.

These other variables may also work, but I have not tested them: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html
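
One way to double-check the result (just a verification step, using the pod above):

# should list only the GPU whose index you put in NVIDIA_VISIBLE_DEVICES
kubectl exec -n training vllm-openai -- nvidia-smi -L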

@anencore94
Author

@laszlocph Thanks for sharing your setup! However, I'd like to control this in a more Kubernetes-native way. 🥲

@jjaymick001

I do this via nodeSelector.

kubectl get nodes -L nvidia.com/gpu.count -L nvidia.com/gpu.product
NAME            STATUS   ROLES           AGE    VERSION   GPU.COUNT   GPU.PRODUCT
dell-mx740c-2   Ready    control-plane   3d8h   v1.26.3   1           NVIDIA-A100-PCIE-40GB
dell-mx740c-3   Ready    control-plane   3d8h   v1.26.3   2           Tesla-T4
dell-mx740c-7   Ready    <none>          3d8h   v1.26.3   2           Quadro-RTX-8000
dell-mx740c-8   Ready    <none>          3d8h   v1.26.3   2           NVIDIA-A100-PCIE-40GB
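
(These labels come from GPU Feature Discovery, which the GPU Operator deploys by default. To see all the GPU-related labels on a node, something like this works:)

kubectl get node dell-mx740c-8 --show-labels | tr ',' '\n' | grep nvidia.com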

I can use the gpu.product label as the nodeSelector to ensure the pod lands on a node with the intended GPU type, like this.

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-ver-740c-8
spec:
  restartPolicy: OnFailure
  nodeSelector:
    nvidia.com/gpu.product: "NVIDIA-A100-PCIE-40GB"
    nvidia.com/gpu.count: "2"
  containers:
  - name: nvidia-version-check
    image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: "1"
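
If you need to allow more than one acceptable GPU model, the same labels also work with node affinity instead of a plain nodeSelector (a sketch; the product values are just examples taken from the node list above):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-affinity-example
spec:
  restartPolicy: OnFailure
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nvidia.com/gpu.product
            operator: In        # schedule onto any node whose GPU product is in this list
            values:
            - NVIDIA-A100-PCIE-40GB
            - Quadro-RTX-8000
  containers:
  - name: nvidia-version-check
    image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: "1"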
