Specifying Specific GPU Models for Pods in Nodes with Multiple GPU Types #656
Comments
@anencore94 there is unfortunately no supported way of accomplishing this today with the device plugin API. Dynamic Resource Allocation, a new API for requesting and allocating resources in Kubernetes, would allow us to naturally support such configurations, but it is currently an alpha feature.
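To give a sense of the direction that comment points at, here is a rough, illustrative sketch of a DRA-style request. It assumes a cluster with the DRA feature gates enabled and an NVIDIA DRA driver publishing a `gpu.nvidia.com` device class; the API version, attribute key, and product string below are assumptions, not a stable or confirmed API:

```yaml
# Illustrative sketch only: DRA is still evolving, so the exact group/version,
# device class name, and attribute names are assumptions and vary by release.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-a100
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com     # assumed: class published by an NVIDIA DRA driver
      selectors:
      - cel:
          # product string is a placeholder; use the value reported by your driver
          expression: "device.attributes['gpu.nvidia.com'].productName == 'NVIDIA-A100-SXM4-40GB'"
```

A pod would then reference such a claim through its `resourceClaims` field rather than `resources.limits`, which is what makes per-model selection expressible in this API.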
@cdesiniotis Thanks for sharing :). You mean implementing this feature on top of the Dynamic Resource Allocation API will take quite a long time, I guess.
I was able to pick the GPU by specifying the NVIDIA_VISIBLE_DEVICES environment variable, where the value is the zero-indexed number of my GPU:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vllm-openai
  namespace: training
spec:
  runtimeClassName: nvidia
  containers:
  - name: vllm-openai
    image: "vllm/vllm-openai:latest"
    args: ["--model", "Qwen/Qwen1.5-14B-Chat"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES   # added: pins the container to GPU index 0
      value: "0"
    resources:
      limits:
        nvidia.com/gpu: 1
```

These other vars may also work, but I have not tested them: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html
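As a side note on that approach: the container toolkit documentation linked above also accepts GPU UUIDs in NVIDIA_VISIBLE_DEVICES, which is less ambiguous than an index if device ordering changes. A minimal variant of the env block, with a placeholder UUID:

```yaml
# Variant sketch: select by GPU UUID instead of index. The UUID below is a
# placeholder; obtain the real one with `nvidia-smi -L` on the node.
env:
- name: NVIDIA_VISIBLE_DEVICES
  value: "GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```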
@laszlocph Thanks for your case! However, I'd like to control it in a k8s-native way. 🥲
I do this via a nodeSelector: I can use the gpu.product label as the selector to ensure the pod lands on the intended GPU type.
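A minimal sketch of what that looks like, assuming GPU Feature Discovery (as deployed by the GPU Operator) has labeled the nodes with `nvidia.com/gpu.product`; the product string is an example value and should be taken from `kubectl get nodes --show-labels`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vllm-openai-a100
spec:
  nodeSelector:
    # Example value: use the exact label value reported by your nodes.
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-40GB
  containers:
  - name: vllm-openai
    image: "vllm/vllm-openai:latest"
    resources:
      limits:
        nvidia.com/gpu: 1
```

Note that this steers the pod onto a node with the desired GPU type; on a node that mixes GPU models it does not by itself control which of that node's GPUs the device plugin allocates.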
2. Issue or feature description
I am currently working with a Kubernetes cluster where some nodes are equipped with multiple types of NVIDIA GPUs. For example, Node A has one A100 GPU and one V100 GPU. In such a setup, I am looking for a way to specify which GPU model should be allocated when a user creates a GPU-allocated pod.
From my understanding, in such cases we would typically request a GPU in our pod specifications using resources.limits with nvidia.com/gpu: 1. However, this approach doesn't seem to provide a way to distinguish between different GPU models. Is there a feature or method within the NVIDIA GPU Operator or Kubernetes ecosystem that allows for such specific GPU model selection during pod creation? If not, are there any best practices or recommended approaches to ensure a pod is scheduled with a specific type of GPU when multiple models are present on the same node?
Thank you for your time and assistance.