Expose device UUIDs to node label #1116
base: main
Conversation
internal/lm/nvml.go
Outdated
```go
}

labels := Labels{
	"nvidia.com/gpu.uuid": strings.Join(uuids, ","),
```
It's not a good idea to keep appending all UUIDs to a single label, since a valid label value must be 63 characters or less:
https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/.
Also, for a server with 8 GPUs, packing all UUIDs into a single label is unlikely to be useful.
Good point. An annotation could handle it, but there seems to be no easy way to add annotations from gpu-feature-discovery. I ended up adding a label per GPU device.
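A minimal sketch of the per-GPU approach discussed above. The label key format (`nvidia.com/gpu-<index>.uuid`) and the helper name are illustrative assumptions, not the exact keys used in this PR; the point is that each value stays one UUID, comfortably under the 63-character limit for label values.

```go
package main

import "fmt"

// perGPULabels is a hypothetical helper: instead of joining all UUIDs
// into one comma-separated label value, it emits one label per GPU
// index, so each value is a single UUID (~40 characters, under the
// 63-character Kubernetes label-value limit).
func perGPULabels(uuids []string) map[string]string {
	labels := make(map[string]string, len(uuids))
	for i, uuid := range uuids {
		// e.g. nvidia.com/gpu-0.uuid = GPU-00000000-0000-0000-0000-000000000000
		labels[fmt.Sprintf("nvidia.com/gpu-%d.uuid", i)] = uuid
	}
	return labels
}

func main() {
	uuids := []string{
		"GPU-00000000-0000-0000-0000-000000000001",
		"GPU-00000000-0000-0000-0000-000000000002",
	}
	for key, value := range perGPULabels(uuids) {
		fmt.Printf("%s=%s\n", key, value)
	}
}
```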
Signed-off-by: Zubiao Xiong <[email protected]>
@xiongzubiao could you please provide information on how these labels will be used?
@elezar, we want to provide some sort of visualization to the user. Users can click each GPU to check its properties, status, and metrics. The device UUID is the natural choice for indexing. There are other ways to get the UUID, but it is most straightforward to get it from node labels, because it is part of the node's properties. There is another use case, mentioned in #1015: scheduling a pod to a specific GPU using node label matching.
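For the scheduling use case, a pod could target a node carrying a particular device via `nodeSelector`. This manifest is a hypothetical sketch: the label key assumes the per-GPU label shape discussed above, and the UUID value is a placeholder.

```yaml
# Hypothetical: nvidia.com/gpu-0.uuid assumes a per-GPU label published
# by gpu-feature-discovery; replace the value with a real device UUID.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-workload
spec:
  nodeSelector:
    nvidia.com/gpu-0.uuid: GPU-00000000-0000-0000-0000-000000000001
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1
```

Note that this only selects the node holding the device; which physical GPU the container receives is still decided by the device plugin, not the label match.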
Closes #1015