Expose device UUIDs to node label #1116

Open
wants to merge 1 commit into
base: main


xiongzubiao

Closes #1015


copy-pr-bot bot commented Jan 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

}

labels := Labels{
	"nvidia.com/gpu.uuid": strings.Join(uuids, ","),

shan100github commented Jan 9, 2025


It's not a good idea to join all UUIDs into a single label, since a valid label value "must be 63 characters or less":
https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/.

Also, for a server with 8 GPUs, putting every UUID into a single label might not be useful.
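
To make the limit concrete, here is a minimal sketch of a label value check. It assumes the label value syntax documented by Kubernetes (at most 63 characters; alphanumeric at both ends; `-`, `_` and `.` allowed in between). The helper name `isValidLabelValue` and the UUIDs are placeholders for illustration, not code from this PR. Note that the comma separator is itself outside the allowed character set, so a joined value is rejected regardless of its length.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// maxLabelValueLen is the documented Kubernetes limit on label value length.
const maxLabelValueLen = 63

// labelValueRE mirrors the documented label value syntax: empty, or
// alphanumeric at both ends with '-', '_' and '.' allowed in between.
var labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// isValidLabelValue (hypothetical helper) reports whether v satisfies
// both the length limit and the character rules.
func isValidLabelValue(v string) bool {
	return len(v) <= maxLabelValueLen && labelValueRE.MatchString(v)
}

func main() {
	// Placeholder UUIDs in the "GPU-<uuid>" form; not real devices.
	uuids := []string{
		"GPU-11111111-2222-3333-4444-555555555555",
		"GPU-66666666-7777-8888-9999-000000000000",
	}
	joined := strings.Join(uuids, ",")

	fmt.Println(len(uuids[0]), isValidLabelValue(uuids[0])) // 40 true
	fmt.Println(len(joined), isValidLabelValue(joined))     // 81 false
}
```

A single UUID fits comfortably, but two joined UUIDs already exceed the limit.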

Author


Good point. An annotation could handle it, but there seems to be no easy way to add annotations from gpu-feature-discovery. I ended up adding one label per GPU device.
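
A rough sketch of the one-label-per-device approach, for illustration only. The `Labels` type is assumed to be a plain string map (as the literal in the diff suggests), and both the function name `perDeviceUUIDLabels` and the key format `nvidia.com/gpu.<index>.uuid` are hypothetical; the keys actually used in this PR may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Labels is assumed here to be a plain string map.
type Labels map[string]string

// perDeviceUUIDLabels returns one label per GPU, keyed by device index.
// The key format "nvidia.com/gpu.<index>.uuid" is hypothetical.
func perDeviceUUIDLabels(uuids []string) Labels {
	labels := make(Labels, len(uuids))
	for i, uuid := range uuids {
		labels[fmt.Sprintf("nvidia.com/gpu.%d.uuid", i)] = uuid
	}
	return labels
}

func main() {
	// Placeholder UUIDs; not real devices.
	labels := perDeviceUUIDLabels([]string{
		"GPU-11111111-2222-3333-4444-555555555555",
		"GPU-66666666-7777-8888-9999-000000000000",
	})

	// Sort the keys so the output order is stable.
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, labels[k])
	}
}
```

Each value is a single UUID, so it stays well under the 63-character limit no matter how many GPUs the node has.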

Member

elezar commented Jan 10, 2025

@xiongzubiao could you please provide information on how these labels will be used?

Author

xiongzubiao commented Jan 10, 2025

> @xiongzubiao could you please provide information on how these labels will be used?

@elezar, we want to provide a visualization in which users can click each GPU to check its properties, status, and metrics. The device UUID is the natural choice for indexing. There are other ways to get the UUID, but reading it from node labels is the most straightforward, because it is part of the node's properties.

There is another use case mentioned in #1015: scheduling a pod to a specific GPU using node label matching.
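
As a rough illustration of the indexing use case, the sketch below finds the nodes that advertise a given UUID. It works on an in-memory map of node name to labels rather than a live cluster, and it assumes a hypothetical per-device key of the form `nvidia.com/gpu.<index>.uuid`; the function name `nodesWithGPU`, the node names, and the UUIDs are all placeholders.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// nodesWithGPU returns the sorted names of nodes whose labels carry the
// given UUID under a key of the hypothetical form "nvidia.com/gpu.<index>.uuid".
func nodesWithGPU(nodeLabels map[string]map[string]string, uuid string) []string {
	var names []string
	for node, labels := range nodeLabels {
		for key, value := range labels {
			if strings.HasPrefix(key, "nvidia.com/gpu.") &&
				strings.HasSuffix(key, ".uuid") && value == uuid {
				names = append(names, node)
				break // one match is enough for this node
			}
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Placeholder node names and UUIDs.
	nodeLabels := map[string]map[string]string{
		"node-a": {"nvidia.com/gpu.0.uuid": "GPU-11111111-2222-3333-4444-555555555555"},
		"node-b": {"nvidia.com/gpu.0.uuid": "GPU-66666666-7777-8888-9999-000000000000"},
	}
	fmt.Println(nodesWithGPU(nodeLabels, "GPU-66666666-7777-8888-9999-000000000000"))
}
```

Label matching of this kind selects a node, not a device on that node; which GPU a pod actually receives is still decided at allocation time.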

Development

Successfully merging this pull request may close these issues.

Add gpu uuids to node labels
3 participants