Draft: Migrate to go-nvlib #73

Open
wants to merge 4 commits into
base: master


cdesiniotis
Contributor

This PR migrates to go-nvlib for enumerating the NVIDIA PCI and vGPU devices on the system, as well as for parsing the PCI database file. go-nvlib contains a set of common Go packages used across many cloud-native components, including the k8s-device-plugin and vgpu-device-manager.

The nvpci package is used to enumerate all NVIDIA PCI devices and to create the iommuMap and deviceMap, which represent all of the passthrough GPUs. The nvmdev package is used to enumerate all NVIDIA vGPU devices and to create the vGpuMap and gpuVgpuMap, which represent all of the vGPU devices. The pciids package is used to parse the PCI database.
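To illustrate the grouping step described above, here is a minimal sketch. It does not call go-nvlib: pciDevice and buildMaps are hypothetical stand-ins, and the map shapes (iommuMap keyed by IOMMU group, deviceMap keyed by PCI device ID) are assumptions made for illustration rather than something this PR states.

```go
package main

import "fmt"

// pciDevice is a hypothetical stand-in for the device type returned by nvpci.
// Only the fields needed for this sketch are modeled.
type pciDevice struct {
	Address    string // PCI address, e.g. "0000:3b:00.0"
	DeviceID   string // PCI device ID (placeholder values below)
	IommuGroup string // IOMMU group the device belongs to
}

// contains reports whether s is already present in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// buildMaps groups devices by IOMMU group (assumed shape of iommuMap) and
// records, per device ID, the IOMMU groups holding a device with that ID
// (assumed shape of deviceMap).
func buildMaps(devs []pciDevice) (map[string][]pciDevice, map[string][]string) {
	iommuMap := make(map[string][]pciDevice)
	deviceMap := make(map[string][]string)
	for _, d := range devs {
		iommuMap[d.IommuGroup] = append(iommuMap[d.IommuGroup], d)
		if !contains(deviceMap[d.DeviceID], d.IommuGroup) {
			deviceMap[d.DeviceID] = append(deviceMap[d.DeviceID], d.IommuGroup)
		}
	}
	return iommuMap, deviceMap
}

func main() {
	// Placeholder devices; the addresses, IDs, and groups are made up.
	devs := []pciDevice{
		{Address: "0000:3b:00.0", DeviceID: "aaaa", IommuGroup: "10"},
		{Address: "0000:3b:00.1", DeviceID: "aaaa", IommuGroup: "10"},
		{Address: "0000:5e:00.0", DeviceID: "aaaa", IommuGroup: "11"},
	}
	iommuMap, deviceMap := buildMaps(devs)
	fmt.Println(len(iommuMap["10"]), len(iommuMap["11"]), deviceMap["aaaa"])
}
```

In the real plugin the device list would come from nvpci rather than a literal slice.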

cc @rthallisey @zvonkok @elezar @shivamerla

…ing device names

Signed-off-by: Christopher Desiniotis <[email protected]>
}
for _, dev := range devices {
	gpuAddress := dev.Parent.Address
	vgpuType := strings.ReplaceAll(dev.MDEVType, " ", "_")
Contributor Author

Note: before this change, the device plugin used the full contents of the mdev_type/name file to construct the corresponding resource name. For example, if mdev_type/name contained the string NVIDIA A10-12Q, the plugin would advertise nvidia.com/NVIDIA_A10-12Q resources. The nvmdev package strips the leading NVIDIA or GRID prefix from this value, so the resource name would now be nvidia.com/A10-12Q.

I personally prefer the latter naming strategy, but it is a breaking change from the user's perspective. I have opened https://gitlab.com/nvidia/cloud-native/go-nvlib/-/merge_requests/45 to add a device.MDEVName field so that we can use the full contents of mdev_type/name when constructing the resource name, and thereby retain the current behavior of the plugin.
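The difference between the two naming strategies can be sketched as follows. This is an illustration only: resourceName and stripVendorPrefix are hypothetical helpers, and stripVendorPrefix merely approximates the prefix stripping attributed to nvmdev above; it is not the package's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceName is a hypothetical helper that builds a resource name from an
// mdev type name, replacing spaces with underscores as in the loop under review.
func resourceName(mdevTypeName string) string {
	return "nvidia.com/" + strings.ReplaceAll(mdevTypeName, " ", "_")
}

// stripVendorPrefix is a hypothetical approximation of the stripping described
// in the comment: it drops a leading "NVIDIA " or "GRID " from the name.
func stripVendorPrefix(name string) string {
	for _, prefix := range []string{"NVIDIA ", "GRID "} {
		if strings.HasPrefix(name, prefix) {
			return strings.TrimPrefix(name, prefix)
		}
	}
	return name
}

func main() {
	full := "NVIDIA A10-12Q" // example contents of mdev_type/name

	// Behavior before this change: the full name is used.
	fmt.Println(resourceName(full)) // nvidia.com/NVIDIA_A10-12Q

	// Behavior with the stripped type name.
	fmt.Println(resourceName(stripVendorPrefix(full))) // nvidia.com/A10-12Q
}
```

Retaining the current behavior amounts to feeding the unstripped name into the resource-name construction, which is what the proposed device.MDEVName field would make possible.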
