fix(nvidia): use NVML + lspci to detect NVIDIA GPUs (without running nvidia-smi) #127

Merged: 6 commits, Oct 25, 2024

Conversation

@gyuho gyuho commented Oct 18, 2024

No description provided.

@gyuho gyuho self-assigned this Oct 18, 2024
@gyuho gyuho changed the title fix(nvidia): use NVML + lspci to detect NVIDIA GPUs fix(nvidia): use NVML + lspci to detect NVIDIA GPUs (without running nvidia-smi) Oct 18, 2024
@gyuho gyuho added this to the v0.0.6 milestone Oct 18, 2024
Signed-off-by: Gyuho Lee <[email protected]>
Signed-off-by: Gyuho Lee <[email protected]>
Signed-off-by: Gyuho Lee <[email protected]>
if !smiInstalled {
	return false, nil
}
log.Logger.Info("nvidia-smi installed")
Contributor

Should this be logged at debug level?

Contributor Author

Done

if err != nil {
	return false, err
}
log.Logger.Infow("detected nvidia gpu", "product", productName)
Contributor

productName can also be a network card. Do we check the type of the device?

Contributor Author

It calls device.GetName, so I changed this log message to say GPU device name.

Signed-off-by: Gyuho Lee <[email protected]>

// now that nvidia-smi installed,
// check the NVIDIA GPU presence via PCI bus
pciDevices, err := ListPCIs(ctx)
Contributor

Is this only useful for printing?

Contributor Author

@gyuho gyuho Oct 22, 2024

Renamed to ListNVIDIAPCIs. It lists only the devices with NVIDIA in their name, which lets us check whether the host has NVIDIA devices (we check len(results) > 0).

Signed-off-by: Gyuho Lee <[email protected]>
@gyuho gyuho merged commit fa8a50a into main Oct 25, 2024
5 checks passed
@gyuho gyuho deleted the nvidia-smi branch October 25, 2024 16:00
3 participants