1. Quick Debug Information

GPU:
root@master1:~# lspci | grep NVIDIA
2f:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 40GB] (rev a1)

2. Issue or feature description

With the driver and NFD installed in advance, I am trying to install the GPU Operator on an A100 node, but the installation fails and the nvidia-mig-manager pod is missing.
The GPU Operator pod info: there is no nvidia-mig-manager pod.

The error pod logs are as follows:
root@master1:~# kubectl logs -n gpu-operator nvidia-device-plugin-daemonset-gjlp7
...
I0109 12:33:24.644789 1 main.go:256] Retreiving plugins.
I0109 12:33:24.645553 1 factory.go:107] Detected NVML platform: found NVML library
I0109 12:33:24.645594 1 factory.go:107] Detected non-Tegra platform: /sys/devices/soc0/family file not found
E0109 12:33:24.699911 1 main.go:123] error starting plugins: error getting plugins: failed to construct NVML resource managers: error building device map: error building device map from config.resources: invalid MIG configuration: At least one device with migEnabled=true was not configured correctly: error visiting device: device 0 has an invalid MIG configuration
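The "invalid MIG configuration" error usually means the device plugin was told (via its config or node labels) to expect MIG devices, but MIG mode is not actually enabled on the GPU, or no MIG devices have been created. A hedged sketch of checks to run on the node, assuming NVIDIA's documented query fields and the mig-manager's label conventions (the `nvidia.com/mig.config` label name is an assumption; adjust to your setup):

```shell
# Check whether MIG mode is actually enabled on GPU 0 (run on the node).
nvidia-smi -i 0 --query-gpu=mig.mode.current --format=csv,noheader

# List the GPUs and any MIG devices that have been created on them.
nvidia-smi -L

# Inspect the MIG-related labels on the node (label names such as
# nvidia.com/mig.config follow mig-manager conventions; verify for
# your operator version).
kubectl get node master1 -o json | grep -i mig
```

If MIG mode is disabled, either enable it (and create MIG devices) or remove the MIG entries from the device plugin's config so it stops expecting `migEnabled=true` devices.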
3. Steps to reproduce the issue
Install k8s cluster;
Install nfd:
root@master1:~# kubectl get pod -n node-feature-discovery
NAME READY STATUS RESTARTS AGE
nfd-release-node-feature-discovery-master-5564946bcf-x6qzs 1/1 Running 13 (49m ago) 43d
nfd-release-node-feature-discovery-worker-x7nff 1/1 Running 11 (49m ago) 43d
Install gpu-operator:
helm install gpu-operator -n gpu-operator --create-namespace ./gpu-operator --set driver.enabled=false --set nfd.enabled=false
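If the missing mig-manager is the root cause, one thing to try is reinstalling with the MIG manager explicitly enabled and a MIG strategy set. This is a hedged sketch: the value names (`migManager.enabled`, `mig.strategy`) assume the upstream gpu-operator Helm chart and should be verified with `helm show values ./gpu-operator` for your chart version. Note also that some operator releases do not deploy mig-manager at all when the driver is pre-installed on the host (`driver.enabled=false`); check the release notes for your version.

```shell
# Sketch: reinstall with the MIG manager explicitly enabled.
# Value names assume the upstream gpu-operator chart; verify with
# `helm show values` before applying.
helm upgrade --install gpu-operator -n gpu-operator --create-namespace \
  ./gpu-operator \
  --set driver.enabled=false \
  --set nfd.enabled=false \
  --set migManager.enabled=true \
  --set mig.strategy=single
```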