nvidia-container-cli: mount error: failed to add device rules: unable to generate new device filter program from existing programs: unable to create new device filters program: load program: invalid argument: 0: (69) r2 = *(u16 *)(r1 +0) #256

Open
cosmic-heart opened this issue Mar 13, 2024 · 0 comments

Error:
Starting a Docker container without GPUs works as expected:

$ docker run -it ubuntu bash
root@cf3980a365ad:/# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

However, I can't start a container with GPUs attached. I get the error below:

$ sudo docker run --rm --gpus all hello-world nvidia-smi
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: unable to generate new device filter program from existing programs: unable to create new device filters program: load program: invalid argument: 0: (69) r2 = *(u16 *)(r1 +0)
1: (61) r3 = *(u32 *)(r1 +0)
2: (74) w3 >>= 16
3: (61) r4 = *(u32 *)(r1 +4)
4: (61) r5 = *(u32 *)(r1 +8)
5: (55) if r2 != 0x2 goto pc+7
 R1=ctx(id=0,off=0,imm=0) R2_w=inv2 R3_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R4_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R5_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
6: (bc) w6 = w3
7: (54) w6 &= 6
8: (5d) if r6 != r3 goto pc+4
 R1=ctx(id=0,off=0,imm=0) R2_w=inv2 R3_w=inv(id=0,umax_value=6,var_off=(0x0; 0x6)) R4_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R5_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6_w=inv(id=0,umax_value=6,var_off=(0x0; 0x6)) R10=fp0
9: (55) if r4 != 0xc3 goto pc+3
 R1=ctx(id=0,off=0,imm=0) R2=inv2 R3=inv(id=0,umax_value=6,var_off=(0x0; 0x6)) R4=inv195 R5=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=inv(id=0,umax_value=6,var_off=(0x0; 0x6)) R10=fp0
10: (55) if r5 != 0xff goto pc+2

from 10 to 13: R1=ctx(id=0,off=0,imm=0) R2=inv2 R3=inv(id=0,umax_value=6,var_off=(0x0; 0x6)) R4=inv195 R5=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=inv(id=0,umax_value=6,var_off=(0x0; 0x6)) R10=fp0
13: (61) r2 = *(u32 *)(r1 +0)
14: (54) w2 &= 65535
15: (bc) w2 = w2
BPF_MOV uses reserved fields
processed 16 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1: unknown.
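
For anyone reading the verifier log above: the rejected instruction is number 15, `(bc) w2 = w2`, and the message is `BPF_MOV uses reserved fields`. The sketch below is only a reading aid, not a diagnosis. It splits an eBPF opcode byte into its class, source, and operation fields using the constants from the kernel's BPF headers, and spells out the device rule that instructions 5 to 10 test for. `decode_opcode` and `describe_device_rule` are hypothetical helper names introduced for illustration. To my understanding, the 5.15 verifier prints this message for a register-to-register `BPF_MOV` whose `imm` or `off` field is non-zero, but the log does not show those fields, so that part is an assumption.

```python
# Reading aid for the verifier log; helper names are illustrative, not a real API.

# Instruction class: low 3 bits of the opcode byte (linux/bpf_common.h, linux/bpf.h).
BPF_CLASSES = {
    0x00: "BPF_LD", 0x01: "BPF_LDX", 0x02: "BPF_ST", 0x03: "BPF_STX",
    0x04: "BPF_ALU", 0x05: "BPF_JMP", 0x06: "BPF_JMP32", 0x07: "BPF_ALU64",
}

# ALU operation: high 4 bits of the opcode byte.
ALU_OPS = {
    0x00: "BPF_ADD", 0x10: "BPF_SUB", 0x20: "BPF_MUL", 0x30: "BPF_DIV",
    0x40: "BPF_OR", 0x50: "BPF_AND", 0x60: "BPF_LSH", 0x70: "BPF_RSH",
    0x80: "BPF_NEG", 0x90: "BPF_MOD", 0xA0: "BPF_XOR", 0xB0: "BPF_MOV",
    0xC0: "BPF_ARSH", 0xD0: "BPF_END",
}

# Device cgroup constants (BPF_DEVCG_DEV_* and BPF_DEVCG_ACC_* in linux/bpf.h).
DEV_TYPES = {1: "block", 2: "char"}
ACCESS_BITS = {1: "mknod", 2: "read", 4: "write"}


def decode_opcode(opcode: int) -> dict:
    """Split an opcode byte into class and, for ALU classes, source and operation."""
    cls = BPF_CLASSES[opcode & 0x07]
    info = {"class": cls}
    if cls in ("BPF_ALU", "BPF_ALU64"):
        # Bit 0x08 selects a register source (BPF_X) over an immediate (BPF_K).
        info["source"] = "BPF_X" if opcode & 0x08 else "BPF_K"
        info["op"] = ALU_OPS.get(opcode & 0xF0, "unknown")
    return info


def describe_device_rule(dev_type: int, access_mask: int, major: int, minor: int) -> str:
    """Render the values compared in a device filter as 'type major:minor access'."""
    kind = DEV_TYPES[dev_type]
    access = [name for bit, name in ACCESS_BITS.items() if access_mask & bit]
    return f"{kind} {major}:{minor} {'+'.join(access)}"


# Instruction 15 in the log: opcode 0xbc.
print(decode_opcode(0xBC))
# {'class': 'BPF_ALU', 'source': 'BPF_X', 'op': 'BPF_MOV'}

# Instructions 5-10 in the log: type 0x2, access mask 6, major 0xc3, minor 0xff.
print(describe_device_rule(2, 6, 0xC3, 0xFF))
# char 195:255 read+write
```

So the instruction being rejected is a 32-bit register-to-register move, and the rule checked just before it is a read/write rule for character device 195:255. Major 195 is the usual NVIDIA character device major, and 195:255 would typically be `/dev/nvidiactl`, though the log itself does not name the device.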

Version:

$ nvidia-container-toolkit --version
NVIDIA Container Runtime Hook version 1.14.6
commit: 5605d191332dcfeea802c4497360d60a65c7887e
$ neofetch
OS: Ubuntu 22.04.4 LTS ppc64le
Host: 8335-GTH
Kernel: 5.15.0-100-generic
Uptime: 2 days
Packages: 830 (dpkg)
Shell: bash 5.1.16
Resolution: 1024x768
Terminal: /dev/pts/1
CPU: POWER9 (128) @ 3.800GHz
GPU: NVIDIA Tesla V100 SXM2 32GB
GPU: NVIDIA Tesla V100 SXM2 32GB
GPU: NVIDIA Tesla V100 SXM2 32GB
GPU: NVIDIA Tesla V100 SXM2 32GB
Memory: 4582MiB / 261504MiB

Container Toolkit logs

$ nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0313 06:14:47.315977 150873 nvc.c:393] initializing library context (version=1.14.6, build=d2eb0afe86f0b643e33624ee64f065dd60e952d4)
I0313 06:14:47.316021 150873 nvc.c:364] using root /
I0313 06:14:47.316030 150873 nvc.c:365] using ldcache /etc/ld.so.cache
I0313 06:14:47.316038 150873 nvc.c:366] using unprivileged user 1000:1000
I0313 06:14:47.316062 150873 nvc.c:410] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0313 06:14:47.316497 150873 nvc.c:412] dxcore initialization failed, continuing assuming a non-WSL environment
W0313 06:14:47.319821 150874 nvc.c:273] failed to set inheritable capabilities
W0313 06:14:47.319847 150874 nvc.c:274] skipping kernel modules load due to failure
I0313 06:14:47.320117 150875 rpc.c:71] starting driver rpc service
I0313 06:14:47.330664 150876 rpc.c:71] starting nvcgo rpc service
I0313 06:14:47.332329 150873 nvc_info.c:797] requesting driver information with ''
I0313 06:14:47.337077 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/vdpau/libvdpau_nvidia.so.525.147.05
I0313 06:14:47.337236 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-tls.so.525.147.05
I0313 06:14:47.337292 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-ptxjitcompiler.so.525.147.05
I0313 06:14:47.337382 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-opticalflow.so.525.147.05
I0313 06:14:47.337470 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-opencl.so.525.147.05
I0313 06:14:47.337529 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-nvvm.so.525.147.05
I0313 06:14:47.337617 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-ml.so.525.147.05
I0313 06:14:47.337704 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-glvkspirv.so.525.147.05
I0313 06:14:47.337754 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-glsi.so.525.147.05
I0313 06:14:47.337804 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-glcore.so.525.147.05
I0313 06:14:47.337857 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-encode.so.525.147.05
I0313 06:14:47.337941 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-eglcore.so.525.147.05
I0313 06:14:47.337998 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-cfg.so.525.147.05
I0313 06:14:47.338086 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvidia-allocator.so.525.147.05
I0313 06:14:47.338174 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libnvcuvid.so.525.147.05
I0313 06:14:47.338417 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libcudadebugger.so.525.147.05
I0313 06:14:47.338470 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libcuda.so.525.147.05
I0313 06:14:47.338621 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libGLX_nvidia.so.525.147.05
I0313 06:14:47.338675 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libGLESv2_nvidia.so.525.147.05
I0313 06:14:47.338728 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libGLESv1_CM_nvidia.so.525.147.05
I0313 06:14:47.338783 150873 nvc_info.c:175] selecting /usr/lib/powerpc64le-linux-gnu/libEGL_nvidia.so.525.147.05
W0313 06:14:47.338807 150873 nvc_info.c:401] missing library libnvidia-nscq.so
W0313 06:14:47.338815 150873 nvc_info.c:401] missing library libnvidia-gpucomp.so
W0313 06:14:47.338823 150873 nvc_info.c:401] missing library libnvidia-fatbinaryloader.so
W0313 06:14:47.338830 150873 nvc_info.c:401] missing library libnvidia-compiler.so
W0313 06:14:47.338838 150873 nvc_info.c:401] missing library libnvidia-pkcs11.so
W0313 06:14:47.338846 150873 nvc_info.c:401] missing library libnvidia-pkcs11-openssl3.so
W0313 06:14:47.338854 150873 nvc_info.c:401] missing library libnvidia-ngx.so
W0313 06:14:47.338861 150873 nvc_info.c:401] missing library libnvidia-fbc.so
W0313 06:14:47.338869 150873 nvc_info.c:401] missing library libnvidia-ifr.so
W0313 06:14:47.338876 150873 nvc_info.c:401] missing library libnvidia-rtcore.so
W0313 06:14:47.338884 150873 nvc_info.c:401] missing library libnvoptix.so
W0313 06:14:47.338892 150873 nvc_info.c:401] missing library libnvidia-cbl.so
W0313 06:14:47.338899 150873 nvc_info.c:405] missing compat32 library libnvidia-ml.so
W0313 06:14:47.338907 150873 nvc_info.c:405] missing compat32 library libnvidia-cfg.so
W0313 06:14:47.338915 150873 nvc_info.c:405] missing compat32 library libnvidia-nscq.so
W0313 06:14:47.338922 150873 nvc_info.c:405] missing compat32 library libcuda.so
W0313 06:14:47.338931 150873 nvc_info.c:405] missing compat32 library libcudadebugger.so
W0313 06:14:47.338938 150873 nvc_info.c:405] missing compat32 library libnvidia-opencl.so
W0313 06:14:47.338946 150873 nvc_info.c:405] missing compat32 library libnvidia-gpucomp.so
W0313 06:14:47.338954 150873 nvc_info.c:405] missing compat32 library libnvidia-ptxjitcompiler.so
W0313 06:14:47.338961 150873 nvc_info.c:405] missing compat32 library libnvidia-fatbinaryloader.so
W0313 06:14:47.338969 150873 nvc_info.c:405] missing compat32 library libnvidia-allocator.so
W0313 06:14:47.338977 150873 nvc_info.c:405] missing compat32 library libnvidia-compiler.so
W0313 06:14:47.338984 150873 nvc_info.c:405] missing compat32 library libnvidia-pkcs11.so
W0313 06:14:47.338992 150873 nvc_info.c:405] missing compat32 library libnvidia-pkcs11-openssl3.so
W0313 06:14:47.339000 150873 nvc_info.c:405] missing compat32 library libnvidia-nvvm.so
W0313 06:14:47.339007 150873 nvc_info.c:405] missing compat32 library libnvidia-ngx.so
W0313 06:14:47.339015 150873 nvc_info.c:405] missing compat32 library libvdpau_nvidia.so
W0313 06:14:47.339023 150873 nvc_info.c:405] missing compat32 library libnvidia-encode.so
W0313 06:14:47.339030 150873 nvc_info.c:405] missing compat32 library libnvidia-opticalflow.so
W0313 06:14:47.339038 150873 nvc_info.c:405] missing compat32 library libnvcuvid.so
W0313 06:14:47.339045 150873 nvc_info.c:405] missing compat32 library libnvidia-eglcore.so
W0313 06:14:47.339053 150873 nvc_info.c:405] missing compat32 library libnvidia-glcore.so
W0313 06:14:47.339061 150873 nvc_info.c:405] missing compat32 library libnvidia-tls.so
W0313 06:14:47.339069 150873 nvc_info.c:405] missing compat32 library libnvidia-glsi.so
W0313 06:14:47.339076 150873 nvc_info.c:405] missing compat32 library libnvidia-fbc.so
W0313 06:14:47.339084 150873 nvc_info.c:405] missing compat32 library libnvidia-ifr.so
W0313 06:14:47.339091 150873 nvc_info.c:405] missing compat32 library libnvidia-rtcore.so
W0313 06:14:47.339099 150873 nvc_info.c:405] missing compat32 library libnvoptix.so
W0313 06:14:47.339107 150873 nvc_info.c:405] missing compat32 library libGLX_nvidia.so
W0313 06:14:47.339114 150873 nvc_info.c:405] missing compat32 library libEGL_nvidia.so
W0313 06:14:47.339122 150873 nvc_info.c:405] missing compat32 library libGLESv2_nvidia.so
W0313 06:14:47.339129 150873 nvc_info.c:405] missing compat32 library libGLESv1_CM_nvidia.so
W0313 06:14:47.339137 150873 nvc_info.c:405] missing compat32 library libnvidia-glvkspirv.so
W0313 06:14:47.339145 150873 nvc_info.c:405] missing compat32 library libnvidia-cbl.so
I0313 06:14:47.339699 150873 nvc_info.c:301] selecting /usr/bin/nvidia-smi
I0313 06:14:47.339737 150873 nvc_info.c:301] selecting /usr/bin/nvidia-debugdump
I0313 06:14:47.339774 150873 nvc_info.c:301] selecting /usr/bin/nvidia-persistenced
I0313 06:14:47.339836 150873 nvc_info.c:301] selecting /usr/bin/nvidia-cuda-mps-control
I0313 06:14:47.339872 150873 nvc_info.c:301] selecting /usr/bin/nvidia-cuda-mps-server
W0313 06:14:47.339991 150873 nvc_info.c:427] missing binary nv-fabricmanager
W0313 06:14:47.340041 150873 nvc_info.c:470] missing firmware path /usr/lib/firmware/nvidia/525.147.05/gsp*.bin
I0313 06:14:47.340089 150873 nvc_info.c:560] listing device /dev/nvidiactl
I0313 06:14:47.340097 150873 nvc_info.c:560] listing device /dev/nvidia-uvm
I0313 06:14:47.340105 150873 nvc_info.c:560] listing device /dev/nvidia-uvm-tools
I0313 06:14:47.340112 150873 nvc_info.c:560] listing device /dev/nvidia-modeset
I0313 06:14:47.340163 150873 nvc_info.c:345] listing ipc path /run/nvidia-persistenced/socket
W0313 06:14:47.340206 150873 nvc_info.c:351] missing ipc path /var/run/nvidia-fabricmanager/socket
W0313 06:14:47.340234 150873 nvc_info.c:351] missing ipc path /tmp/nvidia-mps
I0313 06:14:47.340242 150873 nvc_info.c:853] requesting device information with ''
I0313 06:14:47.346788 150873 nvc_info.c:744] listing device /dev/nvidia0 (GPU-f623f840-c256-53dc-126e-a8b923d86baf at 00000004:04:00.0)
I0313 06:14:47.353096 150873 nvc_info.c:744] listing device /dev/nvidia1 (GPU-d2b33ef8-d177-a4f0-01a1-121e76fb1309 at 00000004:05:00.0)
I0313 06:14:47.359434 150873 nvc_info.c:744] listing device /dev/nvidia2 (GPU-34035253-6813-3cc8-525e-3b20791092a5 at 00000035:03:00.0)
I0313 06:14:47.365677 150873 nvc_info.c:744] listing device /dev/nvidia3 (GPU-0c9646fc-b3c1-78aa-800a-378e333cd1cf at 00000035:04:00.0)
NVRM version: 525.147.05
CUDA version: 12.0

Device Index: 0
Device Minor: 0
Model: Tesla V100-SXM2-32GB
Brand: Tesla
GPU UUID: GPU-f623f840-c256-53dc-126e-a8b923d86baf
Bus Location: 00000004:04:00.0
Architecture: 7.0

Device Index: 1
Device Minor: 1
Model: Tesla V100-SXM2-32GB
Brand: Tesla
GPU UUID: GPU-d2b33ef8-d177-a4f0-01a1-121e76fb1309
Bus Location: 00000004:05:00.0
Architecture: 7.0

Device Index: 2
Device Minor: 2
Model: Tesla V100-SXM2-32GB
Brand: Tesla
GPU UUID: GPU-34035253-6813-3cc8-525e-3b20791092a5
Bus Location: 00000035:03:00.0
Architecture: 7.0

Device Index: 3
Device Minor: 3
Model: Tesla V100-SXM2-32GB
Brand: Tesla
GPU UUID: GPU-0c9646fc-b3c1-78aa-800a-378e333cd1cf
Bus Location: 00000035:04:00.0
Architecture: 7.0
I0313 06:14:47.365760 150873 nvc.c:452] shutting down library context
I0313 06:14:47.365792 150876 rpc.c:95] terminating nvcgo rpc service
I0313 06:14:47.367251 150873 rpc.c:135] nvcgo rpc service terminated successfully
I0313 06:14:47.368648 150875 rpc.c:95] terminating driver rpc service
I0313 06:14:47.368775 150873 rpc.c:135] driver rpc service terminated successfully
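
The `info` output above mixes timestamped log lines with a plain `Key: value` report per GPU. Below is a minimal sketch, assuming the report keeps the `Device Index:` layout shown here, of pulling the per-device records out of such a capture so they can be compared against `nvidia-smi`. `parse_devices` is a hypothetical helper, not part of any NVIDIA tool.

```python
import re

# Log lines in the capture look like "I0313 06:14:47.365760 150873 nvc.c:452] ...".
LOG_LINE = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+ ")


def parse_devices(text: str) -> list[dict]:
    """Collect one dict per 'Device Index' block, skipping log lines."""
    devices = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or LOG_LINE.match(line) or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        if key == "Device Index":
            # A new device block starts here.
            current = {}
            devices.append(current)
        if current is not None:
            current[key] = value
    return devices


# Excerpt copied from the output above.
sample = """\
NVRM version: 525.147.05
CUDA version: 12.0

Device Index: 0
Device Minor: 0
Model: Tesla V100-SXM2-32GB
GPU UUID: GPU-f623f840-c256-53dc-126e-a8b923d86baf
Bus Location: 00000004:04:00.0
Architecture: 7.0
I0313 06:14:47.365760 150873 nvc.c:452] shutting down library context
"""

devices = parse_devices(sample)
print(len(devices), devices[0]["Model"])
# 1 Tesla V100-SXM2-32GB
```

Run against the full capture, this would be expected to return four records, one per V100 listed.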

nvidia-smi logs

==============NVSMI LOG==============

Timestamp : Wed Mar 13 06:21:15 2024
Driver Version : 525.147.05
CUDA Version : 12.0

Attached GPUs : 4
GPU 00000004:04:00.0
Product Name : Tesla V100-SXM2-32GB
Product Brand : Tesla
Product Architecture : Volta
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0324118134526
GPU UUID : GPU-f623f840-c256-53dc-126e-a8b923d86baf
Minor Number : 0
VBIOS Version : 88.00.43.00.03
MultiGPU Board : No
Board ID : 0x40400
Board Part Number : 900-2G503-0430-000
GPU Part Number : 1DB5-896-A1
Module ID : 1
Inforom Version
Image Version : G503.0203.00.04
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x04
Device : 0x00
Domain : 0x0004
Device Id : 0x1DB510DE
Bus Id : 00000004:04:00.0
Sub System Id : 0x124910DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Device Current : 3
Device Max : 3
Host Max : 4
Link Width
Max : 16x
Current : 2x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 32768 MiB
Reserved : 267 MiB
Used : 0 MiB
Free : 32500 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 2 MiB
Free : 32766 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 24 C
GPU T.Limit Temp : N/A
GPU Shutdown Temp : 90 C
GPU Slowdown Temp : 87 C
GPU Max Operating Temp : 83 C
GPU Target Temperature : N/A
Memory Current Temp : 24 C
Memory Max Operating Temp : 85 C
Power Readings
Power Management : Supported
Power Draw : 36.38 W
Power Limit : 300.00 W
Default Power Limit : 300.00 W
Enforced Power Limit : 300.00 W
Min Power Limit : 150.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 877 MHz
Video : 555 MHz
Applications Clocks
Graphics : 1290 MHz
Memory : 877 MHz
Default Applications Clocks
Graphics : 1290 MHz
Memory : 877 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1530 MHz
SM : 1530 MHz
Memory : 877 MHz
Video : 1372 MHz
Max Customer Boost Clocks
Graphics : 1530 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Fabric
State : N/A
Status : N/A
Processes : None

GPU 00000004:05:00.0
Product Name : Tesla V100-SXM2-32GB
Product Brand : Tesla
Product Architecture : Volta
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0324118133463
GPU UUID : GPU-d2b33ef8-d177-a4f0-01a1-121e76fb1309
Minor Number : 1
VBIOS Version : 88.00.43.00.03
MultiGPU Board : No
Board ID : 0x40500
Board Part Number : 900-2G503-0430-000
GPU Part Number : 1DB5-896-A1
Module ID : 2
Inforom Version
Image Version : G503.0203.00.04
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x05
Device : 0x00
Domain : 0x0004
Device Id : 0x1DB510DE
Bus Id : 00000004:05:00.0
Sub System Id : 0x124910DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Device Current : 3
Device Max : 3
Host Max : 4
Link Width
Max : 16x
Current : 2x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 32768 MiB
Reserved : 267 MiB
Used : 0 MiB
Free : 32500 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 2 MiB
Free : 32766 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 27 C
GPU T.Limit Temp : N/A
GPU Shutdown Temp : 90 C
GPU Slowdown Temp : 87 C
GPU Max Operating Temp : 83 C
GPU Target Temperature : N/A
Memory Current Temp : 28 C
Memory Max Operating Temp : 85 C
Power Readings
Power Management : Supported
Power Draw : 38.34 W
Power Limit : 300.00 W
Default Power Limit : 300.00 W
Enforced Power Limit : 300.00 W
Min Power Limit : 150.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 877 MHz
Video : 555 MHz
Applications Clocks
Graphics : 1290 MHz
Memory : 877 MHz
Default Applications Clocks
Graphics : 1290 MHz
Memory : 877 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1530 MHz
SM : 1530 MHz
Memory : 877 MHz
Video : 1372 MHz
Max Customer Boost Clocks
Graphics : 1530 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Fabric
State : N/A
Status : N/A
Processes : None

GPU 00000035:03:00.0
Product Name : Tesla V100-SXM2-32GB
Product Brand : Tesla
Product Architecture : Volta
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0324118134598
GPU UUID : GPU-34035253-6813-3cc8-525e-3b20791092a5
Minor Number : 2
VBIOS Version : 88.00.43.00.03
MultiGPU Board : No
Board ID : 0x350300
Board Part Number : 900-2G503-0430-000
GPU Part Number : 1DB5-896-A1
Module ID : 1
Inforom Version
Image Version : G503.0203.00.04
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x03
Device : 0x00
Domain : 0x0035
Device Id : 0x1DB510DE
Bus Id : 00000035:03:00.0
Sub System Id : 0x124910DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Device Current : 3
Device Max : 3
Host Max : 4
Link Width
Max : 16x
Current : 2x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 32768 MiB
Reserved : 267 MiB
Used : 0 MiB
Free : 32500 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 2 MiB
Free : 32766 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 24 C
GPU T.Limit Temp : N/A
GPU Shutdown Temp : 90 C
GPU Slowdown Temp : 87 C
GPU Max Operating Temp : 83 C
GPU Target Temperature : N/A
Memory Current Temp : 23 C
Memory Max Operating Temp : 85 C
Power Readings
Power Management : Supported
Power Draw : 38.36 W
Power Limit : 300.00 W
Default Power Limit : 300.00 W
Enforced Power Limit : 300.00 W
Min Power Limit : 150.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 877 MHz
Video : 555 MHz
Applications Clocks
Graphics : 1290 MHz
Memory : 877 MHz
Default Applications Clocks
Graphics : 1290 MHz
Memory : 877 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1530 MHz
SM : 1530 MHz
Memory : 877 MHz
Video : 1372 MHz
Max Customer Boost Clocks
Graphics : 1530 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Fabric
State : N/A
Status : N/A
Processes : None

GPU 00000035:04:00.0
Product Name : Tesla V100-SXM2-32GB
Product Brand : Tesla
Product Architecture : Volta
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0324318126859
GPU UUID : GPU-0c9646fc-b3c1-78aa-800a-378e333cd1cf
Minor Number : 3
VBIOS Version : 88.00.43.00.03
MultiGPU Board : No
Board ID : 0x350400
Board Part Number : 900-2G503-0430-000
GPU Part Number : 1DB5-896-A1
Module ID : 2
Inforom Version
Image Version : G503.0203.00.04
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x04
Device : 0x00
Domain : 0x0035
Device Id : 0x1DB510DE
Bus Id : 00000035:04:00.0
Sub System Id : 0x124910DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Device Current : 3
Device Max : 3
Host Max : 4
Link Width
Max : 16x
Current : 2x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 32768 MiB
Reserved : 267 MiB
Used : 0 MiB
Free : 32500 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 2 MiB
Free : 32766 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 29 C
GPU T.Limit Temp : N/A
GPU Shutdown Temp : 90 C
GPU Slowdown Temp : 87 C
GPU Max Operating Temp : 83 C
GPU Target Temperature : N/A
Memory Current Temp : 24 C
Memory Max Operating Temp : 85 C
Power Readings
Power Management : Supported
Power Draw : 38.39 W
Power Limit : 300.00 W
Default Power Limit : 300.00 W
Enforced Power Limit : 300.00 W
Min Power Limit : 150.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 877 MHz
Video : 555 MHz
Applications Clocks
Graphics : 1290 MHz
Memory : 877 MHz
Default Applications Clocks
Graphics : 1290 MHz
Memory : 877 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1530 MHz
SM : 1530 MHz
Memory : 877 MHz
Video : 1372 MHz
Max Customer Boost Clocks
Graphics : 1530 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Fabric
State : N/A
Status : N/A
Processes : None

Thank you. Let me know if any more details are required.
