
Is it possible to support vGPU? #87

Open
Fruneng opened this issue Jan 14, 2025 · 7 comments

Comments

@Fruneng

Fruneng commented Jan 14, 2025

like https://github.com/Project-HAMi/HAMi-core

@kevmo314
Owner

Yes, support for the vGPU API should be possible. Unfortunately, we don't have any GPUs that support it to develop and test with. If you have one, I believe the nvml API needs to be annotated correctly: https://docs.nvidia.com/deploy/nvml-api/group__nvmlVirtualGpuQueries.html

The annotations can be found here: https://github.com/kevmo314/scuda/blob/main/codegen/annotations.h

I don't know of a good test case for vGPUs though; ideally a very minimal binary that runs through the APIs would make verification easier.

@Fruneng
Author

Fruneng commented Jan 15, 2025

Do you have any more complex test cases that run? Currently, I can only execute the simplest nvidia-smi command.

Build the image:

docker build . -f Dockerfile.build -t scuda-builder-12.6.0 \
            --build-arg CUDA_VERSION=12.6.0 \
            --build-arg DISTRO_VERSION=22.04 \
            --build-arg OS_DISTRO=ubuntu \
            --build-arg CUDNN_TAG=cudnn

Create the docker network:

docker network create scuda

Start the server:

docker run -it --rm --gpus=all -p 14833:14833  --name scuda-server --network scuda  scuda-builder-12.6.0  /bin/bash -c "./local.sh server"

Start the client:

docker run -it --rm --name scuda-client --network scuda  scuda-builder-12.6.0  /bin/bash 

Test nvidia-smi:

docker cp $(which nvidia-smi) scuda-client:/home/nvidia-smi

docker exec -it scuda-client /bin/bash -c "SCUDA_SERVER=scuda-server LD_PRELOAD=./libscuda_12.6.so ./nvidia-smi"

>Segfault handler installed.
>Wed Jan 15 01:48:43 2025
>+-----------------------------------------------------------------------------------------+
>| NVIDIA-SMI 560.27                 Driver Version: 560.70         CUDA Version: 12.6     |
>|-----------------------------------------+------------------------+----------------------+
>| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
>| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
>|                                         |                        |               MIG M. |
>|=========================================+========================+======================|
>|   0  Quadro P2000                   On  |   00000000:01:00.0 Off |                  N/A |
>| 44%   29C    P8              5W /   75W | Uninitialized          |      0%      Default |
>|                                         |                        |                  N/A |
>+-----------------------------------------+------------------------+----------------------+
>
>+-----------------------------------------------------------------------------------------+
>| Processes:                                                                              |
>|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
>|        ID   ID                                                               Usage      |
>|=========================================================================================|
>|  No running processes found                                                             |
>+-----------------------------------------------------------------------------------------+

Test the CUDA API (aborts):

(base) ➜  ~ docker exec -it scuda-client /bin/bash -c "nvcc test/cublas_unified.cu -g -o cublas_unified -lcublas -L/usr/local/cuda/lib64"
(base) ➜  ~ docker exec -it scuda-client /bin/bash -c "SCUDA_SERVER=scuda-server LD_PRELOAD=./libscuda_12.6.so cuda-gdb ./cublas_unified"
NVIDIA (R) cuda-gdb 12.6
Portions Copyright (C) 2007-2024 NVIDIA Corporation
Based on GNU gdb 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This CUDA-GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://forums.developer.nvidia.com/c/developer-tools/cuda-developer-tools/cuda-gdb>.
Find the CUDA-GDB manual and other documentation resources online at:
    <https://docs.nvidia.com/cuda/cuda-gdb/index.html>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./cublas_unified...
(cuda-gdb) run
Starting program: /home/cublas_unified
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGFPE, Arithmetic exception.
0x00007ffff7eb8d83 in std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const ()
   from ./libscuda_12.6.so
(cuda-gdb) bt
#0  0x00007ffff7eb8d83 in std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const ()
   from ./libscuda_12.6.so
#1  0x00007ffff7f85170 in std::__detail::_Hash_code_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*>, std::__detail::_Select1st, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, true>::_M_bucket_index(unsigned long, unsigned long) const () from ./libscuda_12.6.so
#2  0x00007ffff7f84e81 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_bucket_index(unsigned long) const () from ./libscuda_12.6.so
#3  0x00007ffff7f84b87 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from ./libscuda_12.6.so
#4  0x00007ffff7f8450f in std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*> > >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from ./libscuda_12.6.so
#5  0x00007ffff7f74c26 in get_function_pointer(char const*) () from ./libscuda_12.6.so
#6  0x00007ffff7eb8b94 in dlsym () from ./libscuda_12.6.so
#7  0x00007fffd267c356 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.12
#8  0x00007ffff7fc947e in ?? () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7fc9568 in ?? () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7fe32ca in ?? () from /lib64/ld-linux-x86-64.so.2
#11 0x0000000000000001 in ?? ()
#12 0x00007fffffffe3ca in ?? ()
#13 0x0000000000000000 in ?? ()
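One plausible reading of frames #0-#5 (names here are illustrative, not scuda's actual code): `_Mod_range_hashing` computes `hash % bucket_count`, so a SIGFPE there means the `unordered_map` behind `get_function_pointer` had zero buckets, which typically happens when a global map is used before its constructor runs, i.e. when `dlsym` is intercepted during early loader initialization. A common guard is to hold the registry in a function-local static, which is guaranteed to be constructed on first use:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch: a function-local static is constructed on first call,
// so even a dlsym interposed during early dynamic-loader initialization
// never touches a not-yet-constructed (zero-bucket) map.
static std::unordered_map<std::string, void*>& function_registry() {
    static std::unordered_map<std::string, void*> registry;
    return registry;
}

void register_function(const char* name, void* fn) {
    function_registry()[name] = fn;
}

// Analogous to the get_function_pointer frame in the backtrace above.
void* get_function_pointer(const char* name) {
    auto& reg = function_registry();
    auto it = reg.find(name);
    return it == reg.end() ? nullptr : it->second;
}
```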

@kevmo314
Owner

Our test suite covers the cases that currently work and have been verified: https://github.com/kevmo314/scuda/blob/main/local.sh#L24

We are still working through all the APIs; admittedly, the repo gained visibility much faster than we have been able to wire everything up :)

Most of the APIs only require some tweaks in the annotations file, although knowing which tweaks to make is a bit of an art right now. Improved debugging tools are also on the roadmap.

@Fruneng
Author

Fruneng commented Jan 16, 2025

> Yes, support for the vGPU API should be possible, however unfortunately we don't actually have any GPUs that support it to develop and test with. If you have one, I believe the nvml API needs to be annotated correctly: https://docs.nvidia.com/deploy/nvml-api/group__nvmlVirtualGpuQueries.html

@kevmo314 By vGPU I don't mean NVIDIA's official MIG devices. I mean a technique that, like scuda, uses Linux LD_PRELOAD for its implementation, realized by the project at Project-HAMi/HAMi-core. It's very useful for GPU pooling in data centers.

HAMi-core use case:

export LD_PRELOAD=./libvgpu.so
export CUDA_DEVICE_MEMORY_LIMIT=1g
export CUDA_DEVICE_SM_LIMIT=50

nvidia-smi
>| 44%   29C    P8              5W /   75W|      0 MiB /   1024 MiB |      0%      Default |
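The limiter lives inside the preloaded library and interprets those environment variables itself. A hypothetical sketch of how such a shim might parse `CUDA_DEVICE_MEMORY_LIMIT` values like the `1g` above (`parse_mem_limit` is illustrative, not HAMi-core's actual API):

```cpp
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <string>

// Parse limits like "1g", "512m", "64k", or a bare byte count into bytes,
// the way a preloaded limiter could interpret CUDA_DEVICE_MEMORY_LIMIT.
// Returns 0 on unrecognized input.
uint64_t parse_mem_limit(const std::string& s) {
    if (s.empty()) return 0;
    char* end = nullptr;
    uint64_t value = std::strtoull(s.c_str(), &end, 10);
    if (end == s.c_str()) return 0;  // no leading digits
    uint64_t scale = 1;
    switch (std::tolower(static_cast<unsigned char>(*end))) {
        case 'k': scale = 1ULL << 10; break;
        case 'm': scale = 1ULL << 20; break;
        case 'g': scale = 1ULL << 30; break;
        case '\0': break;            // plain byte count
        default: return 0;           // unrecognized suffix
    }
    return value * scale;
}
```

With the limit in bytes, the shim can clamp whatever the memory-query APIs report, which is how the `1024 MiB` total shows up in the nvidia-smi output above.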

@silenceli

@Fruneng I also have a need for this. I am thinking about how to integrate scuda with HAMi-core so that GPUs gain pooling capabilities.

@Fruneng
Author

Fruneng commented Jan 16, 2025

@silenceli Great, let's discuss how to implement it.

@silenceli

> @silenceli Great, let's discuss how to implement it.

Feel free to add me on WeChat to chat :-) WeChat name: silenceli_1988
