
Is it possible to support vGPU? #87

Open
Fruneng opened this issue Jan 14, 2025 · 7 comments

Comments

@Fruneng

Fruneng commented Jan 14, 2025

like https://github.com/Project-HAMi/HAMi-core

@kevmo314
Owner

Yes, support for the vGPU API should be possible. Unfortunately, we don't have any GPUs that support it to develop and test with. If you have one, I believe the nvml API needs to be annotated correctly: https://docs.nvidia.com/deploy/nvml-api/group__nvmlVirtualGpuQueries.html

The annotations can be found here: https://github.com/kevmo314/scuda/blob/main/codegen/annotations.h

I don't know of a good test case for vGPUs though; ideally a very minimal binary that runs through the APIs would make verification easier.

@Fruneng
Author

Fruneng commented Jan 15, 2025

Do you have any more complex test cases that run? Currently, I can only execute the simplest nvidia-smi command.

Build the image:

docker build . -f Dockerfile.build -t scuda-builder-12.6.0 \
            --build-arg CUDA_VERSION=12.6.0 \
            --build-arg DISTRO_VERSION=22.04 \
            --build-arg OS_DISTRO=ubuntu \
            --build-arg CUDNN_TAG=cudnn

Create the docker network:

docker network create scuda

Start the server:

docker run -it --rm --gpus=all -p 14833:14833  --name scuda-server --network scuda  scuda-builder-12.6.0  /bin/bash -c "./local.sh server"

Start the client:

docker run -it --rm --name scuda-client --network scuda  scuda-builder-12.6.0  /bin/bash 

Test nvidia-smi:

docker cp $(which nvidia-smi) scuda-client:/home/nvidia-smi

docker exec -it scuda-client /bin/bash -c "SCUDA_SERVER=scuda-server LD_PRELOAD=./libscuda_12.6.so ./nvidia-smi"

>Segfault handler installed.
>Wed Jan 15 01:48:43 2025
>+-----------------------------------------------------------------------------------------+
>| NVIDIA-SMI 560.27                 Driver Version: 560.70         CUDA Version: 12.6     |
>|-----------------------------------------+------------------------+----------------------+
>| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
>| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
>|                                         |                        |               MIG M. |
>|=========================================+========================+======================|
>|   0  Quadro P2000                   On  |   00000000:01:00.0 Off |                  N/A |
>| 44%   29C    P8              5W /   75W | Uninitialized          |      0%      Default |
>|                                         |                        |                  N/A |
>+-----------------------------------------+------------------------+----------------------+
>
>+-----------------------------------------------------------------------------------------+
>| Processes:                                                                              |
>|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
>|        ID   ID                                                               Usage      |
>|=========================================================================================|
>|  No running processes found                                                             |
>+-----------------------------------------------------------------------------------------+

Test the CUDA API (aborts):

(base) ➜  ~ docker exec -it scuda-client /bin/bash -c "nvcc test/cublas_unified.cu -g -o cublas_unified -lcublas -L/usr/local/cuda/lib64"
(base) ➜  ~ docker exec -it scuda-client /bin/bash -c "SCUDA_SERVER=scuda-server LD_PRELOAD=./libscuda_12.6.so cuda-gdb ./cublas_unified"
NVIDIA (R) cuda-gdb 12.6
Portions Copyright (C) 2007-2024 NVIDIA Corporation
Based on GNU gdb 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This CUDA-GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://forums.developer.nvidia.com/c/developer-tools/cuda-developer-tools/cuda-gdb>.
Find the CUDA-GDB manual and other documentation resources online at:
    <https://docs.nvidia.com/cuda/cuda-gdb/index.html>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./cublas_unified...
(cuda-gdb) run
Starting program: /home/cublas_unified
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGFPE, Arithmetic exception.
0x00007ffff7eb8d83 in std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const ()
   from ./libscuda_12.6.so
(cuda-gdb) bt
#0  0x00007ffff7eb8d83 in std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const ()
   from ./libscuda_12.6.so
#1  0x00007ffff7f85170 in std::__detail::_Hash_code_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*>, std::__detail::_Select1st, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, true>::_M_bucket_index(unsigned long, unsigned long) const () from ./libscuda_12.6.so
#2  0x00007ffff7f84e81 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_bucket_index(unsigned long) const () from ./libscuda_12.6.so
#3  0x00007ffff7f84b87 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from ./libscuda_12.6.so
#4  0x00007ffff7f8450f in std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void*, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, void*> > >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from ./libscuda_12.6.so
#5  0x00007ffff7f74c26 in get_function_pointer(char const*) () from ./libscuda_12.6.so
#6  0x00007ffff7eb8b94 in dlsym () from ./libscuda_12.6.so
#7  0x00007fffd267c356 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.12
#8  0x00007ffff7fc947e in ?? () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7fc9568 in ?? () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7fe32ca in ?? () from /lib64/ld-linux-x86-64.so.2
#11 0x0000000000000001 in ?? ()
#12 0x00007fffffffe3ca in ?? ()
#13 0x0000000000000000 in ?? ()
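One plausible reading of frames #0-#5 (names here are illustrative, not scuda's actual code): `_Mod_range_hashing` computes `hash % bucket_count`, so a SIGFPE there means the `unordered_map` behind `get_function_pointer` had zero buckets, which typically happens when a global map is used before its constructor runs, i.e. when `dlsym` is intercepted during early loader initialization. A common guard is to hold the registry in a function-local static, which is guaranteed to be constructed on first use:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch: a function-local static is constructed on first call,
// so even a dlsym interposed during early dynamic-loader initialization
// never touches a not-yet-constructed (zero-bucket) map.
static std::unordered_map<std::string, void*>& function_registry() {
    static std::unordered_map<std::string, void*> registry;
    return registry;
}

void register_function(const char* name, void* fn) {
    function_registry()[name] = fn;
}

// Analogous to the get_function_pointer frame in the backtrace above.
void* get_function_pointer(const char* name) {
    auto& reg = function_registry();
    auto it = reg.find(name);
    return it == reg.end() ? nullptr : it->second;
}
```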

@kevmo314
Owner

Our test suite covers the cases that currently work and have been verified: https://github.com/kevmo314/scuda/blob/main/local.sh#L24

We are still working through all the APIs; admittedly, the repo gained visibility much faster than we have been able to wire everything up :)

Most of the APIs only require some tweaks in the annotations file, although knowing which tweaks to make is a bit of an art right now. Improved debugging tools are also on the roadmap.

@Fruneng
Author

Fruneng commented Jan 16, 2025

> Yes, support for the vGPU API should be possible, however unfortunately we don't actually have any GPUs that support it to develop and test with. If you have one, I believe the nvml API needs to be annotated correctly: https://docs.nvidia.com/deploy/nvml-api/group__nvmlVirtualGpuQueries.html

@kevmo314 By vGPU I don't mean NVIDIA's official MIG devices. I mean a technique that, like scuda, uses Linux LD_PRELOAD for its implementation, realized by the project at Project-HAMi/HAMi-core. It's very useful for GPU pooling in data centers.

HAMi-core use case:

export LD_PRELOAD=./libvgpu.so
export CUDA_DEVICE_MEMORY_LIMIT=1g
export CUDA_DEVICE_SM_LIMIT=50

nvidia-smi
>| 44%   29C    P8              5W /   75W|      0 MiB /   1024 MiB |      0%      Default |
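The limiter lives inside the preloaded library and interprets those environment variables itself. A hypothetical sketch of how such a shim might parse `CUDA_DEVICE_MEMORY_LIMIT` values like the `1g` above (`parse_mem_limit` is illustrative, not HAMi-core's actual API):

```cpp
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <string>

// Parse limits like "1g", "512m", "64k", or a bare byte count into bytes,
// the way a preloaded limiter could interpret CUDA_DEVICE_MEMORY_LIMIT.
// Returns 0 on unrecognized input.
uint64_t parse_mem_limit(const std::string& s) {
    if (s.empty()) return 0;
    char* end = nullptr;
    uint64_t value = std::strtoull(s.c_str(), &end, 10);
    if (end == s.c_str()) return 0;  // no leading digits
    uint64_t scale = 1;
    switch (std::tolower(static_cast<unsigned char>(*end))) {
        case 'k': scale = 1ULL << 10; break;
        case 'm': scale = 1ULL << 20; break;
        case 'g': scale = 1ULL << 30; break;
        case '\0': break;            // plain byte count
        default: return 0;           // unrecognized suffix
    }
    return value * scale;
}
```

With the limit in bytes, the shim can clamp whatever the memory-query APIs report, which is how the `1024 MiB` total shows up in the nvidia-smi output above.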

@silenceli

@Fruneng I also have a need for this. I am thinking about how to integrate scuda with HAMi-core so that GPUs gain pooling capabilities.

@Fruneng
Author

Fruneng commented Jan 16, 2025

@silenceli Great, let's discuss how to implement it.

@silenceli

> @silenceli Great, let's discuss how to implement it.

Feel free to add me on WeChat to chat :-) WeChat name: silenceli_1988
