Dynamic GPU resource scheduling, as mentioned in the discussion, is not supported by BentoML. BentoML currently lets the user directly control GPU device scheduling.
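The explicit device control the reply refers to is usually done by masking GPU visibility per process with the standard `CUDA_VISIBLE_DEVICES` environment variable; a minimal sketch (the helper name is hypothetical, not a BentoML API):

```python
import os

def pin_to_gpu(device_index: int) -> str:
    """Pin the current process to a single physical GPU.

    Hypothetical helper: setting CUDA_VISIBLE_DEVICES before the CUDA
    runtime initializes makes frameworks (PyTorch, TensorFlow, ...) see
    only that one device, which they then address as "cuda:0".
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_index)
    # Inside this process the masked GPU is always enumerated as cuda:0.
    return "cuda:0"
```

Note that this must run before any CUDA context is created in the process; once the runtime has initialized, changing the variable has no effect.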
The integration of gRPC communication in BentoML improves the efficiency of tensor transmission after microservice deployment. However, I recently encountered an issue where dynamic GPU resource scheduling cannot be performed after containerized deployment. The current environment consists of four GPUs, each serving one model; each model has business logic added and has been turned into a microservice. The model loaded on the third GPU is too large, causing a `CUDA out-of-memory` error during GPU computation, while the first GPU still has ample free resources. I therefore had to modify the code to catch the exception and reassign that computation to `device=cuda:0`, which leaves us in a very passive position. The official documentation makes no mention of dynamic GPU resource scheduling strategies. Could a dynamic scheduling algorithm be integrated? It should include parameters such as: whether to enable dynamic resource scheduling, business concurrency, initial data size (e.g. audio or text length), and model tensor size. I am currently implementing this in my own project. Is it already implemented in BentoML (`version>=1.3.3`)?