
With Vllm its very prompt sensitive and taking longer than Hugging face #43

Open
nithingovindugari opened this issue Oct 29, 2024 · 0 comments

@nithingovindugari

When using tensor_parallel_size = 4 with 4 A100s, inference is still very slow: a 2-minute video takes 150+ seconds. What is the best way to get the fastest inference for this model? Are there any code examples that use multiple GPUs and get fast results, for example with vLLM? I went through the vLLM demo in the repo, but that didn't help. What inference engine does the Rhymes website use to get such quick responses? Thank you in advance.
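For context, this is roughly how I am launching the model. It is a minimal sketch only; the checkpoint name, dtype, prompt, and sampling settings below are placeholders, not my exact configuration:

```python
from vllm import LLM, SamplingParams

# Minimal sketch of the tensor-parallel setup described above.
# The checkpoint name, dtype, and sampling settings are assumptions.
llm = LLM(
    model="rhymes-ai/Aria",   # assumed checkpoint; substitute the actual model path
    tensor_parallel_size=4,   # shard the weights across the 4 A100s
    trust_remote_code=True,
    dtype="bfloat16",
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# Placeholder text prompt; the real call would also pass the video frames
# as multimodal inputs alongside the prompt.
outputs = llm.generate(["<prompt describing the 2-minute video>"], sampling_params)
print(outputs[0].outputs[0].text)
```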

@nithingovindugari nithingovindugari changed the title With Vllm its very prompt sensitive and not taking longer than Hugging face With Vllm its very prompt sensitive and taking longer than Hugging face Oct 29, 2024
@xffxff xffxff added the vllm label Nov 11, 2024