Add GLM-4v Multimodal Model support for SGLang #1641
Conversation
Wow, that's cool. Thank you and Zhipu AI for your contribution!
Thanks for the contribution.
When executing this test file, an error will occur.
Do you have any solution?
Can you share your command and more traceback? I can run it successfully on an H100.
It may be a problem with model registration, which leads to infinite recursion and then an out-of-memory error once video memory is exceeded. Where should I modify the model registration? Is my usage correct?
Your usage seems good.
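For context on the registration issue discussed above: model registration of this kind can be thought of as an architecture-name → class mapping, and a self-referential alias in that mapping recurses until resources are exhausted. The sketch below is a generic illustration with hypothetical names (it is not SGLang's actual registry API), showing cycle detection instead of unbounded recursion.

```python
# Hypothetical sketch of architecture-name -> model-class registration,
# illustrating how a cyclic alias can cause infinite recursion.
# All names here are illustrative, not SGLang's actual registry API.

MODEL_REGISTRY = {}

def register_model(arch_name, model_cls_or_alias):
    """Map an architecture string to a model class, or to another name (an alias)."""
    MODEL_REGISTRY[arch_name] = model_cls_or_alias

def resolve_model(arch_name, _seen=None):
    """Follow aliases until a class is found; raise on cycles instead of recursing forever."""
    _seen = _seen or set()
    if arch_name in _seen:
        raise RecursionError(f"cyclic model registration for {arch_name!r}")
    _seen.add(arch_name)
    target = MODEL_REGISTRY[arch_name]
    if isinstance(target, str):  # alias pointing at another registered name
        return resolve_model(target, _seen)
    return target

class ChatGLMForConditionalGeneration:  # stand-in for the GLM-4v model class
    pass

register_model("ChatGLMModel", "ChatGLMForConditionalGeneration")
register_model("ChatGLMForConditionalGeneration", ChatGLMForConditionalGeneration)
print(resolve_model("ChatGLMModel").__name__)  # ChatGLMForConditionalGeneration
```

A registration that accidentally aliases a name back to itself (directly or through a chain) would, without the `_seen` guard, recurse until the stack or memory is exhausted, which matches the symptom described above.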
The previous problem has been solved, but when I execute test_vision_openai_server.py, an error occurs
It seems the model did not see the image and started to hallucinate. Did you pass in the images correctly?
Where does sglang receive and process multimodal input?
You can look at llava as an example (sglang/python/sglang/srt/models/llava.py, lines 156 to 167 at commit d19cc0b) and run https://github.com/sgl-project/sglang/blob/main/test/srt/test_vision_openai_server.py to understand the code path. You can also see some related PRs: #1551 #1546
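The llava code path referenced above essentially splices the vision encoder's output into the prompt: image-placeholder positions in the token sequence are replaced by image feature vectors before the language model runs. A simplified, self-contained sketch of that merge (plain Python lists stand in for embedding tensors; the placeholder id and function name are hypothetical):

```python
# Simplified sketch of the multimodal embedding merge done in models like llava.py:
# placeholder token positions are swapped for vision-encoder features, in order.
# Lists stand in for embedding tensors; names and ids here are illustrative only.

IMAGE_TOKEN_ID = -200  # hypothetical placeholder id marking image positions

def merge_image_features(input_ids, text_embeds, image_features):
    """Return an embedding sequence where each IMAGE_TOKEN_ID position
    is replaced by the next image feature vector."""
    merged, img_iter = [], iter(image_features)
    for tok, emb in zip(input_ids, text_embeds):
        merged.append(next(img_iter) if tok == IMAGE_TOKEN_ID else emb)
    return merged

input_ids   = [1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 7]
text_embeds = [[0.1], [0.0], [0.0], [0.7]]  # embeddings looked up per token
image_feats = [[9.1], [9.2]]                # two patch features from the vision encoder
print(merge_image_features(input_ids, text_embeds, image_feats))
# [[0.1], [9.1], [9.2], [0.7]]
```

If the placeholder tokens are missing or the features are never merged in, the language model only ever sees text embeddings, which is consistent with the "did not see the image and started to hallucinate" symptom above.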
What is the minimum example of running a multimodal model, receiving a prompt and an image, and then performing inference?
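As a rough answer to the question above: the shortest end-to-end path is to launch the server and send an OpenAI-style chat request whose content mixes text and an image URL, which is what test_vision_openai_server.py exercises. The sketch below only constructs and inspects such a request body offline; the model id and image URL are placeholders, and no server is contacted.

```python
# Sketch of a minimal OpenAI-compatible multimodal chat request body.
# The model id and image URL are placeholders; no server is contacted here.
import json

payload = {
    "model": "THUDM/glm-4v-9b",  # placeholder model id
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},  # placeholder
                },
            ],
        }
    ],
    "max_tokens": 64,
}

body = json.dumps(payload)
print(json.loads(body)["messages"][0]["content"][1]["type"])  # image_url
```

In an actual run this JSON would be POSTed to the server's /v1/chat/completions endpoint after launching it with the multimodal model loaded.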
Hi @sixsixcoder The code for Qwen2 VL has already been merged into the main branch. The Triton-related kernels there, completed by @ispobock, can be reused in GLM-4V and are more efficient than the torch implementation. You may consider switching to them in this PR. Thanks!
@sixsixcoder please rebase and add the test for GLM-4v. Thanks!
Motivation
Add GLM-4v support to SGLang. GLM-4v is a widely used multimodal model developed by THUDM, and we hope to adapt it to the excellent fast-serving framework SGLang.
Modifications
- Adapted the chatglm.py file from vLLM.
- Added the GLM-4v vision encoder in python/sglang/srt/models/glm4_vision_encoder.py.

Checklist