Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REQUEST] Vison support. #229

Open
3 tasks done
Ph0rk0z opened this issue Oct 29, 2024 · 1 comment
Open
3 tasks done

[REQUEST] Vison support. #229

Ph0rk0z opened this issue Oct 29, 2024 · 1 comment
Labels
exl2 issue Exl2 issue, may be fixed in its dev branch

Comments

@Ph0rk0z
Copy link

Ph0rk0z commented Oct 29, 2024

Problem

Tabby API currently only handles text. Many vision models have released. Exllama dev supports qwen2-vl

Solution

Support vision through openAI api. Hopefully in text completion too.

Alternatives

No response

Explanation

More multi-modal models will be created over time. Would be cool to have a fully integrated experience. i.e Creating an image with a model and having it iteratively use the image gen tool after seeing what it got back.

Examples

No response

Additional context

No response

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
@bdashore3 bdashore3 added the exl2 issue Exl2 issue, may be fixed in its dev branch label Oct 29, 2024
@bdashore3
Copy link
Member

bdashore3 commented Oct 29, 2024

I understand the excitement over vision models and would like to implement support once there's a proper pipeline on how to do so via exllamav2.

According to turbo, the current dev branch is experimental and only works with the image part of llava. There still needs to be support for Qwen-2 VL and other models which is most likely being worked on at this time:

It should work for Qwen2-VL as well, although that will require some updates to the RoPE since they have multidimensional positional embeddings for images. Even a time dimension for video, just to make it that much harder. :P

I'd keep an eye on this issue turboderp/exllamav2#658 for the time being as that's a blocking issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exl2 issue Exl2 issue, may be fixed in its dev branch
Projects
None yet
Development

No branches or pull requests

2 participants