Support image prompt #29

Open
wentao-uw opened this issue Aug 22, 2024 · 2 comments

Comments

@wentao-uw

Use case: replace a logo or text in a video. Input: the old logo, a video containing the old logo, and the new logo. Output: the video with the new logo.

@rentainhe
Collaborator

rentainhe commented Aug 23, 2024

Hi @wentao-uw, supporting referring detection or segmentation based on an image prompt is a good idea, but Grounding DINO currently supports only text prompts. For referring detection, or detection based on a visual prompt, you can try combining SAM 2 with our T-Rex2 model.

You can then pair this pipeline with a video-editing model to do the additional editing on the video.
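To make the last stage of such a pipeline concrete, here is a rough sketch of a naive compositing step. It does not call T-Rex2, SAM 2, or any video-editing model: it assumes an upstream visual-prompt detector has already produced one bounding box per frame (or `None` when the old logo is not visible), and it simply stretches the new logo over that box. The names `paste_logo` and `replace_logo`, the grayscale list-of-rows frame format, and the box convention are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale image as rows of pixel values (assumed format)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) with exclusive upper bounds (assumed convention)


def paste_logo(frame: Frame, logo: Frame, box: Box) -> Frame:
    """Return a copy of `frame` with `logo` stretched to fill `box`."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in frame]  # copy rows so the input frame is not modified
    box_h, box_w = y1 - y0, x1 - x0
    logo_h, logo_w = len(logo), len(logo[0])
    for dy in range(box_h):
        src_y = dy * logo_h // box_h  # nearest-neighbour row lookup in the logo
        for dx in range(box_w):
            src_x = dx * logo_w // box_w  # nearest-neighbour column lookup
            out[y0 + dy][x0 + dx] = logo[src_y][src_x]
    return out


def replace_logo(
    frames: List[Frame], boxes: List[Optional[Box]], logo: Frame
) -> List[Frame]:
    """Paste `logo` into every frame that has a detected box.

    `boxes[i]` is assumed to come from an upstream detector; frames whose
    box is None are passed through unchanged.
    """
    return [
        frame if box is None else paste_logo(frame, logo, box)
        for frame, box in zip(frames, boxes)
    ]
```

A real replacement would more likely use the segmentation mask rather than the box and blend the edges, which is where a video-editing model comes in.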

@rentainhe
Collaborator

Hi @wentao-uw, for image-prompt detection and segmentation you can also try DINOv. It can detect or track anything specified by a visual prompt.
