Hi @wentao-uw, supporting referring detection or segmentation based on an image prompt is a good idea, but Grounding DINO currently supports only text prompts. For referring detection, or detection based on a visual prompt, you can try combining SAM 2 with our T-Rex2 model.
You can also pair this pipeline with a video-editing model for additional edits to the video.
Hi @wentao-uw, for image-prompt detection and segmentation, you can also try DINOv. It can track or detect anything from a visual prompt.
Use case: replace a logo or text in a video. Input: the old logo, a video containing the old logo, and the new logo. Output: the video with the new logo.
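To make the use case concrete, here is a rough sketch of only the last step: pasting the new logo once the old one has been located in each frame. It assumes the per-frame boxes come from some visual-prompt detector or tracker (for example, one of the models suggested in this thread), which is not shown. The names `replace_logo` and `resize_nearest` and the `(x0, y0, x1, y1)` box format are hypothetical and not part of any of those projects. A plain rectangular paste like this ignores perspective, masks, and blending, so treat it as a starting point only.

```python
import numpy as np

def resize_nearest(img, height, width):
    """Nearest-neighbour resize of an H x W x C image using NumPy only."""
    rows = np.arange(height) * img.shape[0] // height
    cols = np.arange(width) * img.shape[1] // width
    return img[rows[:, None], cols]

def replace_logo(frames, boxes, new_logo):
    """Paste new_logo over the detected box in every frame.

    frames:   list of H x W x C uint8 arrays (the decoded video).
    boxes:    one entry per frame, either None (old logo not found) or a
              hypothetical (x0, y0, x1, y1) pixel box from a detector.
              Boxes are assumed to be non-empty and inside the frame.
    new_logo: h x w x C uint8 array.
    """
    edited = []
    for frame, box in zip(frames, boxes):
        frame = frame.copy()  # leave the input video untouched
        if box is not None:
            x0, y0, x1, y1 = box
            frame[y0:y1, x0:x1] = resize_nearest(new_logo, y1 - y0, x1 - x0)
        edited.append(frame)
    return edited
```

In a fuller pipeline, a segmentation mask (rather than a box) would let you paste only over the logo's pixels, and a video-editing or inpainting model could clean up the edges.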