I enjoy using this library very much. However, I notice that other embedding models such as DINOv2 could also be used to build the search index, and they might lead to higher retrieval accuracy. Is there an easy way to load the 'facebook/dinov2-base' model from Hugging Face and still use clip_inference?
One quick-and-dirty approach is to load the DINOv2 state dict into the visual encoder of a CLIP model; see the discussion in this thread if you are using open_clip. If you want to keep the text-search functionality, you would probably also need to retrain the text encoder LiT-style so that text and images stay aligned in the latent space.
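The general pattern behind that state-dict swap can be sketched with toy modules (the `TinyEncoder` class and names below are illustrative, not the real DINOv2 or open_clip architectures). The same idea applies when copying DINOv2 weights into open_clip's visual tower, provided the ViT variant, patch size, and width actually match; out of the box they often don't, so expect to remap or drop mismatched keys:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Toy stand-in for a ViT encoder; the real models differ, the pattern doesn't."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 4)

donor = TinyEncoder()        # plays the role of DINOv2
clip_visual = TinyEncoder()  # plays the role of CLIP's visual encoder

# Copy only the keys whose names and shapes line up; strict=False tolerates
# keys present on one side only (e.g. heads with no counterpart).
target_state = clip_visual.state_dict()
compatible = {
    k: v
    for k, v in donor.state_dict().items()
    if k in target_state and v.shape == target_state[k].shape
}
clip_visual.load_state_dict(compatible, strict=False)

# The visual weights are now shared with the donor.
assert torch.equal(clip_visual.proj.weight, donor.proj.weight)
```

After a swap like this, the text tower is no longer aligned with the new image features, which is why the LiT-style retraining of the text encoder mentioned above is needed to keep text search working.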