I enjoy using this library very much. However, I notice that other embedding models such as DINOv2 could also be used to build the search index, and they might lead to higher retrieval accuracy. Is there an easy way to load the 'facebook/dinov2-base' model from Hugging Face and still use clip_inference?
One quick-and-dirty approach is to load the DINOv2 state dict into the visual encoder of a CLIP model; see the discussion in this thread if you are using open_clip. If you want to keep the text-search functionality, you would probably also need to retrain the text encoder LiT-style so that text and images stay aligned in the latent space.
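The general pattern behind that state-dict swap can be sketched with toy modules (the `TinyEncoder` class and names below are illustrative, not the real DINOv2 or open_clip architectures). The same idea applies when copying DINOv2 weights into open_clip's visual tower, provided the ViT variant, patch size, and width actually match; out of the box they often don't, so expect to remap or drop mismatched keys:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Toy stand-in for a ViT encoder; the real models differ, the pattern doesn't."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 4)

donor = TinyEncoder()        # plays the role of DINOv2
clip_visual = TinyEncoder()  # plays the role of CLIP's visual encoder

# Copy only the keys whose names and shapes line up; strict=False tolerates
# keys present on one side only (e.g. heads with no counterpart).
target_state = clip_visual.state_dict()
compatible = {
    k: v
    for k, v in donor.state_dict().items()
    if k in target_state and v.shape == target_state[k].shape
}
clip_visual.load_state_dict(compatible, strict=False)

# The visual weights are now shared with the donor.
assert torch.equal(clip_visual.proj.weight, donor.proj.weight)
```

After a swap like this, the text tower is no longer aligned with the new image features, which is why the LiT-style retraining of the text encoder mentioned above is needed to keep text search working.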