SigLIP impl #634
Conversation
Can't comment on the distributed part of the code as I don't know that part of PyTorch, but the rest (loss details, bias/temp/inits) LGTM.
@lucasb-eyer thanks for taking a look. Yeah, the dist part is where most of the risk is, but it seems to be behaving on local cc12m runs comparing single-GPU to 4x GPU.
FYI: in our code, Basil implemented a small unit test checking the chunked and non-chunked formulations for "almost equalness"; this gave us good reassurance in the implementation (plus looking at the profiler for memory use).
I've tested
Will merge shortly to prevent this getting stale.
* Initial SigLIP impl
* Add logit_bias to custom text clip
* non-dict model output wrong way around wrt logit_bias
* Disable dividing loss by world size, better without
* A bit of cleanup
* Add bidirectional exchange option, more cleanup
* Add reference in siglip docstring
* Remove some comments after further verification
* bidir exchange by default
* Proper bidir default
Re #618