
Add Example for training ColBERT using Pylate in terms of contrastive way #164

Open · wants to merge 9 commits into main

Conversation

sigridjineth (Author)

Changes

  • Add an example for training ColBERT with PyLate using a contrastive objective; the existing script only demonstrates training with the teacher-distillation strategy. (A sketch of the contrastive setup is included below.)
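
For reference, a minimal sketch of the kind of contrastive boilerplate this PR adds, following PyLate's documented API. The dataset choice, checkpoint name, and hyperparameters here are illustrative assumptions, not necessarily the exact values in the PR's script:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from pylate import losses, models, utils

# Initialize a ColBERT model from a base checkpoint (checkpoint name is illustrative)
model = models.ColBERT(model_name_or_path="answerdotai/ModernBERT-base")

# A (query, positive, negative) triplet dataset; dataset name is an assumption
dataset = load_dataset("sentence-transformers/msmarco-bm25", "triplet", split="train")
splits = dataset.train_test_split(test_size=0.01)

# Contrastive loss over ColBERT's late-interaction scores
train_loss = losses.Contrastive(model=model)

args = SentenceTransformerTrainingArguments(
    output_dir="colbert-contrastive",
    num_train_epochs=1,
    per_device_train_batch_size=32,
    bf16=True,
    learning_rate=8e-6,  # illustrative; see the LR discussion later in this thread
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    loss=train_loss,
    data_collator=utils.ColBERTCollator(model.tokenize),
)

trainer.train()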

sigridjineth changed the title from "add train_pylate_contrastive.py" to "Add Example for training ColBERT using Pylate in terms of contrastive way" on Dec 26, 2024
@NohTow (Collaborator) commented Dec 26, 2024

Hello,

I don't have my computer right now, but I also trained models with a contrastive loss during the experiments for the paper, and IIRC the PyLate boilerplates work out of the box if you just do not compile explicitly (we compile internally, as I specified in the other issue).

So I would rather we do that, as I am not sure about the side effects the inductor parameters can have, and we just don't need to compile explicitly. (Actually, now that the parameter to compile models in ST has been fixed, we should offload compilation through that parameter even for the other models.)

@sigridjineth (Author)

@NohTow Yes, I used the PyLate boilerplate here and it worked pretty well out of the box without explicit compilation. Do you have any concerns about not compiling the model during training, as in my script?

@NohTow (Collaborator) commented Dec 29, 2024

Do you have any concerns about not compiling the model during training, as in my script?

Not at all, especially as ModernBERT is compiled by default (even if you do not call model = torch.compile(model)). My comment was rather to remove these two lines:

torch._inductor.config.fallback_random = True
torch._inductor.config.triton.unique_kernel_names = True

As they are not needed if you do not have the model = torch.compile(model) line, and I am not sure about side effects they could silently induce!
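
For context, the before/after looks roughly like this. The torch_compile flag comes from the transformers TrainingArguments that SentenceTransformerTrainingArguments inherits; using it here is one reading of the suggestion above to offload compilation via the ST parameter, not code from the PR:

# Before (in the example script): inductor settings with no explicit compile call
torch._inductor.config.fallback_random = True
torch._inductor.config.triton.unique_kernel_names = True

# After: drop the inductor tweaks entirely. ModernBERT already compiles
# internally; for other backbones, compilation can be delegated to the trainer:
args = SentenceTransformerTrainingArguments(
    output_dir="colbert-contrastive",
    torch_compile=True,  # inherited from transformers TrainingArguments
)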

examples/train_pylate_contrastive.py — six review comments (outdated, resolved)
sigridjineth requested a review from NohTow on December 30, 2024 11:24
@sigridjineth (Author)

@NohTow okay, requested a review again

@NohTow (Collaborator) commented Jan 2, 2025

I took the liberty of modifying the script a bit to bring it closer to the other boilerplates in this repository.
The script works, so it can be merged. I just don't know about the batch size and learning rate (as I did not really explore training ModernColBERT with contrastive learning), but I guess it is fine for a boilerplate not to have optimal hyperparameters.
That said, the LR seems very small compared to what we are used to with ModernBERT; did you run some sweeps?
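
(For illustration, the kind of sweep meant here, reusing the setup from the sketch earlier in this thread; the learning-rate grid is hypothetical, not values anyone in this conversation reported running:)

for lr in (3e-6, 1e-5, 3e-5, 8e-5):
    # Fresh model per run so earlier updates do not leak into the next trial
    model = models.ColBERT(model_name_or_path="answerdotai/ModernBERT-base")
    args = SentenceTransformerTrainingArguments(
        output_dir=f"colbert-contrastive-lr{lr}",
        learning_rate=lr,
        num_train_epochs=1,
        per_device_train_batch_size=32,
        bf16=True,
    )
    trainer = SentenceTransformerTrainer(
        model=model,
        args=args,
        train_dataset=splits["train"],
        eval_dataset=splits["test"],
        loss=losses.Contrastive(model=model),
        data_collator=utils.ColBERTCollator(model.tokenize),
    )
    trainer.train()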
