Future reserved tokens and backward (in)compatibility #102
Comments
New ideas:
As a unit test, we'll use an old trained model and its outputs on a fixed sequence, and compare them to the outputs we get when we load and evaluate the model with the new code.
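A minimal sketch of what such a regression test could look like, assuming a PyTorch checkpoint and reference outputs that were saved with the old code (all fixture paths below are hypothetical):

```python
import torch

def test_old_model_outputs_unchanged():
    # Hypothetical fixtures: a model trained before the code change, plus its
    # outputs on a fixed sequence recorded at the time the model was trained.
    model = torch.load("tests/fixtures/old_model.pt")
    model.eval()

    encoded_seq = torch.load("tests/fixtures/fixed_sequence_encoding.pt")
    expected = torch.load("tests/fixtures/expected_outputs.pt")

    with torch.no_grad():
        actual = model(encoded_seq)

    # The new code should reproduce the old outputs up to numerical tolerance.
    torch.testing.assert_close(actual, expected, rtol=1e-5, atol=1e-6)
```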
Thanks for this thoughtful and detailed writeup!
First, I want to remind you that N sites are not "trivial" for the SHM model: the model for the G in AAGCT is different from the one for AANNNGCT. This is OK when we are looking at the junction between two chains, because we model heavy- and light-chain SHM separately, but it wouldn't be true if we had some other special token.

Important: I don't want to get hung up on the "future special token" thing. I think we should, as much as possible, just extend the current setup so that it works for the H/L case without setting any huge traps for the future. I think that this could just be
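To make the context-dependence above concrete, here is a small illustrative sketch (a hypothetical helper, not the project's SHM code) showing how the motif seen around a site changes once N's sit next to it:

```python
def motif_around(seq: str, site: int, k: int = 5) -> str:
    """Return the k-mer centered on `site`, padding with N past the ends."""
    half = k // 2
    padded = "N" * half + seq + "N" * half
    return padded[site : site + k]

# The motif for a G depends on what surrounds it:
print(motif_around("AAGCT", 2))     # "AAGCT"
print(motif_around("AANNNGCT", 5))  # "NNGCT": the N neighbors change the motif
```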
I'd love for this to be wrapped by EOD Monday at the latest so we can return to the important work of actually training models.
A cursory look at where we instantiate Dataset classes indicates that in our main applications we create a model and a Dataset at the same time. Although we should always hesitate before introducing additional dependencies, if it would be useful it seems like one could pass the model or model class to the Dataset constructor.

I also note that for the SHM models we store information about the sequence encoder in the crepe. We should probably rename that notion of "encoder" to "tokenizer" or something, because it's not the transformer encoder.

Let's hold off on introducing model versions for now. Again, thanks! ✨
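As a hedged sketch of what passing the model (or its tokenizer) into the Dataset constructor could look like (all class and attribute names below are hypothetical, not the project's actual API):

```python
from torch.utils.data import Dataset

class SequenceDataset(Dataset):
    """Hypothetical Dataset that takes the model it will feed,
    so the encoding details live in exactly one place."""

    def __init__(self, sequences, model):
        # Reuse whatever sequence encoder ("tokenizer") the model carries,
        # so the Dataset and the model can never disagree about the token set.
        self.tokenizer = model.tokenizer
        self.encoded = [self.tokenizer.encode(seq) for seq in sequences]

    def __len__(self):
        return len(self.encoded)

    def __getitem__(self, idx):
        return self.encoded[idx]
```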
After the paired chain PR #92 we can no longer load old models, because the amino acid embedding has changed size.

We will want to add additional reserved tokens to our model inputs in the future, and it would be great not to have to throw away all our old trained models whenever we do. One way to handle this would be to reserve some extra token slots proactively.
It would be nice to figure out a way to keep our model architecture flexible enough to not have to reserve tokens proactively like this. I'll be doing some reading to see if that's possible.
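For what it's worth, one generic pattern for keeping old checkpoints loadable when the token set grows is to copy the trained embedding rows into a larger embedding and initialize only the new rows at load time. A minimal sketch, assuming a standard PyTorch state dict and a hypothetical embedding key name:

```python
import torch
import torch.nn as nn

def expand_embedding(state_dict, key, new_vocab_size):
    """Grow an embedding weight in an old checkpoint to `new_vocab_size` rows,
    keeping the trained rows and freshly initializing only the new token rows."""
    old_weight = state_dict[key]              # shape: (old_vocab_size, embed_dim)
    old_vocab_size, embed_dim = old_weight.shape
    new_weight = torch.empty(new_vocab_size, embed_dim)
    nn.init.normal_(new_weight, std=0.02)     # init for the newly added tokens
    new_weight[:old_vocab_size] = old_weight  # preserve the trained embeddings
    state_dict[key] = new_weight
    return state_dict

# Usage sketch (key name and sizes are hypothetical):
# state = torch.load("old_model.pt")
# state = expand_embedding(state, "aa_embedding.weight", new_vocab_size=25)
# new_model.load_state_dict(state)
```

Whether something like this is preferable to reserving tokens up front is exactly the trade-off discussed above.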