It takes about 4 minutes to build all the weights, even for a 600MB distilled model, on my RTX 3090. If I am correct (I may not be), we should be able to cache checkpoints at position 0 for the NLLB models, which could dramatically reduce that startup time. That would be very helpful for debugging quick builds and for running E2E testing. I am not sure exactly what code change to make, but the idea would be something like:
In `hugging_face_model_trainer`, when about to construct the `Seq2SeqTrainer`:

- First, check whether the model is a string (is this how it comes in when starting fresh?).
- If so, check whether a cached version of the model already exists in a cache folder.
- If not, create the model and save it to the cache.
- Keep going as before (see the sketch below).
I am unsure whether there would need to be a separate cached version for each project (undesirable) or whether one per NLLB model type would suffice. I could be going about this wrong, but I saw some existing code that looked similar to these ideas, though nothing slam-dunk.
I have no idea whether this is possible. I am not aware of a way to do this in Hugging Face or PyTorch. I think we would need to investigate further to determine the exact cause of the long startup time.