It takes about 4 minutes to build all the weights, even for a 600MB distilled model, on my RTX 3090. If I am correct (I may not be), we should be able to cache checkpoints at position 0 for the NLLB models, which could dramatically reduce that startup time. That would be very helpful for debugging quick builds and for running E2E testing. I am not sure exactly what code change to make, but the idea would be something like:
In `hugging_face_model_trainer`, when about to construct the `Seq2SeqTrainer`:

- First, check whether the model is a string (is this how it comes in when starting fresh?).
- If so, check whether a cached version of the model already exists in a cache folder.
- If not, create the model and save it to the cache.
- Keep going as before (see the sketch below).
I am unsure whether there would need to be a separate cached version for each project (undesirable) or whether one per NLLB model type would suffice. I could be going about this wrong, but I saw some existing code that looked similar to these ideas, though nothing slam-dunk.
I have no idea whether this is possible. I am not aware of a way to do this in Hugging Face or PyTorch. I think we would need to investigate further to determine the exact cause of the long startup time.