Update on renaming checkpoints: I've edited the parts of the codebase I'm familiar with, but I'm not familiar with the other parts that still need changes:
need to name the model checkpoints we're saving to
need to name the optimizer checkpoints we're loading from / saving to
I think it would be less complex to make a new dir for each training run, named by a timestamp or similar. This also safeguards against accidentally overwriting files.
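A minimal sketch of the per-run directory idea, assuming a hypothetical helper `make_run_dir` and a hypothetical `base_dir` / `run_name` layout (none of these names come from the codebase). It uses only the standard library and refuses to reuse an existing directory, so an earlier run's files can't be silently overwritten:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(base_dir: str, run_name: Optional[str] = None) -> Path:
    """Create and return a fresh save dir for one training run.

    Hypothetical helper: the name and the directory layout are assumptions.
    """
    # Timestamp down to the second, e.g. 20241011-153000
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    name = f"{run_name}_{stamp}" if run_name else stamp
    run_dir = Path(base_dir) / name
    # exist_ok=False raises FileExistsError instead of reusing an old run's dir
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

The train script would call this once at startup and write every checkpoint under the returned path. In a multi-process job, only one rank should create the directory and share the path with the others, otherwise the second caller hits `FileExistsError`.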
I've left the progress I've made so far in the branch rename_checkpoint.
dkimpara changed the title from "checkpoint names are hardcoded into train and predict etc scripts" to "make new dir for each model training job" on Oct 11, 2024
dkimpara changed the title from "make new dir for each model training job" to "make save dir for each model training job" on Oct 11, 2024
for fsdp: it's model_checkpoint.pt
for everything else: checkpoint.pt
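One way to keep these two filenames out of the individual train and predict scripts is a single lookup helper. This is only a sketch: `checkpoint_path`, the `save_dir` argument, and the `mode` string with the value `"fsdp"` are hypothetical names, and only the two filenames are taken from the comment above.

```python
from pathlib import Path


def checkpoint_path(save_dir: str, mode: str) -> Path:
    """Return the model checkpoint path for a run.

    Hypothetical helper: `mode` and its "fsdp" value are assumed names.
    """
    # fsdp saves to model_checkpoint.pt; everything else uses checkpoint.pt
    filename = "model_checkpoint.pt" if mode == "fsdp" else "checkpoint.pt"
    return Path(save_dir) / filename
```

With this in place, a script would call `checkpoint_path(save_dir, mode)` instead of spelling out the filename. The optimizer checkpoint name is left out on purpose, since it hasn't been decided here.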