Problem Restarting Jobs #107

Open
lyncdw19 opened this issue Oct 11, 2024 · 0 comments
lyncdw19 commented Oct 11, 2024

I have a few Allegro training jobs that stopped early because they hit the wall time limit before converging. I have tried to restart them so that training continues until convergence. As far as I understand, this is done by simply running the training command again in the same directory, after which the previously saved model is loaded automatically and training continues. Unfortunately, I get the following error when trying to restart:

Traceback (most recent call last):
  File "..//env-allegro/.pixi/envs/default/bin/nequip-train", line 10, in <module>
    sys.exit(main())
  File "..//env-allegro/.pixi/envs/default/lib/python3.10/site-packages/nequip/scripts/train.py", line 96, in main
    trainer = restart(config)
  File "..//env-allegro/.pixi/envs/default/lib/python3.10/site-packages/nequip/scripts/train.py", line 289, in restart
    raise ValueError(
ValueError: Key "optimizer_kwargs" is different in config and the result trainer.pth file. Please double check

I have not changed the yaml config file -- I am using the same one that I originally began training with.

Does anyone know what is causing this error and how to fix it? Thanks.
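
To narrow down which entry of `optimizer_kwargs` differs, one option is to compare the dictionary from the YAML config with the one stored in the saved trainer state. The snippet below is only a minimal sketch of such a comparison: `diff_dicts` is a hypothetical helper, not part of nequip, and the example values (including the list-versus-tuple `betas`) are made up for illustration. How the saved dictionary is read out of `trainer.pth` is not shown, since that depends on the nequip version.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # sentinel for keys present on only one side


def diff_dicts(a: Dict[str, Any], b: Dict[str, Any], prefix: str = "") -> List[Tuple[str, Any, Any]]:
    """Return (path, value_in_a, value_in_b) for every entry that differs.

    Values are treated as different if their types differ, so a list and a
    tuple with the same elements, or 1 and 1.0, are both reported.
    """
    diffs: List[Tuple[str, Any, Any]] = []
    for key in sorted(set(a) | set(b), key=str):
        path = f"{prefix}{key}"
        va, vb = a.get(key, _MISSING), b.get(key, _MISSING)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested dictionaries and report dotted paths.
            diffs.extend(diff_dicts(va, vb, prefix=path + "."))
        elif va is _MISSING or vb is _MISSING or type(va) is not type(vb) or va != vb:
            diffs.append((
                path,
                "<missing>" if va is _MISSING else va,
                "<missing>" if vb is _MISSING else vb,
            ))
    return diffs


# Hypothetical values: what the YAML config gives vs. what was saved.
config_kwargs = {"amsgrad": False, "betas": [0.9, 0.999], "eps": 1e-08, "weight_decay": 0}
saved_kwargs = {"amsgrad": False, "betas": (0.9, 0.999), "eps": 1e-08, "weight_decay": 0}

for path, in_config, in_saved in diff_dicts(config_kwargs, saved_kwargs):
    print(f"{path}: config={in_config!r} saved={in_saved!r}")
```

If the two dictionaries turn out to differ only in type or in default values that were filled in at save time, that would suggest the config itself is unchanged and the mismatch comes from how the values are stored or compared.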
