Refactor Callbacks #60
base: develop
Conversation
- Split into separate files
- Use list in config to add callbacks
- Provide legacy config-enabled approach
- Fix ruff issues
At the moment this is the proposed refactor; I have yet to complete exhaustive testing of the changes.
Great work, thank you for taking this on. I was thinking that it might be nice to make this fully configurable through instantiate. For example, no one is really using the stochastic weight averaging as far as I know, so having specific config entries for this is a bit of feature bloat. Then the list of callbacks would just look like this:
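A hedged sketch of what such an instantiate-driven callback list might look like (the `LongRolloutPlots` target appears in this PR's diff; the `StochasticWeightAveraging` entry and its options are assumptions for illustration):

```yaml
# Hypothetical config sketch -- entry names and option values are assumptions.
diagnostics:
  callbacks:
    - _target_: anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots
    - _target_: pytorch_lightning.callbacks.StochasticWeightAveraging
      swa_lrs: 1.e-4
```

Each entry would then be passed through instantiate, so adding a new callback needs no code change, only a new list item.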
This makes it more extensible and actually reduces some of our less-used config entries. Additionally, we can keep the standard callbacks, like model checkpoints, as "permanent callbacks" (I don't think we have to make everything optional). One idea I also had is that we could make a special list for "plot_callbacks" in the same style. Then we can easily keep the super convenient "plots.enabled = False" as a shortcut to disable them?
Hi @HCookie, thanks for taking on the callbacks!
It's already much better, great work on that. I think we can take the refactor even further and make the callbacks (almost?) fully modular, which would be incredible for future extensibility.
One comment regarding the file names: so far we haven't been using `<xyz>-ing.py` naming. Especially "checkpointing" would be confusing with activation checkpointing (although that is and will stay confusing, honestly). Can we rename these please?
for more information, see https://pre-commit.ci
- Prefill config with callbacks
- Warn on deprecations for old config
- Expand config enabled
- Add back SWA
- Fix logging callback
- Add flag to disable checkpointing
- Add testing
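The "warn on deprecations for old config" step could be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the function name, the legacy flag name, and the target class path are all assumptions; only the general shape (translate a legacy `enabled` flag into a `_target_` entry and emit a `DeprecationWarning`) reflects what the commit list describes.

```python
import warnings

# Hypothetical mapping from legacy enabled-flags to new callback targets.
# Both the flag name and the class path are assumptions for illustration.
LEGACY_FLAGS = {
    "swa": "anemoi.training.diagnostics.callbacks.optimiser.StochasticWeightAveraging",
}


def migrate_legacy_callbacks(diagnostics_config: dict) -> list:
    """Return the callback list, appending entries for any legacy enabled-flags."""
    callbacks = list(diagnostics_config.get("callbacks", []))
    for flag, target in LEGACY_FLAGS.items():
        if diagnostics_config.get(flag, {}).get("enabled", False):
            warnings.warn(
                f"'diagnostics.{flag}.enabled' is deprecated; "
                f"add {target!r} to 'diagnostics.callbacks' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            callbacks.append({"_target_": target})
    return callbacks
```

A caller would run this once while building the trainer, so old configs keep working while users are nudged toward the new list style.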
[feature] Fix trainable attribute callbacks
…ning into fix/refactor_callbacks
A long-awaited refactor. Thanks Harrison for the amazing work! I have tested with 1 GPU and 2 GPUs (with num_gpus_per_model=2), and all plots in the "detailed" option are produced as expected.
Co-authored-by: Sara Hahner <[email protected]>
- 2t
- 10u
- 10v
- _target_: anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots
As long as `dataloader.validation_rollout=1`, which is the default, this callback only increases the runtime without providing any additional plots. Should we move it into rollout_eval.yaml?
We could provide a rollout plots configuration?
Addressed in f1d883f
New Usage

Set `config.diagnostics.callbacks` to a list of callback names to include.

Closes #59
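A minimal sketch of the new usage, assuming the list-of-targets style discussed above (only the `LongRolloutPlots` class path is taken from this PR's diff; the surrounding keys are assumptions):

```yaml
# Illustrative only -- in the diagnostics config section.
callbacks:
  - _target_: anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots
```

An empty list would disable the optional callbacks, while the "permanent" ones, like model checkpointing, remain configured separately.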
📚 Documentation preview 📚: https://anemoi-training--60.org.readthedocs.build/en/60/