
Refactor Callbacks #60

Open · HCookie wants to merge 47 commits into base: develop
Conversation

@HCookie (Member) commented Sep 24, 2024

  • Split into separate files
  • Use list in config to add callbacks
  • Provide legacy config enabled approach
  • Fix ruff issues

New Usage

Set config.diagnostics.callbacks to a list of callback names to include
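For illustration, a config in the new style could look like the sketch below. The exact file location and the second callback entry are assumptions, not taken from this PR; the `_target_` list form matches the `LongRolloutPlots` entry that appears later in the diff:

```yaml
# Hypothetical sketch of the new list-based callback config.
# The LearningRateMonitor entry is an illustrative assumption.
diagnostics:
  callbacks:
    - _target_: anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots
    - _target_: pytorch_lightning.callbacks.LearningRateMonitor
```

Callbacks not in the list would simply not be instantiated, replacing the old per-callback `enabled` flags.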

Closes #59


📚 Documentation preview 📚: https://anemoi-training--60.org.readthedocs.build/en/60/

- Split into separate files
- Use list in config to add callbacks
- Provide legacy config enabled approach
- Fix ruff issues
@HCookie HCookie self-assigned this Sep 24, 2024
@FussyDuck commented Sep 24, 2024

CLA assistant check
All committers have signed the CLA.

@HCookie (Member, Author) commented Sep 24, 2024

At the moment this is the proposed refactor; I have yet to run an exhaustive test of the changes.

@HCookie HCookie removed the request for review from JesperDramsch September 24, 2024 09:57
@JesperDramsch (Member) commented
Great work, thank you for taking this on.

I was thinking that it might be nice to make this fully configurable through instantiate.

For example, no one is really using the stochastic weight averaging as far as I know, so having specific config entries for this is a bit of feature bloat.

Then the list of callbacks would just look like this:

callbacks:
  swa:
    _target_: pytorch_lightning.callbacks.stochastic_weight_avg.StochasticWeightAveraging
    swa_lr: 1e-4
    swa_epoch_start: 123
    annealing_epochs: 5
    annealing_strategy: cos
    device: null
  blabla:
    _target_: blabla_callback
    blabla: bla

This makes it more extensible and actually removes some of our less-used config entries.

Additionally, we can keep the standard callbacks, like model checkpoints as "permanent callback" (I don't think we have to make everything optional).

One idea I also had is that we could make a special list for "plot_callbacks" in the same style. Then we can easily keep the super convenient "plots.enabled = False" as a shortcut to disable them?
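As a hedged sketch of that idea, the separate plot-callback list with the `plots.enabled` shortcut might look like this (all keys here are hypothetical, not the merged schema):

```yaml
# Hypothetical config sketch; key names are illustrative assumptions.
callbacks:
  swa:
    _target_: pytorch_lightning.callbacks.StochasticWeightAveraging
    swa_lrs: 1e-4
plot_callbacks:
  enabled: True  # setting this False would disable every plot callback at once
  rollout:
    _target_: anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots
```

The `enabled` flag keeps the convenient single-switch behaviour while each entry stays fully instantiate-driven.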

@HCookie HCookie marked this pull request as ready for review September 25, 2024 09:41
@JesperDramsch (Member) left a comment

Hi @HCookie, thanks for taking on the callbacks!

It's already much better, great work on that. I think we can take the refactor even further and make the callbacks (almost?) fully modular, which would be incredible for future extensibility.

One comment regarding the file names: so far we haven't been using <xyz>-ing.py as a naming convention. "checkpointing" in particular would be confusing alongside activation checkpointing (although that is, and will stay, confusing honestly). Can we rename these, please?

Resolved review threads (outdated):
  • src/anemoi/training/diagnostics/callbacks/plotting.py
  • src/anemoi/training/diagnostics/callbacks/learning_rate.py
  • src/anemoi/training/diagnostics/callbacks/__init__.py (6 threads)
  • src/anemoi/training/config/diagnostics/eval_rollout.yaml
pre-commit-ci bot and others added 7 commits October 2, 2024 10:54
- Prefill config with callbacks
- Warn on deprecations for old config
- Expand config enabled
- Add back SWA
- Fix logging callback
- Add flag to disable checkpointing
- Add testing
[feature] Fix trainable attribute callbacks
@HCookie HCookie dismissed JesperDramsch’s stale review October 23, 2024 15:55

JesperDramsch is currently absent

CHANGELOG.md (outdated thread, resolved)
@JPXKQX (Member) previously approved these changes Oct 25, 2024 and left a comment

A long-awaited refactor. Thanks, Harrison, for the amazing work! I have tested with 1 GPU and 2 GPUs (with num_gpus_per_model=2), and all plots in the "detailed" option are produced as expected.

- 2t
- 10u
- 10v
- _target_: anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots
A reviewer (Member) commented:

As long as dataloader.validation_rollout=1, which is the default, this callback only increases the runtime without providing any additional plots. Should we move it into rollout_eval.yaml?

@HCookie (Member, Author) Oct 25, 2024

We could provide a rollout plots configuration?
Addressed in f1d883f

Successfully merging this pull request may close: Refactor Callbacks.
7 participants