Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CombinedLoss not working/not tested #68

Open
OpheliaMiralles opened this issue Jan 9, 2025 · 2 comments · May be fixed by #70
Open

CombinedLoss not working/not tested #68

OpheliaMiralles opened this issue Jan 9, 2025 · 2 comments · May be fixed by #70
Assignees
Labels
bug Something isn't working training

Comments

@OpheliaMiralles
Copy link
Contributor

What happened?

The node_weights tensor is never passed as an argument to individual losses. The node_loss_weights resolves with hydra in a GraphNodeAttribute which does not translate anywhere into a tensor other than through the get_node_weights method from the GraphForecaster.
It means that the node_weights tensor is provided as a kwarg to the CombinedLoss init, which does not need it, but when hydra instantiates the individual losses, there is no tensor but only a GraphNodeAttribute object.
Reminder: the CombinedLoss is not tested.

What are the steps to reproduce the bug?

Write that in the training config (the documentation string is wrong, node_weights is not provided anywhere if you follow it).


training_loss:
  _target_: anemoi.training.losses.combined.CombinedLoss
  losses:
    - _target_: anemoi.training.losses.mse.WeightedMSELoss
      node_weights: ${training.node_loss_weights_resolved}
    - _target_: anemoi.training.losses.mae.WeightedMAELoss
      node_weights: ${training.node_loss_weights_resolved}
  scalars:
    - variable
  loss_weights:
    - 1.0
    - 0.5
  ignore_nans: True

node_loss_weights:
  _target_: anemoi.training.losses.nodeweights.GraphNodeAttribute
  target_nodes: ${graph.data}
  node_attribute: area_weight
  
node_loss_weights_resolved: ${training.node_loss_weights}

Version

latest develop

Platform (OS and architecture)

Linux balfrin-ln002 5.14.21-150400.24.81_12.0.87-cray_shasta_c

Relevant log output

No response

Accompanying data

MeteoSwiss

Organisation

MeteoSwiss

@OpheliaMiralles OpheliaMiralles added the bug Something isn't working label Jan 9, 2025
@OpheliaMiralles OpheliaMiralles self-assigned this Jan 9, 2025
@OpheliaMiralles OpheliaMiralles linked a pull request Jan 9, 2025 that will close this issue
@HCookie
Copy link
Member

HCookie commented Jan 14, 2025

Hi,
Thanks for pointing out that the CombinedLoss is currently broken, I see that tests were not properly developed for it,
I think the issue you are having with hydra should be solveable just by setting _recursive_ = False in either the code or the config. Rather then redefining the node_weights it should be passed to the children losses.
However, in my testing I have noticed other issues with the classes, and scalars.

@HCookie
Copy link
Member

HCookie commented Jan 14, 2025

I've made a branch with my suggested changes here
fix/combined_loss...fix/combined_loss_hcookie

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working training
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants