Fix/combined loss #70

OpheliaMiralles · 2025-01-09T11:10:58Z

Try to fix #68
Add tests

FussyDuck · 2025-01-09T11:11:04Z

All committers have signed the CLA.

HCookie · 2025-01-14T16:22:08Z

training/src/anemoi/training/train/forecaster.py

```diff
+        if config.training.training_loss._target_ == 'anemoi.training.losses.combined.CombinedLoss':
+            assert "loss_weights" in config.training.training_loss, "Loss weights must be provided for combined loss"
+            losses = []
+            ignore_nans = config.training.training_loss.get("ignore_nans", False) # no point in doing this for each loss, nan+nan is nan
+            for loss in config.training.training_loss.losses:
+                node_weighting = instantiate(loss.node_weights)
+                loss_node_weights = node_weighting.weights(graph_data)
+                loss_node_weights = self.output_mask.apply(loss_node_weights, dim=0, fill_value=0.0)
+                loss_instantiated = self.get_loss_function(loss, scalars=self.scalars, **{"node_weights": loss_node_weights, "ignore_nans": ignore_nans})
+                losses.append(loss_instantiated)
+                assert isinstance(loss_instantiated, BaseWeightedLoss)
+            self.loss = instantiate({"_target_": config.training.training_loss._target_}, losses=losses, loss_weights = config.training.training_loss.loss_weights, **loss_kwargs)
+        else:
+            self.loss = self.get_loss_function(config.training.training_loss, scalars=self.scalars, **loss_kwargs)
+            assert isinstance(self.loss, BaseWeightedLoss) and not isinstance(
+                self.loss,
+                torch.nn.ModuleList,
+            ), f"Loss function must be a `BaseWeightedLoss`, not a {type(self.loss).__name__!r}"
```


I think this is over-specific to this use case, and it instantiates objects unnecessarily.

Instantiating `node_weights` was necessary to call the combined loss, but if you find a way around it, please let me know. I have another version where all of this is implemented in `get_loss_function` in the forecaster. It is cleaner, so I'll try to commit it soon.
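
For illustration, here is a minimal sketch of resolving nested losses inside one factory function, so that each sub-loss is built from its own config entry (and so can carry its own node weights). It assumes a plain dict config and a hypothetical `REGISTRY` in place of Hydra's `instantiate`; `get_loss_function`, `MSE` and `Combined` here are toy stand-ins, not the anemoi-training implementations.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry standing in for Hydra's `_target_` resolution.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[type], type]:
    """Record a loss class under the name used as `_target_` in the config."""
    def wrap(cls: type) -> type:
        REGISTRY[name] = cls
        return cls
    return wrap


@register("mse")
class MSE:
    def __init__(self, node_weights: List[float]):
        self.node_weights = node_weights


@register("combined")
class Combined:
    def __init__(self, losses: List[Any], loss_weights: List[float]):
        self.losses = losses
        self.loss_weights = loss_weights


def get_loss_function(config: Dict[str, Any]) -> Any:
    """Build a loss from a config dict, recursing into nested `losses`."""
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    if "losses" in kwargs:
        # Sub-losses go through the same factory, so the caller needs no
        # special case for combined losses.
        kwargs["losses"] = [get_loss_function(sub) for sub in kwargs["losses"]]
    return REGISTRY[config["_target_"]](**kwargs)


# Usage: two sub-losses, each with its own (hypothetical) node weights.
config = {
    "_target_": "combined",
    "loss_weights": [1.0, 0.5],
    "losses": [
        {"_target_": "mse", "node_weights": [1.0, 0.0]},
        {"_target_": "mse", "node_weights": [0.0, 1.0]},
    ],
}
loss = get_loss_function(config)
```

The design choice is that recursion, rather than a branch on the combined loss's `_target_`, handles nesting.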

Hi, as I originally wrote the loss function code, I was able to find a way around this and only update the `CombinedLoss` class.

If you'd like, we can work together on https://github.com/ecmwf/anemoi-core/tree/fix/combined_loss_hcookie to make sure your use case is addressed.

I'm not sure. To me, `CombinedLoss` is not a `BaseWeightedLoss`, so I don't really see the point in trying to make it fit this base class. The weights don't mean the same thing here, and the individual losses should probably each have their own node weights. I'll update this PR today. Let me know what you think, but surely we should not be afraid of separating use cases when they don't match?
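
A minimal sketch of this "separate use case" view, assuming pure-Python lists in place of tensors: the combined loss does not derive from a weighted-loss base class, holds no node weights of its own, and each sub-loss carries its own node weights. `WeightedMSE` and this `CombinedLoss` are hypothetical, not the anemoi classes.

```python
from typing import Callable, Sequence


class WeightedMSE:
    """Node-weighted mean squared error over a flat list of grid points."""

    def __init__(self, node_weights: Sequence[float]):
        self.node_weights = list(node_weights)

    def __call__(self, pred: Sequence[float], target: Sequence[float]) -> float:
        num = sum(w * (p - t) ** 2 for w, p, t in zip(self.node_weights, pred, target))
        den = sum(self.node_weights)
        return num / den if den > 0 else 0.0


class CombinedLoss:
    """Weighted sum of independent losses; it owns no node weights itself."""

    def __init__(self, losses: Sequence[Callable[..., float]], loss_weights: Sequence[float]):
        if len(losses) != len(loss_weights):
            raise ValueError("Need exactly one weight per loss")
        self.losses = list(losses)
        self.loss_weights = list(loss_weights)

    def __call__(self, pred: Sequence[float], target: Sequence[float]) -> float:
        return sum(lw * loss(pred, target) for lw, loss in zip(self.loss_weights, self.losses))


# Usage: loss `a` sees the first two nodes, loss `b` only the last one.
pred, target = [1.0, 2.0, 3.0], [0.0, 2.0, 5.0]
a = WeightedMSE([1.0, 1.0, 0.0])
b = WeightedMSE([0.0, 0.0, 1.0])
total = CombinedLoss([a, b], [1.0, 0.25])(pred, target)  # 0.5 + 0.25 * 4.0
```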

While `CombinedLoss` may not be a clear fit for `BaseWeightedLoss`, it is still an anemoi loss function, so an inheritance-based structure makes sense. I am very wary of any solution that requires hard-coding of any sort. Anemoi is designed to be a generic framework, so following proper OOP principles is a must; otherwise the main classes end up with extensive branching, which makes them hard to read and hard to use. (This is already the case in `GraphForecaster`.)
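
The inheritance-based alternative is essentially the composite pattern: the combined loss subclasses the same base as the losses it wraps, so the training code can treat any loss uniformly without branching on its type. A sketch, where `BaseLoss`, `MAE`, `MSE`, `CombinedLoss` and `training_step` are hypothetical stand-ins, not anemoi's `BaseWeightedLoss` API:

```python
from abc import ABC, abstractmethod
from typing import Sequence


class BaseLoss(ABC):
    """Hypothetical common base: anything the trainer can call as a loss."""

    @abstractmethod
    def __call__(self, pred: Sequence[float], target: Sequence[float]) -> float: ...


class MAE(BaseLoss):
    def __call__(self, pred, target):
        return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


class MSE(BaseLoss):
    def __call__(self, pred, target):
        return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class CombinedLoss(BaseLoss):
    """Composite: itself a BaseLoss, so it can wrap (or be wrapped by) any loss."""

    def __init__(self, losses: Sequence[BaseLoss], loss_weights: Sequence[float]):
        self.losses = list(losses)
        self.loss_weights = list(loss_weights)

    def __call__(self, pred, target):
        return sum(w * loss(pred, target) for w, loss in zip(self.loss_weights, self.losses))


def training_step(loss: BaseLoss, pred, target) -> float:
    # One uniform code path: only the base class is checked, never the concrete type.
    if not isinstance(loss, BaseLoss):
        raise TypeError(f"Expected a BaseLoss, got {type(loss).__name__!r}")
    return loss(pred, target)


# Usage: MAE = 1.5 and MSE = 2.5 here, so the combined loss is 4.0.
pred, target = [1.0, 3.0], [0.0, 1.0]
combined = CombinedLoss([MAE(), MSE()], [1.0, 1.0])
value = training_step(combined, pred, target)
```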

> the individual losses should probably all have separate node weights

Apart from your use case of different losses for different parameters, when would this be true? Weighting between losses and then applying different relative node weightings within each loss would greatly increase complexity and, in my opinion, be very hard to interpret.
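
To make the two levels of weighting concrete, one way to write a combined loss in which each of the $K$ sub-losses has its own node weights is below. It assumes each sub-loss normalises by the sum of its own node weights, which may differ from what anemoi actually does; $\lambda_i$ are the loss weights, $w_{i,n}$ the node weights of loss $i$ at node $n$, and $\ell_i$ the pointwise error of loss $i$.

```latex
\mathcal{L} \;=\; \sum_{i=1}^{K} \lambda_i \,
  \frac{\sum_{n} w_{i,n}\, \ell_i(\hat{y}_n, y_n)}{\sum_{n} w_{i,n}}
```

The effective weight of node $n$ in loss $i$ is then $\lambda_i w_{i,n} / \sum_m w_{i,m}$, so changing how many nodes a mask covers also changes how much each of its nodes contributes, which is part of what makes the combined weighting hard to interpret.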

Some changes implemented in #52 may be relevant to your use cases. Shall we move this discussion to Slack and organise a call?

In this use case, the node weights are defined as several masks describing the grids of different data sources (a radar mask, a pointwise station mask and a satellite mask). I believe this may be a common use case in data assimilation, but of course it is part of a broader discussion. OK to move the discussion to Slack. We can have a call next week; let's schedule it on Slack too.
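
A sketch of the mask-per-source setup, assuming boolean coverage masks per data source and pure-Python lists instead of tensors; `masked_mse` and `per_source_losses` are hypothetical helpers. One detail that matters for observation data with gaps: masked-out or missing nodes have to be skipped rather than multiplied by a zero weight, because `0.0 * nan` is still `nan`.

```python
import math
from typing import Dict, Sequence


def masked_mse(pred: Sequence[float], target: Sequence[float], mask: Sequence[bool]) -> float:
    """MSE over nodes where `mask` is True and the target is not NaN.

    Nodes are skipped rather than given a zero weight, because
    0.0 * nan is nan and would poison the sum.
    """
    errs = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m and not math.isnan(t)]
    return sum(errs) / len(errs) if errs else 0.0


def per_source_losses(
    pred: Sequence[float],
    target: Sequence[float],
    masks: Dict[str, Sequence[bool]],
) -> Dict[str, float]:
    """One loss per observation source, each restricted to its own node mask."""
    return {name: masked_mse(pred, target, mask) for name, mask in masks.items()}


# Usage: node 1 has no valid target (NaN) and is ignored by every source.
pred = [1.0, 2.0, 3.0, 4.0]
target = [1.0, float("nan"), 5.0, 2.0]
masks = {
    "radar": [True, True, False, False],
    "station": [False, False, True, False],
    "satellite": [False, True, True, True],
}
losses = per_source_losses(pred, target, masks)
# → {'radar': 0.0, 'station': 4.0, 'satellite': 4.0}
```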

HCookie · 2025-01-14T16:23:27Z

training/src/anemoi/training/losses/combined.py

```diff
+            elif hasattr(loss, "__class__"):
+                self.losses.append(loss)
```


Why are we checking for `__class__`? If checking for an object, why not `isinstance(loss, object)`?

Because originally it could only take classes (of type `type`, not instances) as the `losses` argument. Indeed, `loss(**kwargs)`, called later in the function, expects the individual loss's init arguments, not forward arguments. As I said, I'll try to commit the recent changes later.
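
For reference, every Python object (classes included) has a `__class__` attribute, so `hasattr(loss, "__class__")` is always true, as is `isinstance(loss, object)`. Telling a loss class apart from an already-instantiated loss needs `isinstance(loss, type)` instead. A minimal sketch, where `resolve_loss` and `MSE` are hypothetical names:

```python
from typing import Any


def resolve_loss(loss: Any, **init_kwargs: Any) -> Any:
    """Return a loss instance from either a class or an already-built object."""
    if isinstance(loss, type):
        # A class: call it with constructor (init) arguments.
        return loss(**init_kwargs)
    if callable(loss):
        # Already an instance: use it as-is (init_kwargs are not applied).
        return loss
    raise TypeError(f"Expected a loss class or callable instance, got {type(loss).__name__!r}")


class MSE:
    def __init__(self, scale: float = 1.0):
        self.scale = scale

    def __call__(self, pred: float, target: float) -> float:
        return self.scale * (pred - target) ** 2


# Usage: a class is instantiated, an instance is passed through unchanged.
built = resolve_loss(MSE, scale=2.0)
same = resolve_loss(built)
```

The class check must come first, since classes are themselves callable.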

OpheliaMiralles and others added 3 commits January 9, 2025 11:54

Fix combined loss and test

953a5ab

exclude nans from error colorbars

96ac9ae

Merge remote-tracking branch 'origin' into fix/combined_loss

0877443

HCookie self-requested a review January 14, 2025 15:08

HCookie reviewed Jan 14, 2025

HCookie assigned OpheliaMiralles and HCookie Jan 16, 2025

OpheliaMiralles marked this pull request as draft January 16, 2025 19:26

OpheliaMiralles added 2 commits January 20, 2025 08:45

Fix/cleanup

1185089

Cleanup

430dfee

OpheliaMiralles requested a review from HCookie January 20, 2025 08:07

OpheliaMiralles commented Jan 9, 2025

FussyDuck commented Jan 9, 2025 (edited)

HCookie Jan 14, 2025

OpheliaMiralles Jan 16, 2025

HCookie Jan 16, 2025

HCookie Jan 16, 2025 (edited)

OpheliaMiralles Jan 20, 2025 (edited)

HCookie Jan 20, 2025 (edited)

HCookie Jan 22, 2025

OpheliaMiralles Jan 24, 2025

HCookie Jan 14, 2025

OpheliaMiralles Jan 16, 2025