Consider changing default behaviour for missing data in Combinations #99

Open
berland opened this issue Mar 6, 2020 · 1 comment
Collaborator

berland commented Mar 6, 2020

RealizationCombination and EnsembleCombination can form linear combinations of dataframes. When these are indexed by a DATE column, only the DATEs that exist in both datasets are combined, and the rest are dropped.
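To make the behaviour concrete, here is a minimal sketch in plain pandas (not the fmu-ensemble API) of a combination that keeps only the DATEs common to both inputs. The two frames, their values, and the choice of FOPT as the column are made up for illustration.

```python
import pandas as pd

# Two hypothetical realizations with different end dates (made-up numbers).
real_a = pd.DataFrame(
    {"FOPT": [0.0, 100.0, 200.0]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)
real_b = pd.DataFrame(
    {"FOPT": [0.0, 80.0]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01"]),
)
real_a.index.name = real_b.index.name = "DATE"

# Keep only the DATEs present in both frames, then form the combination a - b.
common = real_a.index.intersection(real_b.index)
delta = real_a.loc[common] - real_b.loc[common]

print(delta)  # 2020-03-01 is absent because real_b has no data for it
```

With this approach, two realizations with no overlapping DATE would give an empty result.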

For summary data, VirtualEnsemble has get_smry() support that extrapolates summary data correctly (zero for rate vectors, constant for cumulative vectors), meaning it is technically possible to combine the summary data of any realizations (even with no overlapping DATE). This is relevant in situations where the end date of a simulation is variable by design. Right now this can probably be worked around by providing a list of datetimes, or possibly an end_date, but that would require custom coding.
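The extrapolation rule described here (zero for rates, last value held for cumulatives) could be sketched roughly as below. This is plain pandas and not the get_smry() implementation; the helper name `extrapolate_smry` and the caller-supplied `rate_vectors` set are hypothetical, and the numbers are made up.

```python
import pandas as pd


def extrapolate_smry(df, dates, rate_vectors):
    """Reindex df to `dates` and fill dates after the last simulated date.

    Columns named in `rate_vectors` are set to zero past the end of the
    data; all other columns are treated as cumulative and held at their
    last value. Requested dates inside the simulated period that are
    missing from df are left as NaN (no interpolation is done here).
    """
    df = df.sort_index()
    dates = pd.DatetimeIndex(dates)
    out = df.reindex(dates)
    past_end = dates > df.index.max()
    for col in out.columns:
        if col in rate_vectors:
            out.loc[past_end, col] = 0.0
        else:
            out.loc[past_end, col] = df[col].iloc[-1]
    return out


# A hypothetical realization that ends one month before the requested dates do.
short = pd.DataFrame(
    {"FOPR": [10.0, 12.0], "FOPT": [0.0, 300.0]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01"]),
)
dates = pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"])
extended = extrapolate_smry(short, dates, rate_vectors={"FOPR"})
print(extended)
```

Once both realizations are extended to the same dates like this, they can be combined even if their original DATEs did not overlap.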

If the functionality is changed to always extrapolate, there will be side effects when the end date varies because of simulation errors. It probably makes more sense to put the responsibility on the user for filtering out bad simulations.

Collaborator

asnyv commented Mar 6, 2020

Not sure I understand you correctly here. Am I right that this only applies when you don't provide a time_index? If you have a frequency defined, this already happens (as we discussed for realizations that end due to errors), right?

My experience with EnsembleCombination is this:

  • If you don't provide a time_index, the output DATE vector of each realization is the union of the raw time data of the two combined realizations, filled with NaN unless both combined realizations have that timestep. (I assume the same thing happens for a single RealizationCombination.)
  • If a time_index is provided, the data is interpolated to e.g. the frequency you asked for.
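The two cases above can be imitated in plain pandas. This is only a sketch of the observed behaviour, not fmu-ensemble code: the helper `to_time_index` is hypothetical, the numbers are made up, and linear-in-time interpolation is an assumption (the library's own resampling may treat rate and cumulative vectors differently).

```python
import pandas as pd

# Two hypothetical realizations with different raw report dates.
real_a = pd.Series(
    [0.0, 100.0, 200.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-15", "2020-03-01"]),
    name="FOPT",
)
real_b = pd.Series(
    [0.0, 60.0, 150.0],
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    name="FOPT",
)

# No time_index: pandas aligns on the union of the dates and yields NaN
# wherever only one of the two series has that date.
raw_sum = real_a + real_b


def to_time_index(series, time_index):
    """Interpolate linearly in time onto time_index (an assumed scheme)."""
    union = series.index.union(time_index)
    return series.reindex(union).interpolate(method="time").reindex(time_index)


# With a time_index: both inputs are first brought onto the same monthly
# dates, so the combination has no NaN inside the simulated period.
time_index = pd.date_range("2020-01-01", "2020-03-01", freq="MS")
monthly_sum = to_time_index(real_a, time_index) + to_time_index(real_b, time_index)

print(raw_sum)
print(monthly_sum)
```

Here `raw_sum` has four dates with NaN on the two that are not shared, while `monthly_sum` has three dates and no NaN.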

So to (maybe) answer your question: considering that in the rest of fmu-ensemble you have to be careful with crashed runs no matter what, I think it makes sense to at least extrapolate after one of the two combined realizations has finished. When it comes to interpolating missing dates (to avoid the NaNs), one issue is perhaps that, while you know the data is interpolated when you provide a time_index frequency like monthly, users might think the data frequency in the combination is higher than it actually is if it is interpolated by default. This probably doesn't have massive consequences unless the time steps are quite long, though...
