Consider changing default behaviour for missing data in Combinations #99

berland · 2020-03-06T07:53:25Z

RealizationCombination and EnsembleCombination can compute linear combinations of dataframes. When these are indexed by a DATE column, only DATEs that exist in both datasets are combined, and the rest are dropped.
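The alignment behaviour can be illustrated with plain pandas. This is not fmu-ensemble's implementation, and the frames, the vector name FOPT and the values are made up. Arithmetic on two DATE-indexed frames aligns on the union of their dates and gives NaN wherever one side is missing; dropping those rows leaves only the overlapping DATEs.

```python
import pandas as pd

# Two made-up realizations; the second one ends a month earlier.
real_a = pd.DataFrame(
    {"FOPT": [100.0, 200.0, 300.0]},
    index=pd.Index(pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]), name="DATE"),
)
real_b = pd.DataFrame(
    {"FOPT": [50.0, 120.0]},
    index=pd.Index(pd.to_datetime(["2020-01-01", "2020-02-01"]), name="DATE"),
)

# pandas aligns on the union of both indices; a DATE missing in either
# frame gives NaN in the result.
diff_union = real_a - real_b

# Dropping the NaN rows keeps only DATEs present in both frames,
# i.e. "combine only where both datasets have data".
diff_common = diff_union.dropna()

print(diff_union)
print(diff_common)
```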

For summary data, get_smry() in VirtualEnsemble supports extrapolating any summary data correctly (zero for rate vectors, constant for cumulative vectors). This means it is technically possible to combine the summary data of any realizations, even with no overlapping DATEs. This is relevant when the end date of a simulation varies by design. Right now this can probably be worked around by providing a list of datetimes, or possibly an end_date, but that would require custom coding.
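The extrapolation rule described above (zero for rates, constant for cumulatives) can be sketched in pandas as follows. `extrapolate_smry` is a hypothetical helper, not the get_smry() implementation, and the caller has to say which vectors are cumulative; the names FOPR/FOPT are only example vector names. Only dates after the frame's last DATE are added; gaps inside the existing range are not touched.

```python
import pandas as pd


def extrapolate_smry(df, new_dates, cumulative_cols):
    """Hypothetical helper: extend a DATE-indexed summary frame past its end.

    Dates in new_dates that fall after the last existing DATE are appended.
    Cumulative vectors keep their last value there; all other columns are
    treated as rate vectors and set to zero.
    """
    df = df.sort_index()
    last_date = df.index.max()
    extra = pd.DatetimeIndex(new_dates)
    extra = extra[extra > last_date]  # only extrapolate, never interpolate
    out = df.reindex(df.index.union(extra))
    after_end = out.index > last_date
    for col in out.columns:
        if col in cumulative_cols:
            out.loc[after_end, col] = df[col].iloc[-1]  # hold cumulative constant
        else:
            out.loc[after_end, col] = 0.0  # rate is zero after the run ended
    out.index.name = df.index.name
    return out


real_b = pd.DataFrame(
    {"FOPR": [10.0, 12.0], "FOPT": [50.0, 120.0]},
    index=pd.Index(pd.to_datetime(["2020-01-01", "2020-02-01"]), name="DATE"),
)
extended = extrapolate_smry(real_b, ["2020-03-01"], cumulative_cols={"FOPT"})
print(extended)
```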

If the functionality is changed to always extrapolate, there will be side effects when the end date varies because of simulation errors. It probably makes more sense to put the responsibility for filtering out bad simulations on the user.
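Filtering out simulations that stopped early could look roughly like this on the user side. `drop_early_enders` is a hypothetical helper operating on a plain dict of DATE-indexed frames, not an fmu-ensemble API. It only makes sense when all realizations are expected to reach a common end date, not when the end date varies by design.

```python
import pandas as pd


def drop_early_enders(realizations, expected_end):
    """Hypothetical filter: keep only realizations that reach expected_end.

    realizations maps a realization index to a DATE-indexed summary frame.
    """
    expected_end = pd.Timestamp(expected_end)
    return {
        real: df
        for real, df in realizations.items()
        if df.index.max() >= expected_end
    }


def make_frame(dates):
    # Made-up summary frame with a single cumulative vector.
    idx = pd.Index(pd.to_datetime(dates), name="DATE")
    return pd.DataFrame({"FOPT": [float(i) for i in range(len(dates))]}, index=idx)


reals = {
    0: make_frame(["2020-01-01", "2020-03-01"]),  # reached the expected end
    1: make_frame(["2020-01-01", "2020-02-01"]),  # stopped early (e.g. crashed)
}
kept = drop_early_enders(reals, "2020-03-01")
print(sorted(kept))
```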

@asnyv


asnyv · 2020-03-06T09:51:05Z

I'm not sure I understand you correctly here. Correct me if I'm wrong, but I think this only applies when you don't provide a time_index? If you have a frequency defined, this already happens (as we discussed for realizations that end due to errors)?

My experience with EnsembleCombination is this:

If you don't provide a time_index, the output DATE vector of each realization is the union of the raw dates of the two combined realizations, filled with NaN except where both combined realizations have that timestep. (I assume the same thing happens for a single RealizationCombination.)
If a time_index is provided, the data is interpolated to e.g. the frequency you asked for.
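The difference between the two cases can be mimicked in plain pandas. `resample_to` is a hypothetical helper that interpolates linearly in time onto a given list of dates; it is an illustration, not the code fmu-ensemble runs for a time_index. Without it the union of dates produces NaN where the realizations' timesteps differ; with it both sides sit on the same dates before being combined.

```python
import pandas as pd


def resample_to(df, time_index):
    """Hypothetical helper: interpolate a DATE-indexed frame onto time_index.

    Interpolation is linear in time. Dates outside the frame's own range
    stay NaN (limit_area="inside"), so nothing is extrapolated here.
    """
    target = pd.DatetimeIndex(time_index)
    full = df.reindex(df.index.union(target)).interpolate(
        method="time", limit_area="inside"
    )
    out = full.loc[target]
    out.index.name = df.index.name
    return out


# Made-up realizations sharing start and end dates but not the dates in between.
real_a = pd.DataFrame(
    {"FOPT": [0.0, 100.0, 310.0]},
    index=pd.Index(pd.to_datetime(["2020-01-01", "2020-01-11", "2020-02-01"]), name="DATE"),
)
real_b = pd.DataFrame(
    {"FOPT": [0.0, 100.0, 310.0]},
    index=pd.Index(pd.to_datetime(["2020-01-01", "2020-01-21", "2020-02-01"]), name="DATE"),
)

# No time_index: union of dates, NaN where only one side has a timestep.
raw = real_a - real_b

# With a time_index: both sides are interpolated onto the same dates first.
time_index = ["2020-01-01", "2020-01-16", "2020-02-01"]
combined = resample_to(real_a, time_index) - resample_to(real_b, time_index)

print(raw)
print(combined)
```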

So, to (maybe) answer your question: since you have to be careful with crashed runs everywhere else in fmu-ensemble anyway, I think it makes sense to at least extrapolate after one of the two combined realizations has finished. When it comes to interpolating missing dates (to avoid the NaNs), one issue is perhaps that while you know the data is interpolated when you provide a time_index frequency like monthly, users might think the data frequency in the combination is higher than it actually is if it is interpolated by default. That probably doesn't have massive consequences unless the time steps are quite long, though.
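The default suggested here (extrapolate once one realization has ended, but do not interpolate gaps inside a run) could be sketched like this. `combine_extrapolating_tail` is hypothetical and written in plain pandas; the caller must list the cumulative vectors, and FOPR/FOPT are only example names. Interior dates that only one realization has are left as NaN, so the combination never looks more finely sampled than the data behind it.

```python
import pandas as pd


def combine_extrapolating_tail(a, b, coeff_a, coeff_b, cumulative_cols):
    """Hypothetical sketch of a linear combination coeff_a * a + coeff_b * b.

    Both frames are put on the union of their DATEs. Dates after a frame's
    own last DATE are extrapolated (cumulatives held constant, rates set to
    zero); dates missing inside a frame's range stay NaN.
    """
    dates = a.index.union(b.index)

    def extend(df):
        df = df.sort_index()
        out = df.reindex(dates)
        after_end = out.index > df.index.max()
        for col in out.columns:
            fill = df[col].iloc[-1] if col in cumulative_cols else 0.0
            out.loc[after_end, col] = fill
        return out

    result = coeff_a * extend(a) + coeff_b * extend(b)
    result.index.name = "DATE"
    return result


a = pd.DataFrame(
    {"FOPR": [10.0, 10.0, 10.0], "FOPT": [0.0, 310.0, 590.0]},
    index=pd.Index(pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]), name="DATE"),
)
# b ends earlier and has an interior date that a lacks.
b = pd.DataFrame(
    {"FOPR": [5.0, 5.0, 5.0], "FOPT": [0.0, 70.0, 155.0]},
    index=pd.Index(pd.to_datetime(["2020-01-01", "2020-01-15", "2020-02-01"]), name="DATE"),
)
result = combine_extrapolating_tail(a, b, 1.0, -1.0, cumulative_cols={"FOPT"})
print(result)
```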
