Revised EnsembleSummaryProvider interface #950

sigurdp · 2022-02-07T12:22:36Z

Refined EnsembleSummaryProvider interface to support finer control of which range of dates get returned when using lazy resampling and requesting data for multiple realizations.

Previously, when using lazy resampling, and requesting vector data for multiple realizations through EnsembleSummaryProvider.get_vectors_df(), each returned realization would contain the smalles range of dates possible that would still cover the raw data's dates.. This PR makes it possible to force the date range of all the returned realizations to be the same - either by truncating or extending the range of dates so that all realizations have an equal date range and contain the same dates.

The new behavior can be controlled through the common_date_span property of the ResamplingOptions options object that is now accepted by EnsembleSummaryProvider.get_vectors_df(). Please see documentation for ResamplingOptions for more details.

The EnsembleSummaryProvider.dates() function has also been extended with an additional argument, date_span, that controls wheter the union or intersection of dates over realizations is returned.

Note that this is a breaking change in the EnsembleSummaryProvider interface that affects the following member functions:

EnsembleSummaryProvider.dates()
EnsembleSummaryProvider.get_vectors_df()

As an example of code changes needed in the client code, consider the case where we are requesting lazy resampling of two vectors, for all realizations, with a MONTHLY resampling frequency. Currently this would be accomplished like this:

vecdf = myprovider.get_vectors_df(
    vector_names=["VECA", "VECB"],
    resampling_frequency=Frequency.MONTHLY,
    realizations=None,
)

With the revised interface, the client code would have to be changed to something like this:

vecdf = myprovider.get_vectors_df(
    vector_names=["VECA", "VECB"],
    resampling_options=ResamplingOptions(frequency=Frequency.MONTHLY, common_date_span=None),
    realizations=None,
)

Contributor checklist

🤖 I have added tests, or extended existing tests, to cover any new features or bugs fixed in this PR.
📖 I have considered adding a new entry in CHANGELOG.md, and added it if should be communicated there.

…trol of the date ranges returned when requesting data for multiple realizations

…aking changes

…interface

…rface - Adjust unit tests

…interface

anders-kiaer

❤️ the tests! 🤖 Great code style as well - on a difficult domain technical issue. 👏

I'm not able to find much to comment on, but two questions:

The pattern
```
ResamplingOptions(resampling_frequency, date_span)
if resampling_frequency
else None
```
appears to repeat itself somewhat. Should we consider adjusting ResamplingOptions to have type frequency: Optional[Frequency] such that a frequency = None can be passed directly, similar to common_date_span also being Optional (i.e. support None)? Or maybe this change is not relevant anymore after the second question:

I'm not 100% sure that I remember correctly our conclusion from February, but what I think I remember is that expected logic behind the scenes (as end user), wrt. how date is handled (intersection, union or raw), is along the lines of:
- When only individual realizations are plotted, we don't need to care about union/intersection of dates. Each realization can if we want be plotted with its own date span.
- When any kind of statistics is calculated, we need to either take union or intersection across the dates (to get a common date axis). For vectors that are of type cumulative we can take the union (since these have a well defined extrapolation). For all other types of vectors we take the intersection (since these are unsafe to extrapolate - we could potentially also extrapolate rate vectors by assuming they are zero outside data span, but I'm not sure if we today have the metadata necessary to differentiate between rate vectors and e.g. pressure vectors? @asnyv?).

If this is the logic we want, resampling frequency is (still) a general property across all vectors accessed, but the date span (union/intersection) depends on vector type. 🤔

Do I recall incorrectly, or has something else popped up during development that forces us to reconsider this logic? Or maybe something along these lines is already implemented (and I'm just reading the code incorrectly). 😆 🙂

anders-kiaer · 2022-03-24T08:20:53Z

webviz_subsurface/plugins/_simulation_time_series/utils/history_vectors.py

@@ -60,7 +64,14 @@ def create_history_vectors_df(
        return pd.DataFrame()

    historical_vector_names = list(historical_vector_and_vector_name_dict.keys())
+
+    # NOTE: DateSpane = None


Suggested change

# NOTE: DateSpane = None

# NOTE: DateSpan = None

anders-kiaer · 2022-03-24T08:27:46Z

...z_subsurface/plugins/_simulation_time_series/types/derived_ensemble_vectors_accessor_impl.py

+            resampling_options if self._provider.supports_resampling() else None
+        )
+
+        # Make date span union if calculating data relative to date


Should we expand on this comment? Not immediately clear to me why data relative to date → we should take date unione.

asnyv · 2022-03-25T07:56:13Z

When any kind of statistics is calculated, we need to either take union or intersection across the dates (to get a common date axis). For vectors that are of type cumulative we can take the union (since these have a well defined extrapolation). For all other types of vectors we take the intersection (since these are unsafe to extrapolate - we could potentially also extrapolate rate vectors by assuming they are zero outside data span, but I'm not sure if we today have the metadata necessary to differentiate between rate vectors and e.g. pressure vectors? @asnyv?).

@anders-kiaer We don't have the metadata, but I think maybe a varying date span between statistics of different vectors could be a bit confusing for users, and that we should consider to go for the union for all vectors?

Draft suggestion for revised EnsembleSummaryProvider interface

ef3d2bb

jorgenherje mentioned this pull request Feb 8, 2022

Delta relative to date within ensemble SimulationTimeSeries #951

Merged

anders-kiaer assigned sigurdp Feb 10, 2022

sigurdp and others added 6 commits March 2, 2022 18:34

Implemented revised summary provider interface that allows better con…

be37724

…trol of the date ranges returned when requesting data for multiple realizations

Added temorary hacks in simulation time series plugin to mitigate bre…

81257c8

…aking changes

Merge branch 'master' into sigurdp/revised-ensemble-summary-provider-…

5595e95

…interface

Sorting of imports

9152776

Update SimulationTimeSeries plugin according to revised provider inte…

5fb33f0

…rface - Adjust unit tests

Sort imports

c3b5895

sigurdp marked this pull request as ready for review March 6, 2022 22:11

sigurdp added 2 commits March 14, 2022 18:00

Merge branch 'master' into sigurdp/revised-ensemble-summary-provider-…

e37d676

…interface

Updated changelog

6f931a3

sigurdp requested review from asnyv and anders-kiaer March 15, 2022 08:25

anders-kiaer added the next release 🚢 To be included in next release label Mar 24, 2022

anders-kiaer reviewed Mar 24, 2022

View reviewed changes

anders-kiaer removed the next release 🚢 To be included in next release label Mar 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revised EnsembleSummaryProvider interface #950

Revised EnsembleSummaryProvider interface #950

sigurdp commented Feb 7, 2022 •

edited

Loading

anders-kiaer left a comment

anders-kiaer Mar 24, 2022

anders-kiaer Mar 24, 2022

asnyv commented Mar 25, 2022

Revised EnsembleSummaryProvider interface #950

Are you sure you want to change the base?

Revised EnsembleSummaryProvider interface #950

Conversation

sigurdp commented Feb 7, 2022 • edited Loading

Contributor checklist

anders-kiaer left a comment

Choose a reason for hiding this comment

anders-kiaer Mar 24, 2022

Choose a reason for hiding this comment

anders-kiaer Mar 24, 2022

Choose a reason for hiding this comment

asnyv commented Mar 25, 2022

sigurdp commented Feb 7, 2022 •

edited

Loading