-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revised EnsembleSummaryProvider interface #950
base: master
Are you sure you want to change the base?
Revised EnsembleSummaryProvider interface #950
Conversation
…trol of the date ranges returned when requesting data for multiple realizations
…rface - Adjust unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ the tests! 🤖 Great code style as well - on a difficult domain technical issue. 👏
I'm not able to find much to comment on, but two questions:
- The pattern
appears to repeat itself somewhat. Should we consider adjusting
ResamplingOptions(resampling_frequency, date_span) if resampling_frequency else None
ResamplingOptions
to have typefrequency: Optional[Frequency]
such that afrequency = None
can be passed directly, similar tocommon_date_span
also beingOptional
(i.e. supportNone
)? Or maybe this change is not relevant anymore after the second question:
- I'm not 100% sure that I remember correctly our conclusion from February, but what I think I remember is that expected logic behind the scenes (as end user), wrt. how date is handled (intersection, union or raw), is along the lines of:
- When only individual realizations are plotted, we don't need to care about union/intersection of dates. Each realization can if we want be plotted with its own date span.
- When any kind of statistics is calculated, we need to either take union or intersection across the dates (to get a common date axis). For vectors that are of type cumulative we can take the union (since these have a well defined extrapolation). For all other types of vectors we take the intersection (since these are unsafe to extrapolate - we could potentially also extrapolate rate vectors by assuming they are zero outside data span, but I'm not sure if we today have the metadata necessary to differentiate between rate vectors and e.g. pressure vectors? @asnyv?).
If this is the logic we want, resampling frequency is (still) a general property across all vectors accessed, but the date span (union/intersection) depends on vector type. 🤔
Do I recall incorrectly, or has something else popped up during development that forces us to reconsider this logic? Or maybe something along these lines is already implemented (and I'm just reading the code incorrectly). 😆 🙂
@@ -60,7 +64,14 @@ def create_history_vectors_df( | |||
return pd.DataFrame() | |||
|
|||
historical_vector_names = list(historical_vector_and_vector_name_dict.keys()) | |||
|
|||
# NOTE: DateSpane = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# NOTE: DateSpane = None | |
# NOTE: DateSpan = None |
resampling_options if self._provider.supports_resampling() else None | ||
) | ||
|
||
# Make date span union if calculating data relative to date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we expand on this comment? Not immediately clear to me why data relative to date → we should take date unione.
@anders-kiaer We don't have the metadata, but I think maybe a varying date span between statistics of different vectors could be a bit confusing for users, and that we should consider to go for the union for all vectors? |
Refined
EnsembleSummaryProvider
interface to support finer control of which range of dates get returned when using lazy resampling and requesting data for multiple realizations.Previously, when using lazy resampling, and requesting vector data for multiple realizations through
EnsembleSummaryProvider.get_vectors_df()
, each returned realization would contain the smalles range of dates possible that would still cover the raw data's dates.. This PR makes it possible to force the date range of all the returned realizations to be the same - either by truncating or extending the range of dates so that all realizations have an equal date range and contain the same dates.The new behavior can be controlled through the
common_date_span
property of theResamplingOptions
options object that is now accepted byEnsembleSummaryProvider.get_vectors_df()
. Please see documentation forResamplingOptions
for more details.The
EnsembleSummaryProvider.dates()
function has also been extended with an additional argument,date_span
, that controls wheter the union or intersection of dates over realizations is returned.Note that this is a breaking change in the
EnsembleSummaryProvider
interface that affects the following member functions:EnsembleSummaryProvider.dates()
EnsembleSummaryProvider.get_vectors_df()
As an example of code changes needed in the client code, consider the case where we are requesting lazy resampling of two vectors, for all realizations, with a
MONTHLY
resampling frequency. Currently this would be accomplished like this:With the revised interface, the client code would have to be changed to something like this:
Contributor checklist
CHANGELOG.md
, and added it if should be communicated there.