-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
M2U:Introduce external variables (exog) to assist in prediction #45
base: master
Are you sure you want to change the base?
Conversation
@@ -204,7 +206,7 @@ def _padding_time_stamp_mark( | |||
padding_mark = get_time_mark(whole_time_stamp, 1, self.config.freq) | |||
return padding_mark | |||
|
|||
def validate(self, valid_data_loader, criterion): | |||
def validate(self, valid_data_loader, series_dim, criterion): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though this file is not yet fully refactored, we may polish this file step by step: Please add argument types and docstrings for this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -246,6 +257,9 @@ def forecast_fit( | |||
:param train_ratio_in_tv: Represents the splitting ratio of the training set validation set. If it is equal to 1, it means that the validation set is not partitioned. | |||
:return: The fitted model object. | |||
""" | |||
series_dim = covariates["exog"].shape[1] | |||
if "exog" in covariates: | |||
train_valid_data = pd.concat([train_valid_data, covariates["exog"]], axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before checking "exog" in covariates, covariates["exog"]
may raise errors in line 260, moreover, this variable is exog_dim
rather than series_dim
, maybe we should do it lile this:
exog_dim = -1 # maybe None is better
if "exog" in covariates:
exog_dim = covariates["exog"].shape[1]
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -50,6 +52,7 @@ def _execute( | |||
) -> List: | |||
model = model_factory() | |||
|
|||
target_channel = self.strategy_config.get("target_channel", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to use self._get_scalar_config_value
to enable specific configurations for each sample/series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
train_valid_data, target_channel | ||
) | ||
target_test_data, exog_data = split_channel(test_data, target_channel) | ||
covariate = {"exog": exog_data} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the exog_data in line 70 overwrites that inl ine 66, and it does not seem to be correct to pass in exog in test series to fit method
And it is better to name this variable covariates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fit_method(train_valid_data, train_ratio_in_tv=train_ratio_in_tv) | ||
fit_method( | ||
target_train_valid_data, covariate, train_ratio_in_tv=train_ratio_in_tv | ||
) | ||
end_fit_time = time.time() | ||
predicted = model.forecast(horizon, train_valid_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is necessary to pass in the exog data in test series to the forecast method, when rolling forecasting is involved within the model (e.g. output_chunk_length is 10 and horizon is 20)?
This is part of the complexity of covariates I mentioned before, please at least add a check to raise an error in forecast and batch_forecast methods with proper message when exog is enabled during training and horizon != output_chunk_length during forecast, and leave an issue tracking the problem of how covariates should be passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:param win_size: The size of each window. | ||
:return: A batch of covariates. | ||
""" | ||
covariates_batch = self.covariates.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to avoid potential bugs in the future, it would be better to initialize covariates_batch with an empty dict, to avoid any unwanted reference to values in self.covariates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ts_benchmark/baselines/time_series_library/adapters_for_transformers.py
Outdated
Show resolved
Hide resolved
:param win_size: The size of each window. | ||
:return: A batch of covariates. | ||
""" | ||
covariates_batch = self.covariates.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if self.covariates is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
train, rest = split_time(series, index) | ||
test = split_time(rest, horizon)[0].iloc[ | ||
:, : target_train_valid_data.shape[-1] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not using split_channels here, as the current implementation does not take target_channels into account?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
Example 3: | ||
target_channel = None | ||
- Selects all columns as target columns, and the exog DataFrame is empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if these examples are part of the description to "target_channels", they should have one more indentation to align with "It can include:", otherwise, please put these examples before any "param" directives.
Moreoever, please check the compiled documentation to see if the output is as expected.
btw, another way to include examples in the docstring is to use doctests:
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> split_channel(pd.DataFrame(np.zeros((5, 6))), target_channel=[1, 3]).shape # selects columns 1 and 3
(5, 2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -46,7 +47,7 @@ class ModelBase(metaclass=abc.ABCMeta): | |||
|
|||
@abc.abstractmethod | |||
def forecast_fit( | |||
self, train_data: pd.DataFrame, *, train_ratio_in_tv: float = 1.0, **kwargs | |||
self, train_data: pd.DataFrame, *,covariates: Optional[Dict] = None, train_ratio_in_tv: float = 1.0, **kwargs | |||
) -> "ModelBase": | |||
""" | |||
Fit a model on time series data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add documentation of the covariates parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:, exog_columns | ||
] # This works even if exog_columns is an empty list | ||
|
||
return target_df, exog_df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused with our current model protocol now, and it is hard to determine if our implementation follows the same protocol, or relies on certain inter-class dependencies.
NOTE that any coding before the protocol is determined is not acceptable, in this case, we should first determine how to interpret the 'covariates' parameter, currently we seems to assume that:
- if
"exog" not in covariates
, exog is ignored; - if
"exog" in covariates
, it must be a DataFrame (i.e. cannot be None) with the correct time index - an uncommon usage of the model is to pass in an empty dataframe with the correct time index to disable exog, which should be clearly stated in the documentation;
If the these assumptions are part of our model protocol, please add them into the description of the covariates parameter in ModelBase, and check if our implementation strictly follows this protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -46,7 +47,7 @@ class ModelBase(metaclass=abc.ABCMeta): | |||
|
|||
@abc.abstractmethod | |||
def forecast_fit( | |||
self, train_data: pd.DataFrame, *, train_ratio_in_tv: float = 1.0, **kwargs | |||
self, train_data: pd.DataFrame, *,covariates: Optional[Dict] = None, train_ratio_in_tv: float = 1.0, **kwargs | |||
) -> "ModelBase": | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to make a flexible and unified model API, it is better to pass in covariates
not only in forecast_fit, but also in forecast
and batch_forecast
, I think the current forecast
and batch_forecast
assumes the series
argument already contains covariates, which is not flexible enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def forecast_fit(
self,
train_valid_data: pd.DataFrame,
covariates: Optional[Dict],
train_ratio_in_tv: float,
) -> "ModelBase":
def forecast(
self, horizon: int, covariates: Optional[Dict], train: pd.DataFrame
) -> np.ndarray:
def batch_forecast(
self, horizon: int, batch_maker: BatchMaker, **kwargs
) -> np.ndarray:
In forecast_fit
and forecast
, I directly passed the covariates
. In batch_forecast
, both the series and covariates
are passed as part of the batch_maker
.
This change splits a dataset into target variables and external variables (exog) based on the
target_channel
configuration. Thetarget_channel
determines which columns are treated as target variables.target_channel
is set toNone
, all columns are considered target variables, and the exog part will be setNone
.Example:
Given the following configuration:
--strategy-args {"horizon":24,"target_channel":[0,-1,[0,3]]}
In this case:
target_channel = [0, -1]
: Selects the first column (index 0) and the last column (negative index represents counting from the end).target_channel = [0, 3]
: Selects columns from index 0 to 2 (exclusive of index 3), i.e., the first three columns are used as target variables.The dataset will be split accordingly into target variables and exog, allowing for efficient time-series prediction tasks.