M2U: Introduce external variables (exog) to assist in prediction #45
base: master
ts_benchmark/baselines/time_series_library/adapters_for_transformers.py
@@ -246,6 +257,9 @@ def forecast_fit(
        :param train_ratio_in_tv: Represents the splitting ratio of the training set validation set. If it is equal to 1, it means that the validation set is not partitioned.
        :return: The fitted model object.
        """
+       series_dim = covariates["exog"].shape[1]
+       if "exog" in covariates:
+           train_valid_data = pd.concat([train_valid_data, covariates["exog"]], axis=1)
covariates["exog"] is accessed before checking whether "exog" is in covariates, so line 260 may raise an error. Moreover, this variable holds exog_dim rather than series_dim. Maybe we should do it like this:
exog_dim = -1  # maybe None is better
if "exog" in covariates:
    exog_dim = covariates["exog"].shape[1]
...
covariates["exog"] may still raise an exception when "exog" is not in covariates; please use "exog" not in covariates instead (if we do not allow covariates["exog"] to be None).
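A minimal sketch of the guarded logic discussed above, assuming covariates is either None or a dict whose optional "exog" entry is a DataFrame aligned on the same index as train_valid_data. The helper name merge_exog is hypothetical and not part of the PR; it uses dict.get so that both a missing key and an explicit None are handled.

```python
from typing import Dict, Optional, Tuple

import pandas as pd


def merge_exog(
    train_valid_data: pd.DataFrame, covariates: Optional[Dict] = None
) -> Tuple[pd.DataFrame, Optional[int]]:
    """Append exog columns to the target data when they are provided.

    Returns the (possibly widened) frame and the number of exog columns,
    or None as the exog dimension when there is no exog.
    """
    # Normalize a missing covariates dict first, as suggested in the review.
    if covariates is None:
        covariates = {}
    # .get() never raises KeyError and treats an explicit None like a missing key.
    exog = covariates.get("exog")
    if exog is None:
        return train_valid_data, None
    exog_dim = exog.shape[1]  # number of exog columns, not of target series
    merged = pd.concat([train_valid_data, exog], axis=1)
    return merged, exog_dim
```

Because pd.concat with axis=1 aligns rows on the index, exog frames with a different index would introduce NaN rows rather than fail loudly; an explicit index check may be worth adding.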
-       fit_method(train_valid_data, train_ratio_in_tv=train_ratio_in_tv)
+       fit_method(
+           target_train_valid_data, covariate, train_ratio_in_tv=train_ratio_in_tv
+       )
        end_fit_time = time.time()
        predicted = model.forecast(horizon, train_valid_data)
Isn't it necessary to pass the exog data of the test series to the forecast method when rolling forecasting happens inside the model (e.g., output_chunk_length is 10 and horizon is 20)?
This is part of the complexity of covariates I mentioned before. Please at least add a check to the forecast and batch_forecast methods that raises an error with a proper message when exog is enabled during training and horizon != output_chunk_length during forecasting, and open an issue to track how covariates should be passed.
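The check requested above could look roughly like the following sketch. The function name check_exog_horizon and its arguments are hypothetical; a model would presumably call it at the start of forecast and batch_forecast, passing whatever flag it stores to remember that exog was used during fitting.

```python
def check_exog_horizon(
    exog_enabled: bool, horizon: int, output_chunk_length: int
) -> None:
    """Raise if a model fitted with exog is asked for an unsupported horizon.

    Implements the rule requested in the review: while exog is enabled,
    horizon must equal output_chunk_length. A longer horizon would force
    the model to roll forward internally and consume exog values for the
    test period, which forecast()/batch_forecast() do not receive yet.
    """
    if exog_enabled and horizon != output_chunk_length:
        raise ValueError(
            f"exog was enabled during training, so horizon ({horizon}) must "
            f"equal output_chunk_length ({output_chunk_length}); passing future "
            "covariates to forecast/batch_forecast is not supported yet."
        )
```

Rejecting every mismatch (not only horizon > output_chunk_length) is the conservative choice the review asks for; it can be relaxed once the tracking issue settles how future covariates are passed.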
@@ -237,7 +248,7 @@ def validate(self, valid_data_loader, criterion):
        return total_loss

    def forecast_fit(
-       self, train_valid_data: pd.DataFrame, train_ratio_in_tv: float
+       self, train_valid_data: pd.DataFrame, covariates: dict, train_ratio_in_tv: float
Note that the type of covariates is Optional[Dict] = None; please update the method signature and handle the None case.
If covariates can be None, the first thing to do in forecast_fit should be:
if covariates is None:
    covariates = {}
By the way, do we need default values for covariates and train_ratio_in_tv, considering that the baselines may be used as an algorithm library? @qiu69
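A minimal sketch of a forecast_fit signature with default values, assuming the defaults mirror the ModelBase prototype in this PR (covariates=None, train_ratio_in_tv=1.0). SketchModel and its attributes are hypothetical; the point is that the None check happens first, so every later lookup can treat covariates as a dict.

```python
from typing import Dict, Optional

import pandas as pd


class SketchModel:
    """Hypothetical model illustrating Optional covariates with defaults."""

    def forecast_fit(
        self,
        train_valid_data: pd.DataFrame,
        covariates: Optional[Dict] = None,
        train_ratio_in_tv: float = 1.0,
    ) -> "SketchModel":
        # Normalize first so the rest of the method can use dict lookups freely.
        if covariates is None:
            covariates = {}
        self.has_exog = "exog" in covariates
        self.train_ratio_in_tv = train_ratio_in_tv
        # ... actual training on train_valid_data would go here ...
        return self
```

With these defaults, library users can call model.forecast_fit(df) without knowing about covariates at all.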
        :param win_size: The size of each window.
        :return: A batch of covariates.
        """
        covariates_batch = self.covariates.copy()
What if self.covariates is None?
The problem still exists, because self.covariates["exog"] raises an error when self.covariates is None.
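A sketch of a covariate batch builder that tolerates a missing covariates dict, assuming "exog" is a DataFrame and that each window covers rows [idx - win_size, idx) of the exog data. The function name make_covariates_batch, the index_list argument, and the window convention are all assumptions for illustration; the real batch maker may slice differently.

```python
from typing import Dict, List, Optional

import numpy as np
import pandas as pd


def make_covariates_batch(
    covariates: Optional[Dict], index_list: List[int], win_size: int
) -> Dict:
    """Cut one exog window per index; return {} when there are no covariates."""
    # Guard against both a missing dict and a missing/None "exog" entry.
    if not covariates or covariates.get("exog") is None:
        return {}
    covariates_batch = dict(covariates)  # shallow copy, like self.covariates.copy()
    exog = covariates["exog"].to_numpy()
    windows = []
    for idx in index_list:
        if idx < win_size or idx > len(exog):
            raise ValueError(f"index {idx} cannot end a window of size {win_size}")
        # Assumed convention: the window covers rows [idx - win_size, idx).
        windows.append(exog[idx - win_size : idx])
    covariates_batch["exog"] = np.stack(windows)  # shape: (batch, win_size, exog_dim)
    return covariates_batch
```

Returning an empty dict (rather than None) keeps the "exog" in covariates checks elsewhere in the thread valid without extra None handling.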
        train, rest = split_time(series, index)
        test = split_time(rest, horizon)[0].iloc[
            :, : target_train_valid_data.shape[-1]
        ]
Why not use split_channels here? The current implementation does not take target_channels into account.
What I expect here is
test, _ = split_channels(split_time(rest, horizon)[0], target_channels)
or do I misunderstand your intentions?
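To illustrate why positional slicing can pick the wrong columns, here is a small sketch with hypothetical stand-ins for split_time and split_channels (their real signatures in the repository may differ). When the target is not the leading column, .iloc[:, :n_targets] returns an exog column, while selecting by target_channels returns the target.

```python
from typing import List, Optional, Tuple

import pandas as pd


def split_time(df: pd.DataFrame, index: int) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical stand-in: rows before `index` and rows from `index` on."""
    return df.iloc[:index], df.iloc[index:]


def split_channels(
    df: pd.DataFrame, target_channels: Optional[List[int]]
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    """Hypothetical stand-in: (target columns, remaining exog columns or None)."""
    if target_channels is None:
        return df, None
    n = df.shape[1]
    positions = [c % n for c in target_channels]  # map negative indices to positions
    rest = [i for i in range(n) if i not in positions]
    return df.iloc[:, positions], (df.iloc[:, rest] if rest else None)


# Target "y" is the middle column, so slicing the first columns picks exog instead.
series = pd.DataFrame(
    {"exog_a": range(6), "y": range(10, 16), "exog_b": range(20, 26)}
)
horizon, index = 2, 4
train, rest = split_time(series, index)
test, _ = split_channels(split_time(rest, horizon)[0], target_channels=[1])
wrong = split_time(rest, horizon)[0].iloc[:, :1]  # what .iloc[:, :n_targets] yields
print(list(test.columns), list(wrong.columns))  # ['y'] ['exog_a']
```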
Example 3:
target_channel = None
- Selects all columns as target columns, and the exog DataFrame is empty.
If these examples are part of the description of "target_channels", they should be indented one more level to align with "It can include:"; otherwise, please put them before any ":param" directives.
Moreover, please check the compiled documentation to see whether the output is as expected.
By the way, another way to include examples in the docstring is to use doctests:
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> split_channel(pd.DataFrame(np.zeros((5, 6))), target_channel=[1, 3]).shape # selects columns 1 and 3
(5, 2)
Have you tried compiling this docstring to see whether the layout is as expected? I'm not sure how to indent and add blank lines in this case.
:param df: The input DataFrame to be split.
:param target_channel: Configuration for selecting target columns.
    It can include:
    - Integers (positive or negative) representing single column indices.
There should be a blank line before and after a list; the same problem occurs in the "Examples" below.
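A sketch of a docstring laid out along the lines of the last few comments: blank lines around the bullet list, the list indented under its :param field, and examples written as doctests that doctest.testmod() can execute. The function select_targets is hypothetical, and the layout follows general reStructuredText rules for lists inside field bodies; it has not been checked against this project's Sphinx build.

```python
import doctest
from typing import List, Optional

import pandas as pd


def select_targets(
    df: pd.DataFrame, target_channel: Optional[List[int]] = None
) -> pd.DataFrame:
    """
    Select target columns by position.

    :param df: The input DataFrame.
    :param target_channel: Column positions to keep. It can include:

        - Non-negative integers, counted from the first column.
        - Negative integers, counted from the last column.

        If None, all columns are kept.
    :return: The selected target columns.

    Examples:

    >>> import numpy as np
    >>> select_targets(pd.DataFrame(np.zeros((5, 6))), target_channel=[1, 3]).shape
    (5, 2)
    >>> select_targets(pd.DataFrame(np.zeros((5, 6)))).shape
    (5, 6)
    """
    if target_channel is None:
        return df
    return df.iloc[:, target_channel]


if __name__ == "__main__":
    doctest.testmod()  # silent when every example passes
```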
@@ -46,7 +47,7 @@ class ModelBase(metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def forecast_fit(
-       self, train_data: pd.DataFrame, *, train_ratio_in_tv: float = 1.0, **kwargs
+       self, train_data: pd.DataFrame, *,covariates: Optional[Dict] = None, train_ratio_in_tv: float = 1.0, **kwargs
    ) -> "ModelBase":
        """
        Fit a model on time series data
Please add documentation for the covariates parameter.
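One possible way to document the new parameter, sketched on a hypothetical copy of the abstract method. The description of covariates (an optional dict whose "exog" key maps to a DataFrame of external variables aligned with train_data) is an assumption based on this thread, not the project's final wording.

```python
import abc
from typing import Dict, Optional

import pandas as pd


class ModelBaseSketch(metaclass=abc.ABCMeta):
    """Hypothetical copy of ModelBase, reduced to forecast_fit."""

    @abc.abstractmethod
    def forecast_fit(
        self,
        train_data: pd.DataFrame,
        *,
        covariates: Optional[Dict] = None,
        train_ratio_in_tv: float = 1.0,
        **kwargs,
    ) -> "ModelBaseSketch":
        """
        Fit a model on time series data.

        :param train_data: Time series data of the target variables.
        :param covariates: Optional dictionary of additional inputs. The key
            "exog" (assumed here) maps to a DataFrame of external variables
            aligned with train_data. None means no covariates are used.
        :param train_ratio_in_tv: Ratio of the training set within the
            train-valid data; 1.0 means no validation split.
        :return: The fitted model object.
        """


class EchoModel(ModelBaseSketch):
    """Toy subclass that only records the covariates it received."""

    def forecast_fit(
        self, train_data, *, covariates=None, train_ratio_in_tv=1.0, **kwargs
    ):
        self.covariates = covariates if covariates is not None else {}
        return self
```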
To make the model API flexible and unified, it is better to pass in covariates not only in forecast_fit but also in forecast and batch_forecast. I think the current forecast and batch_forecast assume that the series argument already contains the covariates, which is not flexible enough.
def forecast_fit(
    self,
    train_valid_data: pd.DataFrame,
    covariates: Optional[Dict],
    train_ratio_in_tv: float,
) -> "ModelBase":
    ...

def forecast(
    self, horizon: int, covariates: Optional[Dict], train: pd.DataFrame
) -> np.ndarray:
    ...

def batch_forecast(
    self, horizon: int, batch_maker: BatchMaker, **kwargs
) -> np.ndarray:
    ...
In forecast_fit and forecast, I pass the covariates directly. In batch_forecast, both the series and the covariates are passed as part of the batch_maker.
I suggest changing the order of the parameters in forecast to horizon, series, covariates to align with the order in forecast_fit. Also, please don't rename 'series' to 'train' in forecast, as this series is mainly used for inference.
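A sketch of the interface with the parameter order suggested above (horizon, series, covariates), so that covariates follow the data argument in both forecast_fit and forecast. UnifiedModelSketch and LastValueModel are hypothetical; batch_forecast is left out because BatchMaker's interface is not shown in this thread.

```python
import abc
from typing import Dict, Optional

import numpy as np
import pandas as pd


class UnifiedModelSketch(metaclass=abc.ABCMeta):
    """Hypothetical interface: covariates always follow the series argument."""

    @abc.abstractmethod
    def forecast_fit(
        self,
        train_valid_data: pd.DataFrame,
        covariates: Optional[Dict] = None,
        train_ratio_in_tv: float = 1.0,
    ) -> "UnifiedModelSketch": ...

    @abc.abstractmethod
    def forecast(
        self, horizon: int, series: pd.DataFrame, covariates: Optional[Dict] = None
    ) -> np.ndarray: ...


class LastValueModel(UnifiedModelSketch):
    """Toy model that repeats the last observation of every target column."""

    def forecast_fit(self, train_valid_data, covariates=None, train_ratio_in_tv=1.0):
        return self  # nothing to learn

    def forecast(self, horizon, series, covariates=None):
        # covariates are accepted for API uniformity but ignored by this toy model.
        last = series.to_numpy()[-1]
        return np.tile(last, (horizon, 1))  # shape: (horizon, n_targets)
```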
This change splits a dataset into target variables and external variables (exog) based on the target_channel configuration. target_channel determines which columns are treated as target variables. When target_channel is set to None, all columns are considered target variables, and the exog part is set to None.
Example:
Given the following configuration:
--strategy-args {"horizon":24,"target_channel":[0,-1,[0,3]]}
In this case:
- The entries 0 and -1 select the first column (index 0) and the last column (a negative index counts from the end).
- The entry [0, 3] selects columns from index 0 to 2 (index 3 is excluded), i.e., the first three columns are used as target variables.
The dataset will be split accordingly into target variables and exog, allowing for efficient time-series prediction tasks.
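A sketch of how such a target_channel value could be resolved into column positions, assuming integers behave like Python indices, two-element lists are half-open [start, end) ranges, and overlapping selections are merged and sorted. The helper names resolve_target_channel and split_target_exog are hypothetical, and the merge/sort behaviour is an assumption rather than something stated in the PR.

```python
import json
from typing import List, Optional, Sequence, Tuple, Union

import pandas as pd

ChannelItem = Union[int, Sequence[int]]


def resolve_target_channel(
    spec: Optional[Sequence[ChannelItem]], n_cols: int
) -> List[int]:
    """Turn a target_channel config into sorted, de-duplicated column positions."""
    if spec is None:
        return list(range(n_cols))  # None: every column is a target
    positions = set()
    for item in spec:
        if isinstance(item, int):
            if not -n_cols <= item < n_cols:
                raise IndexError(f"column {item} out of range for {n_cols} columns")
            positions.add(item % n_cols)  # -1 -> n_cols - 1
        else:
            start, end = item
            if not 0 <= start <= end <= n_cols:
                raise IndexError(f"range {item} out of range for {n_cols} columns")
            positions.update(range(start, end))  # half-open: end is excluded
    return sorted(positions)


def split_target_exog(
    df: pd.DataFrame, spec: Optional[Sequence[ChannelItem]]
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    """Split df into (target, exog); exog is None when no columns are left over."""
    targets = resolve_target_channel(spec, df.shape[1])
    others = [i for i in range(df.shape[1]) if i not in targets]
    exog = df.iloc[:, others] if others else None
    return df.iloc[:, targets], exog


args = json.loads('{"horizon":24,"target_channel":[0,-1,[0,3]]}')
df = pd.DataFrame([[0.0] * 6], columns=[f"c{i}" for i in range(6)])
target, exog = split_target_exog(df, args["target_channel"])
print(list(target.columns), list(exog.columns))  # ['c0', 'c1', 'c2', 'c5'] ['c3', 'c4']
```

Note that column 0 appears both as a single index and inside [0, 3]; under the assumed merge rule it is selected once.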