Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

M2U:Introduce external variables (exog) to assist in prediction #45

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

oneJue
Copy link
Collaborator

@oneJue oneJue commented Jan 26, 2025

This change splits a dataset into target variables and external variables (exog) based on the target_channel configuration. The target_channel determines which columns are treated as target variables.

  • Integers (positive or negative) can be used to specify a single column index.
  • Lists or tuples containing two integers (e.g., [2, 4]) can be used to select a range of columns.
  • If target_channel is set to None, all columns are considered target variables, and the exog part will be set None.

Example:

Given the following configuration:

--strategy-args {"horizon":24,"target_channel":[0,-1,[0,3]]}

In this case:

  • target_channel = [0, -1]: Selects the first column (index 0) and the last column (negative index represents counting from the end).
  • target_channel = [0, 3]: Selects columns from index 0 to 2 (exclusive of index 3), i.e., the first three columns are used as target variables.

The dataset will be split accordingly into target variables and exog, allowing for efficient time-series prediction tasks.

qiu69 and others added 3 commits January 26, 2025 22:18
Signed-off-by: oneJue <501247613@qq.com>
Signed-off-by: oneJue <501247613@qq.com>
@@ -204,7 +206,7 @@ def _padding_time_stamp_mark(
padding_mark = get_time_mark(whole_time_stamp, 1, self.config.freq)
return padding_mark

def validate(self, valid_data_loader, criterion):
def validate(self, valid_data_loader, series_dim, criterion):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though this file is not yet fully refactored, we may polish this file step by step: Please add argument types and docstrings for this method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@@ -246,6 +257,9 @@ def forecast_fit(
:param train_ratio_in_tv: Represents the splitting ratio of the training set validation set. If it is equal to 1, it means that the validation set is not partitioned.
:return: The fitted model object.
"""
series_dim = covariates["exog"].shape[1]
if "exog" in covariates:
train_valid_data = pd.concat([train_valid_data, covariates["exog"]], axis=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before checking "exog" in covariates, covariates["exog"] may raise errors in line 260, moreover, this variable is exog_dim rather than series_dim, maybe we should do it lile this:

exog_dim = -1  # maybe None is better
if "exog" in covariates:
    exog_dim = covariates["exog"].shape[1]
    ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@@ -50,6 +52,7 @@ def _execute(
) -> List:
model = model_factory()

target_channel = self.strategy_config.get("target_channel", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to use self._get_scalar_config_value to enable specific configurations for each sample/series.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

train_valid_data, target_channel
)
target_test_data, exog_data = split_channel(test_data, target_channel)
covariate = {"exog": exog_data}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the exog_data in line 70 overwrites that inl ine 66, and it does not seem to be correct to pass in exog in test series to fit method
And it is better to name this variable covariates

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

fit_method(train_valid_data, train_ratio_in_tv=train_ratio_in_tv)
fit_method(
target_train_valid_data, covariate, train_ratio_in_tv=train_ratio_in_tv
)
end_fit_time = time.time()
predicted = model.forecast(horizon, train_valid_data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is necessary to pass in the exog data in test series to the forecast method, when rolling forecasting is involved within the model (e.g. output_chunk_length is 10 and horizon is 20)?
This is part of the complexity of covariates I mentioned before, please at least add a check to raise an error in forecast and batch_forecast methods with proper message when exog is enabled during training and horizon != output_chunk_length during forecast, and leave an issue tracking the problem of how covariates should be passed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
Currently, when we use the horizon argument its value must be equal to output_chunk_length

:param win_size: The size of each window.
:return: A batch of covariates.
"""
covariates_batch = self.covariates.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid potential bugs in the future, it would be better to initialize covariates_batch with an empty dict, to avoid any unwanted reference to values in self.covariates

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

ts_benchmark/evaluation/strategy/rolling_forecast.py Outdated Show resolved Hide resolved
ts_benchmark/evaluation/strategy/rolling_forecast.py Outdated Show resolved Hide resolved
ts_benchmark/models/model_base.py Outdated Show resolved Hide resolved
:param win_size: The size of each window.
:return: A batch of covariates.
"""
covariates_batch = self.covariates.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if self.covariates is None?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

train, rest = split_time(series, index)
test = split_time(rest, horizon)[0].iloc[
:, : target_train_valid_data.shape[-1]
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not using split_channels here, as the current implementation does not take target_channels into account?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
we now use split_channels here and pass covariates in forecast


Example 3:
target_channel = None
- Selects all columns as target columns, and the exog DataFrame is empty.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these examples are part of the description to "target_channels", they should have one more indentation to align with "It can include:", otherwise, please put these examples before any "param" directives.
Moreoever, please check the compiled documentation to see if the output is as expected.

btw, another way to include examples in the docstring is to use doctests:

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> split_channel(pd.DataFrame(np.zeros((5, 6))), target_channel=[1, 3]).shape  # selects columns 1 and 3
(5, 2)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
Now if target_channel is None, all columns are treated as target columns (exog becomes None).

ts_benchmark/utils/data_processing.py Outdated Show resolved Hide resolved
@@ -46,7 +47,7 @@ class ModelBase(metaclass=abc.ABCMeta):

@abc.abstractmethod
def forecast_fit(
self, train_data: pd.DataFrame, *, train_ratio_in_tv: float = 1.0, **kwargs
self, train_data: pd.DataFrame, *,covariates: Optional[Dict] = None, train_ratio_in_tv: float = 1.0, **kwargs
) -> "ModelBase":
"""
Fit a model on time series data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add documentation of the covariates parameter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

:, exog_columns
] # This works even if exog_columns is an empty list

return target_df, exog_df
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused with our current model protocol now, and it is hard to determine if our implementation follows the same protocol, or relies on certain inter-class dependencies.
NOTE that any coding before the protocol is determined is not acceptable, in this case, we should first determine how to interpret the 'covariates' parameter, currently we seems to assume that:

  • if "exog" not in covariates, exog is ignored;
  • if "exog" in covariates, it must be a DataFrame (i.e. cannot be None) with the correct time index
  • an uncommon usage of the model is to pass in an empty dataframe with the correct time index to disable exog, which should be clearly stated in the documentation;

If the these assumptions are part of our model protocol, please add them into the description of the covariates parameter in ModelBase, and check if our implementation strictly follows this protocol.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@@ -46,7 +47,7 @@ class ModelBase(metaclass=abc.ABCMeta):

@abc.abstractmethod
def forecast_fit(
self, train_data: pd.DataFrame, *, train_ratio_in_tv: float = 1.0, **kwargs
self, train_data: pd.DataFrame, *,covariates: Optional[Dict] = None, train_ratio_in_tv: float = 1.0, **kwargs
) -> "ModelBase":
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to make a flexible and unified model API, it is better to pass in covariates not only in forecast_fit, but also in forecast and batch_forecast, I think the current forecast and batch_forecast assumes the series argument already contains covariates, which is not flexible enough

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    def forecast_fit(
        self,
        train_valid_data: pd.DataFrame,
        covariates: Optional[Dict],
        train_ratio_in_tv: float,
    ) -> "ModelBase":
    def forecast(
        self, horizon: int, covariates: Optional[Dict], train: pd.DataFrame
    ) -> np.ndarray:
    def batch_forecast(
        self, horizon: int, batch_maker: BatchMaker, **kwargs
    ) -> np.ndarray:

In forecast_fit and forecast, I directly passed the covariates. In batch_forecast, both the series and covariates are passed as part of the batch_maker.

@luckiezhou luckiezhou assigned luckiezhou and oneJue and unassigned luckiezhou Jan 27, 2025
oneJue and others added 2 commits January 30, 2025 20:03
Signed-off-by: oneJue <501247613@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants