
M2U: Introduce external variables (exog) to assist in prediction #45

Open: wants to merge 5 commits into master


oneJue (Collaborator) commented Jan 26, 2025

This change splits a dataset into target variables and external variables (exog) based on the target_channel configuration. The target_channel determines which columns are treated as target variables.

  • An element of target_channel can be an integer (positive or negative) that selects a single column index.
  • An element can also be a list or tuple of two integers that selects a range of columns, end index excluded (e.g., [2, 4] selects columns 2 and 3).
  • If target_channel is set to None, all columns are treated as target variables and the exog part is set to None.

Example:

Given the following configuration:

--strategy-args {"horizon":24,"target_channel":[0,-1,[0,3]]}

In this case:

  • The elements 0 and -1 select the first column (index 0) and the last column (a negative index counts from the end).
  • The element [0, 3] selects the columns from index 0 to 2 (index 3 is excluded), i.e., the first three columns are used as target variables.

The dataset is then split accordingly into target variables and exog, so that the exog columns can assist in predicting the target variables.
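The selection rules above can be sketched as a small pandas helper. split_channels_sketch is a hypothetical name used only for illustration, not the function added by this PR; it assumes that target_channel is a list whose elements are single indices or [start, end) pairs, that duplicate indices are dropped, that target columns keep their original order, and that exog is None when no column is left over.

```python
from typing import List, Optional, Sequence, Tuple, Union

import numpy as np
import pandas as pd

# One element of target_channel: a single column index, or a [start, end) pair.
ChannelSpec = Union[int, Sequence[int]]


def split_channels_sketch(
    df: pd.DataFrame, target_channel: Optional[List[ChannelSpec]]
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    """Split df into (target, exog) columns; a sketch, not the PR's code."""
    if target_channel is None:
        # No selection: every column is a target and there is no exog part.
        return df, None

    cols = range(df.shape[1])
    selected = set()
    for spec in target_channel:
        if isinstance(spec, (list, tuple)):
            start, end = spec
            selected.update(cols[start:end])  # half-open range, end excluded
        else:
            selected.add(cols[spec])  # negative ints count from the end

    # Assumption: duplicates are dropped and the original column order is kept.
    target_idx = sorted(selected)
    exog_idx = [i for i in cols if i not in selected]
    target = df.iloc[:, target_idx]
    # Assumption: exog is None when no column is left over.
    exog = df.iloc[:, exog_idx] if exog_idx else None
    return target, exog


# The configuration from the example: target_channel = [0, -1, [0, 3]]
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=list("abcdef"))
target, exog = split_channels_sketch(df, [0, -1, [0, 3]])
print(list(target.columns), list(exog.columns))  # ['a', 'b', 'c', 'f'] ['d', 'e']
```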

qiu69 and others added 3 commits January 26, 2025 22:18
Signed-off-by: oneJue <501247613@qq.com>
Signed-off-by: oneJue <501247613@qq.com>
    @@ -246,6 +257,9 @@ def forecast_fit(
             :param train_ratio_in_tv: Represents the splitting ratio of the training set validation set. If it is equal to 1, it means that the validation set is not partitioned.
             :return: The fitted model object.
             """
    +        series_dim = covariates["exog"].shape[1]
    +        if "exog" in covariates:
    +            train_valid_data = pd.concat([train_valid_data, covariates["exog"]], axis=1)

Collaborator:

covariates["exog"] is accessed on line 260 before "exog" in covariates is checked, so it may raise an error. Moreover, this variable holds the exog dimension, so it should be named exog_dim rather than series_dim. Maybe we should do it like this:

exog_dim = -1  # maybe None is better
if "exog" in covariates:
    exog_dim = covariates["exog"].shape[1]
    ...

Collaborator Author:

[screenshot]

Collaborator:

covariates["exog"] may still raise an exception when "exog" is not in covariates; please check "exog" not in covariates instead (assuming we do not allow covariates["exog"] to be None).
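One way to apply the guard from this thread is sketched below, assuming covariates is an optional dict whose "exog" entry, when present, is a DataFrame row-aligned with train_valid_data. attach_exog is a hypothetical helper: it reads covariates["exog"] only after checking that the key exists, normalises covariates=None to an empty dict, and reports exog_dim (None when there is no exog) instead of the misnamed series_dim.

```python
from typing import Dict, Optional, Tuple

import pandas as pd


def attach_exog(
    train_valid_data: pd.DataFrame, covariates: Optional[Dict] = None
) -> Tuple[pd.DataFrame, Optional[int]]:
    """Append exog columns to the training data if present (hypothetical helper)."""
    if covariates is None:
        covariates = {}  # treat a missing dict like "no covariates"

    exog_dim = None  # None signals "no exog" more clearly than -1
    # Check the key first, so covariates["exog"] can never raise a KeyError.
    if "exog" in covariates and covariates["exog"] is not None:
        exog = covariates["exog"]
        exog_dim = exog.shape[1]
        # Column-wise concat; rows are aligned on the index.
        train_valid_data = pd.concat([train_valid_data, exog], axis=1)
    return train_valid_data, exog_dim
```

Whether covariates["exog"] may itself be None is still an open question in this thread; the sketch tolerates it, which is the more forgiving choice.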

ts_benchmark/evaluation/strategy/fixed_forecast.py (outdated, resolved)
ts_benchmark/evaluation/strategy/fixed_forecast.py (outdated, resolved)
    -        fit_method(train_valid_data, train_ratio_in_tv=train_ratio_in_tv)
    +        fit_method(
    +            target_train_valid_data, covariate, train_ratio_in_tv=train_ratio_in_tv
    +        )
             end_fit_time = time.time()
             predicted = model.forecast(horizon, train_valid_data)

Collaborator:

Isn't it necessary to pass the exog data of the test series to the forecast method when the model performs rolling forecasting internally (e.g. output_chunk_length is 10 and horizon is 20)?
This is part of the complexity of covariates I mentioned before. Please at least add a check in the forecast and batch_forecast methods that raises an error with a proper message when exog is enabled during training and horizon != output_chunk_length during forecasting, and open an issue to track the question of how covariates should be passed.
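A minimal sketch of the requested check, assuming the model records in forecast_fit whether exog was used. The flag name exog_used_in_fit and the function check_exog_horizon are hypothetical. A model would call it at the start of forecast and batch_forecast, so a strategy-level horizon of 20 with output_chunk_length 10 fails loudly instead of silently forecasting without future exog values.

```python
def check_exog_horizon(
    horizon: int, output_chunk_length: int, exog_used_in_fit: bool
) -> None:
    """Reject forecasts that would need future exog values (hypothetical check)."""
    if exog_used_in_fit and horizon != output_chunk_length:
        raise ValueError(
            f"Exogenous variables were used in training, but horizon={horizon} "
            f"differs from output_chunk_length={output_chunk_length}. Rolling "
            "forecasts with exog are not supported yet, because future exog "
            "values are not passed to the model."
        )


check_exog_horizon(10, 10, exog_used_in_fit=True)   # ok: lengths match
check_exog_horizon(20, 10, exog_used_in_fit=False)  # ok: no exog involved
```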

Collaborator Author:

[screenshot]
Currently, when we use the horizon argument, its value must be equal to output_chunk_length.

Collaborator:

By 'horizon' I mean the 'horizon' parameter in strategy_args, not the 'horizon' parameter specific to tslib models. These two parameters can differ in custom configs, so I think this check is still necessary.
[screenshot]

ts_benchmark/evaluation/strategy/rolling_forecast.py (outdated, resolved)
ts_benchmark/evaluation/strategy/rolling_forecast.py (outdated, resolved)
ts_benchmark/evaluation/strategy/rolling_forecast.py (outdated, resolved)
ts_benchmark/models/model_base.py (outdated, resolved)
    @@ -237,7 +248,7 @@ def validate(self, valid_data_loader, criterion):
                 return total_loss

         def forecast_fit(
    -        self, train_valid_data: pd.DataFrame, train_ratio_in_tv: float
    +        self, train_valid_data: pd.DataFrame, covariates: dict, train_ratio_in_tv: float

Collaborator:

Note that covariates is typed as Optional[Dict] = None; please update the method signature and handle the None case.

Collaborator Author:

[screenshot]

Collaborator:

If covariates can be None, the first thing to do in forecast_fit should be:

    if covariates is None:
        covariates = {}

By the way, do we need default values for covariates and train_ratio_in_tv, considering that the baselines may be used as an algorithm library? @qiu69

    :param win_size: The size of each window.
    :return: A batch of covariates.
    """
    covariates_batch = self.covariates.copy()

Collaborator:

What if self.covariates is None?

Collaborator Author:

[screenshot]

luckiezhou (Collaborator) Jan 31, 2025:

The problem still exists, because self.covariates["exog"] raises an error (a TypeError) when self.covariates is None.
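A None-safe version of the covariates batching is sketched below. make_covariates_batch, index_list and the window convention (each window covers the win_size rows ending just before index i, with i >= win_size) are assumptions for illustration; the PR's actual batch maker is not shown here. The point is that covariates["exog"] is never subscripted when covariates is None.

```python
from typing import Dict, List, Optional

import numpy as np
import pandas as pd


def make_covariates_batch(
    covariates: Optional[Dict[str, pd.DataFrame]], index_list: List[int], win_size: int
) -> Dict:
    """Cut one exog window per batch entry; safe when covariates is None (sketch)."""
    if covariates is None:
        return {}  # nothing to slice, and covariates["exog"] is never touched
    covariates_batch = covariates.copy()  # shallow copy keeps the caller's dict intact
    exog = covariates.get("exog")
    if exog is not None:
        values = exog.to_numpy()
        # Assumption: window i covers rows [i - win_size, i); requires i >= win_size.
        covariates_batch["exog"] = np.stack(
            [values[i - win_size : i] for i in index_list]
        )  # shape: (len(index_list), win_size, n_exog_columns)
    return covariates_batch
```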

    train, rest = split_time(series, index)
    test = split_time(rest, horizon)[0].iloc[
        :, : target_train_valid_data.shape[-1]
    ]

Collaborator:

Why not use split_channels here? The current implementation does not take target_channels into account.

Collaborator Author:

[screenshot]
We now use split_channels here and pass covariates to forecast.

Collaborator:

What I expect here is:

    test, _ = split_channels(split_time(rest, horizon)[0], target_channels)

Or do I misunderstand your intentions?


Example 3:
target_channel = None
- Selects all columns as target columns, and the exog DataFrame is empty.
Collaborator:

If these examples are part of the description of "target_channels", they should have one more level of indentation to align with "It can include:"; otherwise, please put them before any "param" directives.
Moreover, please check the compiled documentation to see whether the output is as expected.

By the way, another way to include examples in the docstring is to use doctests:

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> split_channels(pd.DataFrame(np.zeros((5, 6))), target_channel=[1, 3])[0].shape  # target part: columns 1 and 3
(5, 2)

Collaborator Author:

[screenshot]

Collaborator Author:

[screenshot]
Now if target_channel is None, all columns are treated as target columns (exog becomes None).

Collaborator:

Have you tried compiling this docstring to see whether the layout is as expected? I'm not sure how to indent and add blank lines in this case.

Collaborator:

Quoting: "[screenshot] Now if target_channel is None, all columns are treated as target columns (exog becomes None)."

What is this comment replying to?

:param df: The input DataFrame to be split.
:param target_channel: Configuration for selecting target columns.
It can include:
- Integers (positive or negative) representing single column indices.
Collaborator:

There should be a blank line before and after a list; the same problem occurs in the "Examples" below.
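The docstring layout requested in this thread (a list indented under its :param field with blank lines around it, plus doctest examples) can be illustrated with a toy function. select_targets is hypothetical and much simpler than the PR's split_channels; whether Sphinx renders the layout as intended should still be confirmed by building the docs.

```python
import doctest


def select_targets(df, target_channel=None):
    """
    Select target columns from a DataFrame (toy function for docstring layout).

    :param df: The input DataFrame.
    :param target_channel: Configuration for selecting target columns.
        It can include:

        - Integers (positive or negative) representing single column indices.
        - None, which selects all columns.

    :return: The selected target columns.

    Examples:

    >>> import numpy as np
    >>> import pandas as pd
    >>> select_targets(pd.DataFrame(np.zeros((5, 6))), [1, 3]).shape
    (5, 2)
    >>> select_targets(pd.DataFrame(np.zeros((5, 6)))).shape
    (5, 6)
    """
    if target_channel is None:
        return df
    return df.iloc[:, list(target_channel)]


if __name__ == "__main__":
    # Runs every >>> example in this module and reports failures.
    print(doctest.testmod())
```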

    @@ -46,7 +47,7 @@ class ModelBase(metaclass=abc.ABCMeta):

         @abc.abstractmethod
         def forecast_fit(
    -        self, train_data: pd.DataFrame, *, train_ratio_in_tv: float = 1.0, **kwargs
    +        self, train_data: pd.DataFrame, *,covariates: Optional[Dict] = None, train_ratio_in_tv: float = 1.0, **kwargs
         ) -> "ModelBase":
             """
             Fit a model on time series data

Collaborator:

Please add documentation for the covariates parameter.

Collaborator Author:

[screenshot]

Collaborator:

According to the previous implementation, I believe covariates['exog'] cannot be None. Or have you updated the implementation to allow it to be None?
[screenshot]

ts_benchmark/utils/data_processing.py (resolved)
    @@ -46,7 +47,7 @@ class ModelBase(metaclass=abc.ABCMeta):

         @abc.abstractmethod
         def forecast_fit(
    -        self, train_data: pd.DataFrame, *, train_ratio_in_tv: float = 1.0, **kwargs
    +        self, train_data: pd.DataFrame, *,covariates: Optional[Dict] = None, train_ratio_in_tv: float = 1.0, **kwargs
         ) -> "ModelBase":
             """

Collaborator:

To make the model API flexible and unified, it is better to pass covariates not only to forecast_fit but also to forecast and batch_forecast. I think the current forecast and batch_forecast assume that the series argument already contains the covariates, which is not flexible enough.

Collaborator Author:

    def forecast_fit(
        self,
        train_valid_data: pd.DataFrame,
        covariates: Optional[Dict],
        train_ratio_in_tv: float,
    ) -> "ModelBase":
        ...

    def forecast(
        self, horizon: int, covariates: Optional[Dict], train: pd.DataFrame
    ) -> np.ndarray:
        ...

    def batch_forecast(
        self, horizon: int, batch_maker: BatchMaker, **kwargs
    ) -> np.ndarray:
        ...

In forecast_fit and forecast, I directly passed the covariates. In batch_forecast, both the series and covariates are passed as part of the batch_maker.

Collaborator:

I suggest changing the order of the parameters in forecast to horizon, series, covariates to align with the order in forecast_fit. Also, please don't rename 'series' to 'train' in forecast, as this series is mainly used for inference.
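A sketch of the suggested unified API, assuming covariates is passed as a keyword-only Optional[Dict] (matching the forecast_fit diff) and that forecast takes horizon, series, covariates in that order. ModelBaseSketch and LastValueModel are hypothetical names, and batch_forecast is omitted; the toy model only shows where None covariates get normalised.

```python
import abc
from typing import Dict, Optional

import numpy as np
import pandas as pd


class ModelBaseSketch(metaclass=abc.ABCMeta):
    """Hypothetical base class: covariates travel with every call."""

    @abc.abstractmethod
    def forecast_fit(
        self,
        train_data: pd.DataFrame,
        *,
        covariates: Optional[Dict] = None,
        train_ratio_in_tv: float = 1.0,
    ) -> "ModelBaseSketch":
        """Fit on train_data; covariates may hold an "exog" DataFrame."""

    @abc.abstractmethod
    def forecast(
        self, horizon: int, series: pd.DataFrame, *, covariates: Optional[Dict] = None
    ) -> np.ndarray:
        """Predict `horizon` steps after `series`, the inference input."""


class LastValueModel(ModelBaseSketch):
    """Toy model that repeats the last observed target values."""

    def forecast_fit(self, train_data, *, covariates=None, train_ratio_in_tv=1.0):
        if covariates is None:
            covariates = {}  # normalise None first, as suggested in the review
        self.uses_exog = covariates.get("exog") is not None
        return self

    def forecast(self, horizon, series, *, covariates=None):
        # The toy model ignores covariates; a real model would use covariates["exog"].
        last = series.to_numpy()[-1]
        return np.tile(last, (horizon, 1))  # shape: (horizon, n_target_columns)
```

Making covariates keyword-only means existing calls such as model.forecast(horizon, series) keep working unchanged.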

luckiezhou assigned luckiezhou and oneJue and unassigned luckiezhou Jan 27, 2025
oneJue and others added 2 commits January 30, 2025 20:03
Signed-off-by: oneJue <501247613@qq.com>

3 participants