M2U:Introduce external variables (exog) to assist in prediction #45

oneJue · 2025-01-26T14:41:48Z

This change splits a dataset into target variables and external variables (exog) based on the target_channel configuration, which determines which columns are treated as target variables.

Each element of target_channel is either an integer (positive or negative) that selects a single column index,
or a list or tuple of two integers (e.g., [2, 4]) that selects a range of columns.
If target_channel is set to None, all columns are treated as target variables and exog is set to None.

Example:

Given the following configuration:

--strategy-args {"horizon":24,"target_channel":[0,-1,[0,3]]}

In this case:

Elements 0 and -1: Select the first column (index 0) and the last column (a negative index counts from the end).
Element [0, 3]: Selects columns 0 to 2 (index 3 is excluded), i.e., the first three columns are used as target variables.

The dataset is split accordingly into target variables and exog, so that exog can assist in predicting the targets.
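
The selection rules above can be sketched in plain Python. This is only an illustration that works on column positions, not the PR's implementation: the name `resolve_target_columns` is hypothetical, and dropping repeated columns and reporting an empty exog part as None are assumptions.

```python
from typing import List, Optional, Sequence, Tuple, Union

ChannelItem = Union[int, Sequence[int]]


def resolve_target_columns(
    n_cols: int, target_channel: Optional[Sequence[ChannelItem]]
) -> Tuple[List[int], Optional[List[int]]]:
    """Hypothetical helper: map a target_channel spec to column positions."""
    all_cols = range(n_cols)
    if target_channel is None:
        # All columns are targets and there is no exog part.
        return list(all_cols), None

    targets: List[int] = []
    for item in target_channel:
        if isinstance(item, int):
            # Single index; negative values count from the end.
            picked = [all_cols[item]]
        else:
            # Two-integer list or tuple: half-open range [start, end).
            start, end = item
            picked = list(all_cols[start:end])
        for col in picked:
            if col not in targets:  # assumption: keep first occurrence only
                targets.append(col)

    exog = [col for col in all_cols if col not in targets]
    # Assumption: an empty exog part is reported as None.
    return targets, exog or None


targets, exog = resolve_target_columns(5, [0, -1, [0, 3]])
print(targets, exog)
```

With five columns, the configuration from the example resolves to target positions [0, 4, 1, 2] and exog position [3]. With a pandas DataFrame, the two lists could be passed to `df.iloc[:, ...]`.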

Signed-off-by: oneJue <501247613@qq.com>

ts_benchmark/baselines/time_series_library/adapters_for_transformers.py

luckiezhou · 2025-01-27T14:46:34Z

ts_benchmark/baselines/time_series_library/adapters_for_transformers.py

@@ -246,6 +257,9 @@ def forecast_fit(
        :param train_ratio_in_tv: Represents the splitting ratio of the training set validation set. If it is equal to 1, it means that the validation set is not partitioned.
        :return: The fitted model object.
        """
+        series_dim = covariates["exog"].shape[1]
+        if "exog" in covariates:
+            train_valid_data = pd.concat([train_valid_data, covariates["exog"]], axis=1)


covariates["exog"] on line 260 is accessed before the "exog" in covariates check, so it may raise an error. Moreover, this variable is exog_dim rather than series_dim. Maybe we should do it like this:

    exog_dim = -1  # maybe None is better
    if "exog" in covariates:
        exog_dim = covariates["exog"].shape[1]
        ...

covariates["exog"] may still raise exception when "exog" not in covariates, please use "exog" not in covariates instead (if we do not allow covariates["exog"] to be None)

Thank you for your thoughtful feedback.

I updated the implementation to allow covariates["exog"] to be None.

In all strategy implementations, I explicitly ensure the covariates dictionary always contains the "exog" key,
so the existing logic is exception-safe.
Would it be better to use the following code?

    exog_dim = -1
    exog_data = covariates.get("exog")
    if exog_data is not None:
        exog_dim = exog_data.shape[-1]
        train_valid_data = pd.concat([train_valid_data, exog_data], axis=1)

ts_benchmark/evaluation/strategy/fixed_forecast.py

luckiezhou · 2025-01-27T15:44:56Z

ts_benchmark/evaluation/strategy/fixed_forecast.py

-        fit_method(train_valid_data, train_ratio_in_tv=train_ratio_in_tv)
+        fit_method(
+            target_train_valid_data, covariate, train_ratio_in_tv=train_ratio_in_tv
+        )
        end_fit_time = time.time()
        predicted = model.forecast(horizon, train_valid_data)


I think it is necessary to pass the exog data from the test series to the forecast method when rolling forecasting happens inside the model (e.g. output_chunk_length is 10 and horizon is 20)?
This is part of the complexity of covariates I mentioned before. Please at least add a check in the forecast and batch_forecast methods that raises an error with a proper message when exog is enabled during training and horizon != output_chunk_length during forecast, and open an issue to track how covariates should be passed.

Currently, when we use the horizon argument, its value must be equal to output_chunk_length.

By 'horizon' I mean the 'horizon' parameter in strategy_args, not the 'horizon' parameter for tslib models specifically. These two parameters can be different in custom configs, so I think this check is still necessary.

    if exog_dim != -1 and horizon != self.config.output_chunk_length:
        raise ValueError(
            f"Error: 'exog' is enabled during training, but horizon ({horizon}) != output_chunk_length ({self.config.output_chunk_length}) during forecast."
        )

I understand. It has been modified.
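
To make the concern in this thread concrete: a model that emits output_chunk_length steps per call needs ceil(horizon / output_chunk_length) calls to cover the horizon, and, as the reviewer points out, any call after the first would need exog values from the test window. Below is a minimal standalone sketch of that arithmetic and of the guard. The function names are hypothetical, and using None instead of -1 as the "no exog" sentinel is an assumption based on the reviewer's earlier remark.

```python
import math
from typing import Optional


def rollout_steps(horizon: int, output_chunk_length: int) -> int:
    """Number of model calls needed to cover the horizon."""
    return math.ceil(horizon / output_chunk_length)


def check_exog_horizon(
    exog_dim: Optional[int], horizon: int, output_chunk_length: int
) -> None:
    """Hypothetical guard: reject multi-step rollout when exog was used in training."""
    if exog_dim is not None and horizon != output_chunk_length:
        raise ValueError(
            f"'exog' is enabled during training, but horizon ({horizon}) != "
            f"output_chunk_length ({output_chunk_length}) during forecast."
        )


print(rollout_steps(20, 10))       # two model calls are needed
check_exog_horizon(None, 20, 10)   # no exog: the check passes
try:
    check_exog_horizon(3, 20, 10)  # exog enabled and horizons differ
except ValueError as err:
    print(err)
```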

ts_benchmark/evaluation/strategy/rolling_forecast.py

ts_benchmark/models/model_base.py

luckiezhou · 2025-01-27T17:26:10Z

ts_benchmark/baselines/time_series_library/adapters_for_transformers.py

@@ -237,7 +248,7 @@ def validate(self, valid_data_loader, criterion):
        return total_loss

    def forecast_fit(
-        self, train_valid_data: pd.DataFrame, train_ratio_in_tv: float
+        self, train_valid_data: pd.DataFrame, covariates: dict, train_ratio_in_tv: float


Note that the type of covariates is Optional[Dict] = None, please update the method prototype and handle the None case

If covariates can be None, the first thing to do in forecast_fit should be

    if covariates is None:
        covariates = {}

btw, do we need default values for covariates and train_ratio_in_tv, considering that we may take the baselines as an algorithm library? @qiu69

In all strategy implementations, I explicitly ensure that covariates cannot be None,
and I updated all Optional[Dict] = None annotations to dict.

luckiezhou · 2025-01-27T17:30:50Z

ts_benchmark/evaluation/strategy/rolling_forecast.py

+        :param win_size: The size of each window.
+        :return: A batch of covariates.
+        """
+        covariates_batch = self.covariates.copy()


what if self.covariates is None?

the problem still exists because self.covariates["exog"] raises errors when self.covariates is None

In all strategy implementations, I explicitly ensure that covariates cannot be None.
Or should we adopt the following code?

    covariates_batch = {}
    if self.covariates is not None and self.covariates.get("exog") is not None:
        covariates_batch["exog"] = self._make_batch_data(
            self.covariates["exog"], index_list, win_size
        )
    else:
        covariates_batch["exog"] = None
    return covariates_batch
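
The None-safe pattern proposed in this thread can be exercised in isolation. The sketch below is not the repository's code: `make_covariates_batch` is a hypothetical free function, and the `make_batch` callable stands in for the `_make_batch_data` method, whose real signature is not shown here.

```python
from typing import Any, Callable, Dict, Optional


def make_covariates_batch(
    covariates: Optional[Dict[str, Any]],
    make_batch: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Always return a dict with an "exog" key, even if covariates is None."""
    exog = None if covariates is None else covariates.get("exog")
    # Only call the batching function when there is exog data to batch.
    return {"exog": None if exog is None else make_batch(exog)}


def first_two(values):
    """Stand-in batching function (assumption): keep the first two items."""
    return values[:2]


print(make_covariates_batch(None, first_two))
print(make_covariates_batch({"exog": None}, first_two))
print(make_covariates_batch({"exog": [1, 2, 3]}, first_two))
```

Callers can then read `batch["exog"]` without first checking whether covariates or its "exog" entry exists.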

luckiezhou · 2025-01-27T17:43:34Z

ts_benchmark/evaluation/strategy/rolling_forecast.py

+            train, rest = split_time(series, index)
+            test = split_time(rest, horizon)[0].iloc[
+                :, : target_train_valid_data.shape[-1]
+            ]


why not use split_channels here, as the current implementation does not take target_channels into account?

we now use split_channels here and pass covariates in forecast

What I expect is

test, _ = split_channels(split_time(rest, horizon)[0], target_channels)

here, or do I misunderstand your intentions?

Yes, I was wrong. I have corrected it

luckiezhou · 2025-01-27T17:56:26Z

ts_benchmark/utils/data_processing.py

+
+    Example 3:
+        target_channel = None
+        - Selects all columns as target columns, and the exog DataFrame is empty.


if these examples are part of the description of "target_channels", they should be indented one more level to align with "It can include:"; otherwise, please put these examples before any "param" directives.
Moreover, please check the compiled documentation to see if the output is as expected.

btw, another way to include examples in the docstring is to use doctests:

    Examples:
        >>> import pandas as pd
        >>> import numpy as np
        >>> split_channel(pd.DataFrame(np.zeros((5, 6))), target_channel=[1, 3]).shape  # selects columns 1 and 3
        (5, 2)

Now if target_channel is None, all columns are treated as target columns (exog becomes None).

have you tried to compile this docstring and see if the layout is as expected? because I'm not sure how to indent and add blank lines in this case.

> Now if target_channel is None, all columns are treated as target columns (exog becomes None).

what is this comment replying to?

> have you tried to compile this docstring and see if the layout is as expected? because I'm not sure how to indent and add blank lines in this case.

I’ve tested it, and everything works as expected.

> Now if target_channel is None, all columns are treated as target columns (exog becomes None).
>
> what is this comment replying to?

I updated the implementation to allow covariates["exog"] to be None; the previous implementation passed an empty DataFrame.

luckiezhou · 2025-01-27T17:57:21Z

ts_benchmark/utils/data_processing.py

+    :param df: The input DataFrame to be split.
+    :param target_channel: Configuration for selecting target columns.
+        It can include:
+        - Integers (positive or negative) representing single column indices.


there should be a blank line before and after a list, the same problem occurs in the "Examples" below

Now I've added a blank line before and after each list, including the previous comment.
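
For reference, here is a small sketch of a docstring that follows both suggestions from these threads: blank lines around the list inside the parameter description, and examples written as doctests. The function `count_target_columns` is hypothetical and exists only to carry the docstring. How a given Sphinx setup renders this layout should still be checked by building the docs.

```python
import doctest


def count_target_columns(n_cols, target_channel=None):
    """
    Count the distinct columns selected as targets.

    :param n_cols: Total number of columns.
    :param target_channel: Configuration for selecting target columns.
        It can include:

        - Integers (positive or negative) representing single column indices.
        - Lists or tuples of two integers representing a half-open range.

        If None, all columns are selected.
    :return: The number of distinct target columns.

    Examples:
        >>> count_target_columns(6, [1, 3])
        2
        >>> count_target_columns(6, [[0, 3]])
        3
        >>> count_target_columns(6)
        6
    """
    if target_channel is None:
        return n_cols
    cols = set()
    for item in target_channel:
        if isinstance(item, int):
            cols.add(range(n_cols)[item])  # handles negative indices
        else:
            cols.update(range(n_cols)[item[0]:item[1]])
    return len(cols)


if __name__ == "__main__":
    # Runs the examples embedded in the docstring above.
    doctest.testmod()
```

Running the file directly, or `python -m doctest` on it, executes the examples, so they cannot silently drift from the code.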

luckiezhou · 2025-01-27T18:09:54Z

ts_benchmark/models/model_base.py

@@ -46,7 +47,7 @@ class ModelBase(metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def forecast_fit(
-        self, train_data: pd.DataFrame, *, train_ratio_in_tv: float = 1.0, **kwargs
+        self, train_data: pd.DataFrame, *,covariates: Optional[Dict] = None, train_ratio_in_tv: float = 1.0, **kwargs
    ) -> "ModelBase":
        """
        Fit a model on time series data


please add documentation of the covariates parameter

According to the previous implementation, I believe covariates['exog'] cannot be None, or have you updated the implementation to allow it to be None?

Yes. I updated the implementation to allow it to be None.

ts_benchmark/utils/data_processing.py

luckiezhou · 2025-01-27T18:22:03Z

ts_benchmark/models/model_base.py

@@ -46,7 +47,7 @@ class ModelBase(metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def forecast_fit(
-        self, train_data: pd.DataFrame, *, train_ratio_in_tv: float = 1.0, **kwargs
+        self, train_data: pd.DataFrame, *,covariates: Optional[Dict] = None, train_ratio_in_tv: float = 1.0, **kwargs
    ) -> "ModelBase":
        """


To make the model API flexible and unified, it is better to pass covariates not only to forecast_fit but also to forecast and batch_forecast. I think the current forecast and batch_forecast assume the series argument already contains covariates, which is not flexible enough.

    def forecast_fit(
        self,
        train_valid_data: pd.DataFrame,
        covariates: Optional[Dict],
        train_ratio_in_tv: float,
    ) -> "ModelBase":

    def forecast(
        self, horizon: int, covariates: Optional[Dict], train: pd.DataFrame
    ) -> np.ndarray:

    def batch_forecast(
        self, horizon: int, batch_maker: BatchMaker, **kwargs
    ) -> np.ndarray:

In forecast_fit and forecast, I directly passed the covariates. In batch_forecast, both the series and covariates are passed as part of the batch_maker.

I suggest changing the order of the parameters in forecast to horizon, series, covariates to align with the order in forecast_fit. Also, please don't rename 'series' to 'train' in forecast, as this series is mainly used for inference.
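
The parameter order suggested here (horizon, series, covariates) can be sketched as an abstract interface. This is a hedged illustration, not the project's ModelBase: the class names are hypothetical, plain lists stand in for DataFrames, and giving covariates a default of None is an assumption drawn from the earlier discussion about default values.

```python
import abc
from typing import Any, Dict, List, Optional, Sequence


class ForecastModelSketch(abc.ABC):
    """Hypothetical interface with covariates in both fit and forecast."""

    @abc.abstractmethod
    def forecast_fit(
        self,
        train_valid_data: Sequence[float],
        *,
        covariates: Optional[Dict[str, Any]] = None,
        train_ratio_in_tv: float = 1.0,
    ) -> "ForecastModelSketch":
        ...

    @abc.abstractmethod
    def forecast(
        self,
        horizon: int,
        series: Sequence[float],
        *,
        covariates: Optional[Dict[str, Any]] = None,
    ) -> List[float]:
        ...


class LastValueModel(ForecastModelSketch):
    """Toy model: repeats the last observed value and ignores exog."""

    def forecast_fit(self, train_valid_data, *, covariates=None, train_ratio_in_tv=1.0):
        covariates = covariates or {}  # normalize None to an empty dict
        self.uses_exog = covariates.get("exog") is not None
        return self

    def forecast(self, horizon, series, *, covariates=None):
        return [series[-1]] * horizon


model = LastValueModel().forecast_fit([1.0, 2.0, 3.0], covariates={"exog": None})
print(model.forecast(2, [1.0, 2.0, 3.0]))
```

Keeping covariates keyword-only means existing positional calls such as `forecast(horizon, series)` would not need to change.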

Signed-off-by: oneJue <501247613@qq.com>

qiu69 and others added 3 commits January 26, 2025 22:18

m2u

c1f7129

m2u

93d2e99

Signed-off-by: oneJue <501247613@qq.com>

m2u

7b73d80

Signed-off-by: oneJue <501247613@qq.com>

luckiezhou reviewed Jan 27, 2025

luckiezhou assigned luckiezhou and oneJue and unassigned luckiezhou Jan 27, 2025

oneJue added 3 commits January 30, 2025 20:03

m2u

2140ebf

Signed-off-by: oneJue <501247613@qq.com>

m2u

fb32430

Signed-off-by: oneJue <501247613@qq.com>

m2u

32332a3

Signed-off-by: oneJue <501247613@qq.com>

oneJue force-pushed the master branch from 27294ce to 32332a3 Compare February 1, 2025 11:29
