IS and OOS returns for matrix M, or just IS returns? #7

Open
tre-blu opened this issue May 2, 2022 · 1 comment

Comments


tre-blu commented May 2, 2022

I am curious if someone could clarify what source data is used to implement the PBO algorithm: is the input matrix M purely the IS returns data from the N trials (one per model parameter configuration tested, each with T observations), or does it also include the respective OOS performance of each configuration?

After first reading the MLDP paper, I had assumed it was the latter: that we also need to input the related OOS returns data, since we are e.g. comparing the optimal Sharpe IS to the median OOS value in each CSCV combination. Additionally, Figure 1 (below) from the paper shows the CSCV process with M partitioned into IS and OOS sections.

However, when attempting to implement the algorithm using this and also R libraries, I see that only a single matrix M of returns data is input.

Also note that in the paper, they never speak of IS/OOS explicitly in describing the construction of M:
"First, we form a matrix M by collecting the performance series from the N trials. In particular, each column n = 1, ..., N represents a vector of profits and losses over t = 1, ..., T observations associated with a particular model configuration tried by the researcher"

Am I missing something? Perhaps the CSCV process derives 'synthetic' OOS data from the IS returns by means of sampling under IID assumptions? Or is it that we do need to include both IS and OOS returns data, and are supposed to e.g. join matrices of IS and OOS data into a symmetrical matrix/df?


RSv618 commented Apr 24, 2023

The Probability of Backtest Overfitting (PBO) algorithm takes a single matrix M of returns data as input. M is built by collecting the performance series of the N trials, i.e. the N model parameter configurations tested, each over the same T observations, so M has T rows and N columns. You do not supply a separate matrix of out-of-sample (OOS) performance data.

The Combinatorially Symmetric Cross-Validation (CSCV) process used to estimate the PBO splits the rows of M into an even number S of disjoint blocks. Each combination of S/2 blocks is treated as the in-sample (IS) set, and the remaining S/2 blocks serve as the “pseudo” out-of-sample set. No model is re-fitted at this stage: for each combination, the configuration (column) with the best IS performance, e.g. Sharpe ratio, is selected, and its relative rank among all N configurations is measured on the OOS rows. This is repeated for every combination, and the PBO is estimated as the fraction of combinations in which the IS-best configuration ranks at or below the median OOS.
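To make the role of the single matrix M concrete, here is a minimal sketch of CSCV as I understand it from the paper, not the code of this repository or of the R libraries. It assumes M is a T x N NumPy array of returns, uses a plain (non-annualized) Sharpe ratio as the performance metric, and takes every combination of S/2 row blocks as IS with the complementary blocks as OOS. The names `sharpe` and `pbo_cscv` are hypothetical, and `np.array_split` allows slightly unequal blocks where the paper would trim T to a multiple of S.

```python
from itertools import combinations
import math

import numpy as np


def sharpe(returns):
    """Per-column Sharpe ratio (not annualized); an assumed performance metric."""
    return returns.mean(axis=0) / returns.std(axis=0, ddof=1)


def pbo_cscv(M, S=8):
    """Estimate PBO from a T x N returns matrix M using S row blocks (S even).

    Returns the PBO estimate and the array of logits, one per IS/OOS split.
    """
    M = np.asarray(M, dtype=float)
    T, N = M.shape
    if S % 2 != 0:
        raise ValueError("S must be even")

    # Split the T row indices into S disjoint, contiguous blocks.
    blocks = np.array_split(np.arange(T), S)

    logits = []
    for is_ids in combinations(range(S), S // 2):
        oos_ids = [s for s in range(S) if s not in is_ids]
        is_rows = np.concatenate([blocks[s] for s in is_ids])
        oos_rows = np.concatenate([blocks[s] for s in oos_ids])

        # Pick the configuration (column) that looks best in-sample.
        best = int(np.argmax(sharpe(M[is_rows])))

        # Relative OOS rank of that configuration, strictly inside (0, 1).
        oos_perf = sharpe(M[oos_rows])
        rank = int((oos_perf <= oos_perf[best]).sum())  # 1 = worst, N = best
        omega = rank / (N + 1)
        logits.append(math.log(omega / (1 - omega)))

    logits = np.array(logits)
    # PBO: share of splits where the IS winner is at or below the OOS median.
    return float((logits <= 0).mean()), logits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.normal(0.0, 0.01, size=(400, 10))  # pure-noise "strategies"
    pbo, logits = pbo_cscv(M, S=8)
    print(f"PBO estimate: {pbo:.2f} over {len(logits)} splits")
```

Note that the only input is M itself: the OOS rows come from the same matrix, just from the blocks not used as IS in a given split.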

So, to answer your question, you only need to provide a single matrix M of in-sample returns data as input to the PBO algorithm. The CSCV process will use this data to estimate the out-of-sample performance and calculate the PBO.
