This repository contains the code to reproduce the benchmark of the paper "Teaching Models To Survive: Proper Scoring Rule and Stochastic Optimization with Competing Risks" by Alberge et al.
This project is an archive and won't be maintained.
Installing this project will allow you to run the following models (table A):
Name | Competing risks | Proper loss | Implementation | Reference |
---|---|---|---|---|
MultiIncidence | ✔️ | ✔️ | ours | Alberge et al. |
SurvTRACE | ✔️ | | ours, based on github.com/RyanWangZf/SurvTRACE | Wang and Sun (2022) |
DeepHit | ✔️ | | github.com/havakv/pycox | Lee et al. (2018) |
Random Survival Forests | ✔️ | | scikit-survival.readthedocs.io for survival and randomforestsrc.org for competing risks | Ishwaran et al. (2014) |
Fine and Gray | ✔️ | | cran.r-project.org/package=cmprsk | Fine and Gray (1999) |
Aalen-Johansen | ✔️ | | ours, based on lifelines.readthedocs.io | Aalen et al. (2008) |
PCHazard | | | github.com/havakv/pycox | Kvamme and Borgan (2019b) |
We also benchmark the following models by adding some snippets to the authors' code in our forked versions (table B):
Name | Competing risks | Proper loss | Implementation | Reference |
---|---|---|---|---|
DSM | ✔️ | | autonlab.github.io/DeepSurvivalMachines | Nagpal et al. (2021) |
DeSurv | ✔️ | | github.com/djdanks/DeSurv | Danks and Yau (2022) |
Han et al. | | | github.com/rajesh-lab/Inverse-Weighted-Survival-Games | Han et al. (2021) |
Sumo-Net | | ✔️ | github.com/MrHuff/Sumo-Net | Rindt et al. (2022) |
DQS | | ✔️ | ibm.github.io/dqs | Yanagisawa (2023) |
We used the following datasets (table C):
Name | Competing risks | Source | Need a license |
---|---|---|---|
synthetic dataset | ✔️ | ours | |
Metabric | | pycox | |
Support | | pycox | |
SEER | ✔️ | NIH | ✔️ |
See the setup section to learn how to download SEER.
Clone the repository and move into it:

```bash
git clone git@github.com:soda-inria/survival_cr_bench.git
cd survival_cr_bench/
```
Create and activate an environment, e.g.:
```bash
python -m venv <your_env_name>
source <your_env_name>/bin/activate
```
Then perform the local installation:
```bash
pip install -e .
```
To use the SEER dataset in the benchmarks, you first have to make a request and be approved by the NIH before downloading it. Here is a tutorial.
Note that the waiting period can be up to several days.
We provide the best parameters that we found during our hyper-parameter search. To re-run this search, use the following:
```bash
cd benchmark
```

Then, in a Python shell:

```python
from hyper_parameter_search import search_all_dataset_params

search_all_dataset_params(dataset_name="seer", model_name="gbmi")
```
See `benchmark/_dataset.py` and `benchmark/_model.py` for the available options.
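To launch the search for several models on the same dataset, here is a minimal sketch along the same lines (the extra model names below are illustrative, not guaranteed to match the identifiers registered in `benchmark/_model.py`):

```python
from hyper_parameter_search import search_all_dataset_params

# Illustrative model names only -- replace them with identifiers
# actually listed in benchmark/_model.py.
for model_name in ["gbmi", "survtrace", "deephit"]:
    search_all_dataset_params(dataset_name="seer", model_name=model_name)
```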
Running this function will create two files in the folder `benchmark/best_hyper_parameters/<model_name>/<dataset_name>/<dataset_params>`:
- the best hyper-parameters of the cross-validated model (`best_params.json`)
- the parameters used to generate the dataset (`dataset_params.json`)
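Both outputs are plain JSON files, so you can inspect them directly. A minimal sketch, assuming you ran the SEER/`gbmi` example above (the exact `<dataset_params>` folder name depends on your run, hence the glob):

```python
import json
from pathlib import Path

# Walk over the <dataset_params> folders created by the search above.
for run_dir in Path("benchmark/best_hyper_parameters/gbmi/seer").glob("*"):
    best_params = json.loads((run_dir / "best_params.json").read_text())
    dataset_params = json.loads((run_dir / "dataset_params.json").read_text())
    print(run_dir.name, best_params, dataset_params)
```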
To fit and evaluate the models, go to the `benchmark/` folder:

```bash
cd benchmark/
```

Then, in a Python shell:

```python
from evaluate import evaluate_all_models

evaluate_all_models()
```
This will fit all models present in the `best_hyper_parameters` folder with their best hyper-parameters, for each random seed. Then, each fitted model is evaluated against the test set to compute metrics. The metrics are written to `benchmark/scores/raw/<model_name>/<dataset_name>.json` for each seed. Finally, for each model and each dataset, the results are aggregated seed-wise and written to `benchmark/scores/agg/<model_name>/<dataset_name>.json`.
These aggregated metrics will be used to plot figures.
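The raw and aggregated score files are plain JSON, so they can also be inspected outside the plotting scripts. A minimal sketch, assuming an aggregated file already exists for the model/dataset pair you pick (its internal structure is not documented here, so it is simply pretty-printed):

```python
import json

# Any <model_name>/<dataset_name>.json produced by evaluate_all_models() works here.
with open("benchmark/scores/agg/gbmi/seer.json") as f:
    agg_scores = json.load(f)

# Pretty-print for inspection; the available metrics depend on the benchmark run.
print(json.dumps(agg_scores, indent=2))
```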
To run the models from table B, go to the corresponding submodule, e.g. for DQS:

```bash
cd dqs
```

Then, read the README of that project to run its benchmark.
Running any of these models will create results at `benchmark/scores/raw/<model_name>/<dataset_name>.json`.
To aggregate them, run:
```bash
cd benchmark/
```

Then, in a Python shell:

```python
from evaluate import standalone_aggregate

standalone_aggregate("dqs", "metabric")
```
This will create an entry at the corresponding `benchmark/scores/agg/<model_name>/<dataset_name>.json`, and will allow you to plot the results with functions from the `benchmark/display` directory.
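If you have run several table B models, here is a minimal sketch to aggregate them in one go (the model and dataset names below are illustrative; only pairs with existing raw score files will work):

```python
from evaluate import standalone_aggregate

# Illustrative (model, dataset) pairs -- keep only those for which
# benchmark/scores/raw/<model_name>/<dataset_name>.json exists.
for model_name in ["dqs", "sumo-net", "dsm"]:
    standalone_aggregate(model_name, "metabric")
```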
As we already provide the results from our benchmarks, you don't have to run them all in order to reproduce the figures.
Each file in the `benchmark/display` directory corresponds to a figure introduced in the paper, with the figure number in the file name. Running a display file will create a `.png` file at the root of this repository. For example:

```bash
python benchmark/display/display_06_brier_score.py
```
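To regenerate every figure at once, here is a minimal sketch assuming all display scripts follow the `display_*.py` naming pattern of the example above:

```python
import subprocess
import sys
from pathlib import Path

# Assumes every figure script matches benchmark/display/display_*.py,
# as suggested by display_06_brier_score.py.
for script in sorted(Path("benchmark/display").glob("display_*.py")):
    subprocess.run([sys.executable, str(script)], check=True)
```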