Update the competing risks incidence loss to remove a bias #17

Closed
wants to merge 8 commits into from

juAlberge (Collaborator) commented Oct 31, 2023

There is still a bias in the example, so there might be a bug; we need to investigate.

EDIT (Olivier): the bias goes away when the number of trees (n_iter) is increased along with n_samples.

This PR actually implements an estimator of the cause-specific cumulative distribution function (CDF) of each event type. It estimates what the cumulative incidence would be if it were possible to remove the competing event types, which is rarely possible in practice (e.g. removing all other causes of death to study cancer incidence). I don't think this is what practitioners want to estimate. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5557056/ for instance.
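To make the distinction concrete, here is a minimal simulation sketch. It assumes two event types driven by independent latent Weibull times; the shapes, scales and helper names below are hypothetical and not taken from the PR or from hazardous. The cause-specific CDF is the distribution of the latent time of event 1 alone, as if event 2 could be removed, whereas the cumulative incidence function (CIF) only counts event 1 when it happens first.

```python
import numpy as np

# Hypothetical setup: two competing event types with independent latent
# Weibull times. Shapes and scales are illustrative only.
rng = np.random.default_rng(0)
n_samples = 100_000
latent_1 = 2.0 * rng.weibull(0.5, size=n_samples)  # event type 1
latent_2 = 1.0 * rng.weibull(1.5, size=n_samples)  # event type 2

# Observed data: only the first event to occur is seen.
observed_time = np.minimum(latent_1, latent_2)
event = np.where(latent_1 <= latent_2, 1, 2)


def cause_specific_cdf(t):
    """P(T_1 <= t): CDF of event 1 "as if" event 2 were removed."""
    return np.mean(latent_1 <= t)


def cumulative_incidence(t, k=1):
    """P(T <= t, event == k): the competing risks CIF of event k."""
    return np.mean((observed_time <= t) & (event == k))


for t in [0.5, 2.0, 50.0]:
    print(t, cause_specific_cdf(t), cumulative_incidence(t))
```

The CDF of the latent time reaches 1.0 for large t, while the CIF of event 1 plateaus at the fraction of individuals for whom event 1 occurs first.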

ogrisel (Contributor) commented Nov 2, 2023

@juAlberge I just pushed 44df4cf to implement the weighting scheme of your draft manuscript more directly, but the example is still completely off.

ogrisel (Contributor) commented Nov 2, 2023

I broke some tests; I will fix them, but I don't think this will fix the real problem.

ogrisel (Contributor) commented Nov 3, 2023

I pushed a commit to restore the implementation of IPCWEstimator from @juAlberge's original push. Note, however, that the example does not work because this estimator is no longer a competing risks cumulative incidence estimator (as AJ is, for instance) but instead an estimator of the cumulative distribution function of each event type.

The two quantities differ as t goes to infinity: the former converges to a value strictly below 1.0, while the latter goes to 1.0 (assuming strictly positive hazards at infinity).
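These limits can be derived in a few lines, assuming independent latent event times so that the net CDF of event type k is driven by its cause-specific hazard h_k alone (the notation is ours, not from the thread):

```latex
% Cumulative hazards and overall survival (no event of any type before t)
H_k(t) = \int_0^t h_k(u)\,du, \qquad S(t) = \exp\Big(-\sum_j H_j(t)\Big)

% Cause-specific (net) CDF: event k alone, as if other events were removed
F_k(t) = 1 - e^{-H_k(t)} \xrightarrow[t \to \infty]{} 1 \quad \text{if } H_k(t) \to \infty

% Competing risks CIF; bounded by F_k because S(u) \le e^{-H_k(u)}
\mathrm{CIF}_k(t) = \int_0^t h_k(u)\, S(u)\,du \;\le\; \int_0^t h_k(u)\, e^{-H_k(u)}\,du = F_k(t)

% Summing over event types, using S'(u) = -\sum_k h_k(u) S(u)
\sum_k \mathrm{CIF}_k(t) = \int_0^t -S'(u)\,du = 1 - S(t) \xrightarrow[t \to \infty]{} 1
```

Since the CIFs sum to at most 1, each CIF_k(∞) is strictly below 1 as soon as another event type has positive probability, while F_k(∞) = 1.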

This means that, as is, this PR will not pass the new tests introduced in #18.

I am not yet sure if the estimator introduced in this PR makes sense for practitioners.
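For contrast with the CDF estimator discussed above, here is a minimal sketch of a textbook inverse probability of censoring weighting (IPCW) estimator of the competing risks CIF. It is not the IPCWEstimator of this PR (whose weighting scheme comes from the draft manuscript and is not shown here), and the function names are hypothetical. Each uncensored event of type k is weighted by 1 / G(T_i-), where G is a Kaplan-Meier estimate of the censoring survival function; events observed at time t are assumed to happen before censorings at t.

```python
import numpy as np


def censoring_survival_left_limit(durations, events):
    """Kaplan-Meier estimate of the censoring survival G(t-) at each duration.

    Censoring (events == 0) is treated as the "event"; the left limit
    excludes censorings that happen exactly at t.
    """
    unique_times = np.unique(durations)
    at_risk = np.array([np.sum(durations >= t) for t in unique_times])
    n_censored = np.array(
        [np.sum((durations == t) & (events == 0)) for t in unique_times]
    )
    g_after = np.cumprod(1.0 - n_censored / at_risk)
    # G(t-) is the value after the previous unique time (1.0 before the first).
    g_before = np.concatenate([[1.0], g_after[:-1]])
    idx = np.searchsorted(unique_times, durations)
    return g_before[idx]


def ipcw_cumulative_incidence(durations, events, k, t):
    """IPCW estimate of P(T <= t, event == k) from right-censored data."""
    g = censoring_survival_left_limit(durations, events)
    contributes = (durations <= t) & (events == k)
    return np.sum(contributes / g) / len(durations)


durations = np.array([1.0, 2.0, 3.0, 4.0])
events = np.array([1, 0, 2, 1])  # 0 = censored, 1 and 2 = event types
print(ipcw_cumulative_incidence(durations, events, k=1, t=4.0))
```

Without censoring, all weights equal 1 and the estimate reduces to the empirical fraction of event k before t, so the estimated CIFs sum to at most 1 instead of each converging to 1.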

ogrisel (Contributor) commented Nov 4, 2023

I have merged main into this branch to trigger a doc build preview with the code of this branch. Here is the behavior of the marginal estimation example:

https://pull-request-17--hazardous-doc.netlify.app/auto_examples/plot_marginal_cumulative_incidence_estimation#sphx-glr-auto-examples-plot-marginal-cumulative-incidence-estimation-py

Here are the important plots (both without and with censoring):

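[Plot: marginal cumulative incidence estimation, without censoring]

[Plot: marginal cumulative incidence estimation, with censoring]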

As explained in the analysis above, the cause-specific CDFs estimated by this branch go to 1.0 (up to estimation noise) as t goes to infinity, while the competing risks CIFs, obtained either by numerical integration of the true hazard functions or with the Aalen-Johansen estimator, converge to the respective fractions of each event type.
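The "true" CIFs mentioned above can be approximated from known hazards with a short numerical integration of CIF_k(t) = ∫ h_k(u) S(u) du. This is a minimal sketch assuming hypothetical Weibull cause-specific hazards; it is not necessarily how the hazardous example computes them. Rather than a generic quadrature rule, it splits each drop in overall survival S between event types in proportion to their cumulative hazard increments, which keeps the CIFs summing exactly to 1 - S(t) and sidesteps the t^(-1/2) singularity of the shape-0.5 hazard at 0.

```python
import numpy as np

# Hypothetical Weibull cause-specific hazards as (shape, scale); illustrative only.
params = {1: (0.5, 2.0), 2: (1.5, 1.0)}


def cumulative_hazard(t, shape, scale):
    """Weibull cumulative hazard H(t) = (t / scale) ** shape."""
    return (t / scale) ** shape


t_grid = np.linspace(0.0, 20.0, 20_001)
H = {k: cumulative_hazard(t_grid, *p) for k, p in params.items()}
H_total = sum(H.values())
S = np.exp(-H_total)  # overall survival: no event of any type before t

cif = {}
for k in params:
    dH_k = np.diff(H[k])
    dH_total = np.diff(H_total)
    # Share of the survival drop in each step attributed to event type k,
    # assuming the hazard ratio h_k / h_total is constant within the step.
    share = dH_k / dH_total
    cif[k] = np.concatenate([[0.0], np.cumsum(share * (S[:-1] - S[1:]))])

for k in params:
    print(k, cif[k][-1])
```

Both CIFs plateau strictly below 1.0, and their sum tends to 1.0, unlike the cause-specific CDF 1 - exp(-H_k(t)) of each event type.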

ogrisel (Contributor) commented Nov 6, 2023

Looking at the plots again, I have the feeling that the estimated CDF might still be wrong: for event 1 in particular, the non-concave shape seems wrong for a Weibull distribution with a shape parameter below 1.
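The intuition behind this remark can be checked numerically: for a shape parameter below 1 the Weibull density is strictly decreasing on (0, ∞), so its CDF must be concave, whereas for a shape above 1 the density first increases and the CDF starts out convex. Here is a small finite-difference check; the helper names and the tolerance (chosen to absorb float rounding) are hypothetical.

```python
import numpy as np


def weibull_cdf(t, shape, scale=1.0):
    """Weibull CDF F(t) = 1 - exp(-(t / scale) ** shape)."""
    return 1.0 - np.exp(-((t / scale) ** shape))


def is_concave_on_grid(values, tol=1e-12):
    """True if all second finite differences on a uniform grid are <= tol."""
    return bool(np.all(np.diff(values, n=2) <= tol))


t = np.linspace(0.01, 10.0, 1_000)
for shape in (0.5, 1.0, 1.5):
    print(shape, is_concave_on_grid(weibull_cdf(t, shape)))
```

Applied to an estimated CDF evaluated on a uniform time grid, the same check would flag the non-concave shape observed for event 1, although estimation noise may require a looser tolerance.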

ogrisel (Contributor) commented Nov 8, 2023

I think we can close this PR, @juAlberge. I don't think we ever want to estimate the CDFs in hazardous.

juAlberge closed this Nov 8, 2023