Update the computing risk incidence loss to remove a bias #17
@juAlberge I just pushed 44df4cf to implement the weighting scheme of your draft manuscript more directly, but the example is still completely off.
I broke some tests. Let me fix them, but I don't think this will fix the real problem.
I pushed a commit to restore the implementation of This is not the same when t goes to infinity. The former has a value that is strictly below 1.0, but the latter goes to 1.0 (assuming strictly positive hazards at infinity). This means that this PR, as is, will not pass the new tests introduced in #18. I am not sure yet if the estimator introduced in this PR makes sense for practitioners.
I have merged Here are the important plots (both without and with censoring): As explained in the above analysis, the cause-specific CDFs estimated by this branch go to 1.0 (with some estimation noise) when t goes to infinity, while the competing risks CIFs found by numerical integration of the true hazard functions or by the Aalen-Johansen estimator go to the respective fractions of each event type.
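To make the two limits concrete, here is a minimal sketch (not the code used for the plots in this PR) that computes both quantities from known cause-specific hazards. It assumes two Weibull cause-specific hazards with made-up parameters; the function names and the parameter values are hypothetical. The CIF is obtained as the integral of the overall survival S(u) against dH_k(u), while the cause-specific CDF is 1 - exp(-H_k(t)).

```python
import numpy as np


def weibull_cum_hazard(t, shape, scale):
    # Cumulative hazard of a Weibull distribution: H(t) = (t / scale) ** shape.
    return (t / scale) ** shape


def competing_risks_curves(t_grid, params):
    # params is a list of (shape, scale) pairs, one per event type
    # (illustrative values, not taken from the PR).
    H = np.array([weibull_cum_hazard(t_grid, a, b) for a, b in params])
    # Overall (any-event) survival: S(t) = exp(-sum_k H_k(t)).
    S = np.exp(-H.sum(axis=0))
    # CIF_k(t) = integral of S(u) dH_k(u), approximated with a trapezoidal
    # Riemann-Stieltjes sum; integrating against dH_k avoids evaluating the
    # hazard at t = 0, where it diverges for shape < 1.
    dH = np.diff(H, axis=1)
    S_mid = 0.5 * (S[:-1] + S[1:])
    increments = np.cumsum(S_mid * dH, axis=1)
    cif = np.concatenate([np.zeros((len(params), 1)), increments], axis=1)
    # Cause-specific CDF: what would be observed if the other event
    # types could be removed.
    cdf = 1.0 - np.exp(-H)
    return cif, cdf


t_grid = np.linspace(0.0, 200.0, 200_001)
params = [(0.8, 10.0), (1.5, 15.0)]  # hypothetical (shape, scale) per event
cif, cdf = competing_risks_curves(t_grid, params)

print("CIF at the end of the grid:", cif[:, -1], "sum:", cif[:, -1].sum())
print("Cause-specific CDF at the end of the grid:", cdf[:, -1])
```

At the end of the grid, the CIFs add up to roughly 1.0 and each stays strictly below 1.0, whereas each cause-specific CDF individually approaches 1.0.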
Looking at the plots again, I have the feeling that the estimated CDF might still be bad. For event 1 in particular, the non-concave shape seems wrong for a Weibull distribution with shape parameter below 1.
I think we can close this PR @juAlberge. I don't think we ever want to estimate the CDFs in hazardous.
In the example there is still a bias, so there might be a bug. We need to investigate.
EDIT (Olivier): the bias goes away when increasing the number of trees (`n_iter`) with `n_samples`.
This PR is actually an estimator for the cause-specific cumulative distribution functions for each event type. It estimates what the cumulative incidence would be if it were possible to remove the competing event types, which is rarely the case in practice (e.g. removing all other causes of death to study cancer incidence). I don't think this is what we want to estimate in practice. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5557056/ for instance.
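The distinction can be illustrated with a small simulation. The sketch below is not taken from this PR: it assumes independent latent Weibull event times with hypothetical parameters, and an arbitrary time horizon. It compares the fraction of individuals who experience event k by the horizon in the presence of the competing event (the cumulative incidence) with the fraction who would experience it if the competing event could be removed (the cause-specific CDF).

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

# Hypothetical latent times for two independent event types
# (the shapes and scales are illustrative assumptions).
t1 = 10.0 * rng.weibull(0.8, size=n_samples)
t2 = 15.0 * rng.weibull(1.5, size=n_samples)

# In practice only the first event is observed for each individual.
observed_time = np.minimum(t1, t2)
observed_event = np.where(t1 <= t2, 1, 2)

horizon = 50.0  # arbitrary time horizon

# Cumulative incidence: event k happened first, and before the horizon.
cif_1 = np.mean((observed_time <= horizon) & (observed_event == 1))
cif_2 = np.mean((observed_time <= horizon) & (observed_event == 2))

# Cause-specific CDF: event k before the horizon in a counterfactual
# world where the other event type cannot occur.
cdf_1 = np.mean(t1 <= horizon)
cdf_2 = np.mean(t2 <= horizon)

print(f"CIF: event 1 = {cif_1:.3f}, event 2 = {cif_2:.3f}, sum = {cif_1 + cif_2:.3f}")
print(f"CDF: event 1 = {cdf_1:.3f}, event 2 = {cdf_2:.3f}, sum = {cdf_1 + cdf_2:.3f}")
```

The cumulative incidences sum to at most 1.0 because each individual experiences only one first event, while the cause-specific CDFs can each get close to 1.0 on their own.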