add notebook on the cause of miscalibration #7
Conversation
Distribution shift is also a frequent cause of miscalibration
…On Aug 14, 2024, Guillaume Lemaitre wrote:
Notebook on the cause of miscalibration:
- Show examples in a 2D feature space:
  - Mis-specified model: probabilistic XOR with non-zero probability for each class everywhere (see the sketch right after this list).
- Causes of miscalibration:
  - underfitting / mis-specified model (the model class is not expressive enough, the hyperparameters are too constraining, extra feature engineering is required for a given choice of model class, the regularization is too strong, the optimizer was stopped too early / failed to converge)
  - overfitting: not enough regularization (or, conversely, too much of it)
  - use of class weights / undersampling in imbalanced classification problems.
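
For illustration, a minimal sketch of the XOR case (toy data, arbitrary probabilities; not the notebook's actual code): a plain linear logit cannot express the interaction, so its predicted probabilities stay far from the true `P(y=1|x)`, while engineering the interaction term or using a more expressive model class brings them much closer.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50_000, 2))
# Probabilistic XOR: the positive class has probability 0.9 when the two
# features share the same sign and 0.1 otherwise, so both classes keep a
# non-zero probability everywhere in the 2D feature space.
p_true = np.where(np.sign(X[:, 0]) == np.sign(X[:, 1]), 0.9, 0.1)
y = rng.binomial(1, p_true)

models = {
    # mis-specified: a linear logit cannot express the XOR interaction
    "linear logistic": LogisticRegression(),
    # same model class, but with the x1 * x2 interaction engineered in
    "logistic + interactions": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        LogisticRegression(max_iter=1_000),
    ),
    # a more expressive model class
    "gradient boosting": HistGradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    proba = model.fit(X, y).predict_proba(X)[:, 1]
    print(f"{name:<24} mean |p_hat - p_true| = {np.abs(proba - p_true).mean():.3f}")
```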
|
Thanks @GaelVaroquaux |
@GaelVaroquaux One tiny question to get clearer insights: when it comes to distribution shift, can we think of it like the resampling effect? As a concrete example, during a medical trial we learn on a controlled distribution where the proportion of the target class is fixed in some way, but at predict time the class proportion is really different, meaning that we are no longer calibrated. Is it this effect, or can there also be an additional shift in the data `X` itself, i.e. a feature distribution shift? In that case, I would expect that the learnt model is not valid anymore. |
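For concreteness, a minimal sketch of the resampling / class-proportion effect described in this question (synthetic data, illustrative numbers only): the model is fit on a sample whose class proportion is forced to 50/50, then applied to a population where the positive class is rare, so its predicted probabilities land far above the true prevalence.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Population with roughly 10% positives; the second half plays the role of
# the data seen at predict time.
X, y = make_classification(
    n_samples=100_000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_deploy, y_deploy = X[50_000:], y[50_000:]

# "Controlled" training set: the class proportion is forced to 50/50 by
# undersampling the majority class, as in a designed trial.
pos = np.flatnonzero(y[:50_000] == 1)
neg = np.flatnonzero(y[:50_000] == 0)[: len(pos)]
balanced = np.concatenate([pos, neg])
model = LogisticRegression().fit(X[balanced], y[balanced])

proba = model.predict_proba(X_deploy)[:, 1]
print(f"prevalence at predict time: {y_deploy.mean():.3f}")
print(f"mean predicted probability: {proba.mean():.3f}")  # far above the prevalence
```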
Actual covariate shift.
Taking the stance that the model is not valid anymore is fine from a theoretical point of view, but in practice you often don't have enough data to learn a new model. With Alex Perez-Lebel, we found that recalibrating often gives you a significant fraction of the boost you could get with a new model (under review).
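A minimal sketch of that recalibrate-rather-than-retrain idea (a made-up 1D covariate shift with a mis-specified linear model; not the experiments referred to above): the original model is kept frozen, and only a monotone mapping of its scores is re-fit, here with isotonic regression, on a small labelled sample drawn after the shift.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)

def sample(n, loc):
    """Covariate shift only: P(y=1|x) = sigmoid(x**2 - 1) everywhere,
    and only the distribution of the feature x moves with `loc`."""
    x = rng.normal(loc=loc, scale=0.5, size=(n, 1))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(x[:, 0] ** 2 - 1.0))))
    return x, y

X_src, y_src = sample(20_000, loc=0.5)   # plenty of data before the shift
X_small, y_small = sample(500, loc=2.0)  # small labelled sample after the shift
X_test, y_test = sample(20_000, loc=2.0)

# The linear model is mis-specified, so the covariate shift leaves its
# probabilities systematically off on the new region of the feature space.
model = LogisticRegression().fit(X_src, y_src)

# Recalibration: keep the model frozen and re-fit only a monotone map from
# its scores to outcomes on the 500 shifted samples.
recal = IsotonicRegression(out_of_bounds="clip").fit(
    model.predict_proba(X_small)[:, 1], y_small
)

raw = model.predict_proba(X_test)[:, 1]
print(f"Brier on shifted data, frozen model:  {brier_score_loss(y_test, raw):.3f}")
print(f"Brier on shifted data, recalibrated:  {brier_score_loss(y_test, recal.predict(raw)):.3f}")
```

With only a few hundred points, a parametric sigmoid (Platt) recalibration would be a natural alternative to isotonic regression here, since it has fewer degrees of freedom to overfit.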
|
Thanks, this is clear. |
Merging for the moment. We will make some PRs to improve the notebooks. |