

add notebook on the cause of miscalibration #7

Merged
merged 30 commits into probabl-ai:main on Aug 22, 2024

Conversation

glemaitre
Member

@glemaitre glemaitre commented Aug 14, 2024

Notebook on the cause of miscalibration:

  • Show examples in a 2D feature space:
    • Mis-specified model: probabilistic XOR with a non-zero probability for each class everywhere.
  • Causes of miscalibration:
    • underfitting / mis-specified model (model class not expressive enough, hyperparameters too constraining, extra feature engineering required for the chosen model class, regularization too strong, optimizer stopped too early / failed to converge)
    • overfitting: not enough regularization
    • use of class weights / undersampling in imbalanced classification problems
    • covariate distribution shift

@GaelVaroquaux

GaelVaroquaux commented Aug 14, 2024 via email

@glemaitre
Member Author

Thanks @GaelVaroquaux

@glemaitre
Member Author

Distribution shift is also a frequent cause of miscalibration

@GaelVaroquaux One tiny question to get clearer insights: when it comes to distribution shift, can we think of it similarly to the resampling effect? As a concrete example, during a medical trial we learn on a controlled distribution where the class proportions of the target are fixed by design, but at predict time the class proportions are quite different, so the model is no longer calibrated.

Is it only this effect, or can there also be a shift in the features X themselves (a feature distribution shift)? In that case, I would expect the learned model to no longer be valid.
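The first effect described above (prior-probability shift from a rebalanced training sample) can be sketched as follows; the data-generating function and proportions are hypothetical, chosen only to mimic the medical-trial scenario:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

def draw(n, pos_rate):
    # Hypothetical 1D population: positives centered at +1, negatives at -1.
    y = rng.binomial(1, pos_rate, size=n)
    X = rng.normal(loc=2 * y - 1, scale=1.0).reshape(-1, 1)
    return X, y

# Training sample rebalanced to 50/50 (as in a controlled trial);
# deployment data has the true 10% positive rate.
X_train, y_train = draw(20000, 0.5)
X_test, y_test = draw(20000, 0.1)

clf = LogisticRegression().fit(X_train, y_train)

# The average predicted positive probability sits far above the true
# positive rate: well ranked, but miscalibrated after the prior shift.
print(clf.predict_proba(X_test)[:, 1].mean(), y_test.mean())
```

A feature (covariate) shift would instead change the distribution of X itself, which can invalidate the fitted decision function rather than just its probability scale.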

@GaelVaroquaux

GaelVaroquaux commented Aug 15, 2024 via email

@glemaitre
Member Author

Thanks, this is clear.

@glemaitre glemaitre marked this pull request as draft August 19, 2024 08:05
@glemaitre glemaitre marked this pull request as ready for review August 21, 2024 12:58
@glemaitre glemaitre merged commit 82a7993 into probabl-ai:main Aug 22, 2024
3 checks passed
@glemaitre
Member Author

Merging for the moment. We will make some PRs to improve the notebooks.


3 participants