add notebook on the cause of miscalibration #7
Conversation
Distribution shift is also a frequent cause of miscalibration
…On Aug 14, 2024, Guillaume Lemaitre wrote:
Notebook on the cause of miscalibration:
- Show examples in a 2D feature space:
  - Mis-specified model: probabilistic XOR with non-zero probability for each class everywhere (see the sketch right after this list).
- Causes of miscalibration:
  - underfitting / mis-specified model (the model class is not expressive enough, the hyperparameters are too constraining, extra feature engineering is required for a given choice of model class, the regularization is too strong, the optimizer was stopped too early / failed to converge)
  - overfitting: not enough regularization (or, conversely, too much of it)
  - use of class weights / undersampling in imbalanced classification problems.
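
For illustration, a minimal sketch of the XOR case (toy data, arbitrary probabilities; not the notebook's actual code): a plain linear logit cannot express the interaction, so its predicted probabilities stay far from the true `P(y=1|x)`, while engineering the interaction term or using a more expressive model class brings them much closer.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50_000, 2))
# Probabilistic XOR: the positive class has probability 0.9 when the two
# features share the same sign and 0.1 otherwise, so both classes keep a
# non-zero probability everywhere in the 2D feature space.
p_true = np.where(np.sign(X[:, 0]) == np.sign(X[:, 1]), 0.9, 0.1)
y = rng.binomial(1, p_true)

models = {
    # mis-specified: a linear logit cannot express the XOR interaction
    "linear logistic": LogisticRegression(),
    # same model class, but with the x1 * x2 interaction engineered in
    "logistic + interactions": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        LogisticRegression(max_iter=1_000),
    ),
    # a more expressive model class
    "gradient boosting": HistGradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    proba = model.fit(X, y).predict_proba(X)[:, 1]
    print(f"{name:<24} mean |p_hat - p_true| = {np.abs(proba - p_true).mean():.3f}")
```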
|
Thanks @GaelVaroquaux |
@GaelVaroquaux One tiny question to get clearer insights: when it comes to distribution shift, can we think of it like the resampling effect? As a concrete example, during a medical trial we learn on a controlled distribution where the proportion of the target class is fixed in some way, but at predict time the class proportion is really different, meaning that we are no longer calibrated. Is it this effect, or can there also be an additional shift in the data `X` itself, i.e. a feature distribution shift? In that case, I would expect that the learnt model is not valid anymore. |
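For concreteness, a minimal sketch of the resampling / class-proportion effect described in this question (synthetic data, illustrative numbers only): the model is fit on a sample whose class proportion is forced to 50/50, then applied to a population where the positive class is rare, so its predicted probabilities land far above the true prevalence.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Population with roughly 10% positives; the second half plays the role of
# the data seen at predict time.
X, y = make_classification(
    n_samples=100_000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_deploy, y_deploy = X[50_000:], y[50_000:]

# "Controlled" training set: the class proportion is forced to 50/50 by
# undersampling the majority class, as in a designed trial.
pos = np.flatnonzero(y[:50_000] == 1)
neg = np.flatnonzero(y[:50_000] == 0)[: len(pos)]
balanced = np.concatenate([pos, neg])
model = LogisticRegression().fit(X[balanced], y[balanced])

proba = model.predict_proba(X_deploy)[:, 1]
print(f"prevalence at predict time: {y_deploy.mean():.3f}")
print(f"mean predicted probability: {proba.mean():.3f}")  # far above the prevalence
```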
Actual covariate shift.
Taking the stance that the model is not valid anymore is fine from a theoretical point of view, but in practice you often don't have enough data to learn a new model. With Alex Perez-Lebel, we found that recalibrating often gives you a significant fraction of the boost you could get with a new model (under review).
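A minimal sketch of that recalibrate-rather-than-retrain idea (a made-up 1D covariate shift with a mis-specified linear model; not the experiments referred to above): the original model is kept frozen, and only a monotone mapping of its scores is re-fit, here with isotonic regression, on a small labelled sample drawn after the shift.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)

def sample(n, loc):
    """Covariate shift only: P(y=1|x) = sigmoid(x**2 - 1) everywhere,
    and only the distribution of the feature x moves with `loc`."""
    x = rng.normal(loc=loc, scale=0.5, size=(n, 1))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(x[:, 0] ** 2 - 1.0))))
    return x, y

X_src, y_src = sample(20_000, loc=0.5)   # plenty of data before the shift
X_small, y_small = sample(500, loc=2.0)  # small labelled sample after the shift
X_test, y_test = sample(20_000, loc=2.0)

# The linear model is mis-specified, so the covariate shift leaves its
# probabilities systematically off on the new region of the feature space.
model = LogisticRegression().fit(X_src, y_src)

# Recalibration: keep the model frozen and re-fit only a monotone map from
# its scores to outcomes on the 500 shifted samples.
recal = IsotonicRegression(out_of_bounds="clip").fit(
    model.predict_proba(X_small)[:, 1], y_small
)

raw = model.predict_proba(X_test)[:, 1]
print(f"Brier on shifted data, frozen model:  {brier_score_loss(y_test, raw):.3f}")
print(f"Brier on shifted data, recalibrated:  {brier_score_loss(y_test, recal.predict(raw)):.3f}")
```

With only a few hundred points, a parametric sigmoid (Platt) recalibration would be a natural alternative to isotonic regression here, since it has fewer degrees of freedom to overfit.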
|
Thanks, this is clear. |
Merging for the moment. We will make some PRs to improve the notebooks. |