ENH Improve wording in stratification notebook #760
Otherwise LGTM.
#
# In conclusion, it is a good practice to use stratification within the
# cross-validation framework when dealing with a classification problem,
# especially for datasets with imbalanced classes or when the class distribution
I am not really sure about the conclusion. I have the following two aspects in mind:
- if the target labels are ordered or grouped, stratification overcomes the issue even if you forget to shuffle, as can happen in the k-fold case;
- if the sample size is limited or small (and the data are shuffled), a stratified fold ensures more similar train/test class distributions than uniform sampling does.
Overcoming both of these issues makes the evaluation closer to reality.
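The first point can be illustrated with a minimal sketch. The toy target below (3 classes, 12 samples each, sorted by class) is a hypothetical example and not data from the notebook; it only assumes scikit-learn's `KFold` and `StratifiedKFold` with their default `shuffle=False`.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy target: 3 classes, 12 samples each, sorted by class.
y = np.repeat([0, 1, 2], 12)
X = np.zeros((len(y), 1))  # placeholder features; only the split matters here


def fold_class_counts(cv):
    """Return the per-class sample counts of each test fold."""
    return [
        np.bincount(y[test_idx], minlength=3).tolist()
        for _, test_idx in cv.split(X, y)
    ]


# Without shuffling, each KFold test fold contains a single class.
kfold_counts = fold_class_counts(KFold(n_splits=3))
# StratifiedKFold keeps the class proportions in every test fold.
stratified_counts = fold_class_counts(StratifiedKFold(n_splits=3))

print("KFold:          ", kfold_counts)
print("StratifiedKFold:", stratified_counts)
```

With ordered labels and no shuffling, a model evaluated with `KFold` is always tested on a class it has never seen during training, whereas the stratified splitter avoids this even if shuffling is forgotten.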
I am not entirely convinced by the rewording I made, but I think I addressed your points in 42c9775
Co-authored-by: Guillaume Lemaitre <[email protected]>
LGTM.
Co-authored-by: ArturoAmorQ <[email protected]> Co-authored-by: Guillaume Lemaitre <[email protected]> ca7d1d7
During a training session at Inria Academy, we noticed that this notebook never really justifies why stratification is important. This PR adds a couple of paragraphs to better motivate why a simple `KFold` with shuffling is not good enough practice. It also takes the opportunity to put verbs in the present tense and improve the general wording.
NB. I think this PR is safe to merge as it does not change the overall experience of the MOOC.
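As a rough illustration of why a shuffled `KFold` may still fall short on small samples, here is a minimal sketch. The imbalanced toy target (40 majority and 10 minority samples), the number of seeds and the helper `minority_spread` are all hypothetical choices made for this example, not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical small, imbalanced target: 40 majority and 10 minority samples.
y = np.array([0] * 40 + [1] * 10)
X = np.zeros((len(y), 1))  # placeholder features; only the split matters here


def minority_spread(cv):
    """Max minus min number of minority samples across the test folds."""
    counts = [int(np.sum(y[test_idx] == 1)) for _, test_idx in cv.split(X, y)]
    return max(counts) - min(counts)


seeds = range(20)
kfold_spreads = [
    minority_spread(KFold(n_splits=5, shuffle=True, random_state=seed))
    for seed in seeds
]
stratified_spreads = [
    minority_spread(StratifiedKFold(n_splits=5, shuffle=True, random_state=seed))
    for seed in seeds
]

print("KFold (shuffled), spread per seed:          ", kfold_spreads)
print("StratifiedKFold (shuffled), spread per seed:", stratified_spreads)
```

A spread of 0 means every test fold holds the same number of minority samples. The stratified splitter achieves this for every seed here, while the shuffled `KFold` leaves the minority count of each fold to chance.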