Added histograms and boxplots for outlier variables #35

Open
wants to merge 1 commit into
base: main
from

Conversation

m-visintini
Member

@m-visintini m-visintini commented Oct 22, 2024

Please test it and let me know the outcome!
One thing to bear in mind: the graphs without outliers only render correctly if the correct ID variable is set in the Setup tab. This is a bit error-prone at the moment. Once I am done with issue #24, this should no longer be a problem, since the ID variable will be enforced globally.
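
To illustrate the ID dependency described above, here is a minimal sketch in Python. It assumes the "without outliers" view is built by dropping rows whose ID matches a list of flagged IDs; the PR does not show how the filter is implemented, so that mechanism is a guess. The function `drop_flagged_rows` and the columns `hhid` and `enum_id` are hypothetical names; only `inc_01` and the 10,000,000 value come from this thread.

```python
def drop_flagged_rows(rows, flagged_ids, id_var):
    """Return the rows whose `id_var` value is not among the flagged IDs."""
    flagged = set(flagged_ids)
    return [row for row in rows if row.get(id_var) not in flagged]


# Toy data: `hhid` and `enum_id` are hypothetical column names.
rows = [
    {"hhid": 1, "enum_id": 10, "inc_01": 1_200},
    {"hhid": 2, "enum_id": 10, "inc_01": 950},
    {"hhid": 3, "enum_id": 11, "inc_01": 10_000_000},
]
flagged_ids = [3]  # the outlier check flagged the observation with hhid == 3

# Correct ID variable: the flagged row is dropped.
correct = drop_flagged_rows(rows, flagged_ids, id_var="hhid")
# Wrong ID variable: no value matches, so nothing is dropped.
wrong = drop_flagged_rows(rows, flagged_ids, id_var="enum_id")

print(max(r["inc_01"] for r in correct))  # 1200
print(max(r["inc_01"] for r in wrong))    # 10000000
```

Under this assumption, a wrong ID variable fails silently: the filter runs without error and the plot keeps its original range.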

@m-visintini m-visintini linked an issue Oct 22, 2024 that may be closed by this pull request
Member

@mariarrt94 mariarrt94 left a comment

I really like this feature, especially the boxplots. They are super useful for identifying group outliers, and I think they add a lot of value.

Observations

  • For example, when looking at the grouped income variable, I noticed an outlier in inc_1.
  • However, the histograms with and without outliers look the same. The axis in the "without outliers" view still extends to 10,000,000, which is the value of the identified outlier.
    Screenshot 2024-10-24 100939

Suggestions

  • I am unsure whether we should remove these outliers altogether.
  • Winsorization might be a good alternative. It could be helpful to show:
    1. The histogram of inc_01.
    2. The histogram of the winsorized inc_01, to see the impact on the distribution.
  • For this, instead of using filtered_hfc, we would create a dataset with the winsorized variable, maybe at 95%. Let me know if you want to discuss this.
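
A rough sketch of what the winsorization suggestion could look like, written in Python for illustration only. It assumes "at 95%" means capping values above the 95th percentile (upper tail only, nearest-rank quantile); a two-sided version would also raise values below a lower quantile. The helper `winsorize_upper` and the toy data are hypothetical; only `inc_01` and the 10,000,000 value come from this thread.

```python
import math


def winsorize_upper(values, upper=0.95):
    """Cap values above the `upper` quantile at that quantile.

    Uses the nearest-rank definition of the quantile. Returns the capped
    values in their original order, together with the cap that was applied.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(upper * len(ordered)))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values], cap


# Toy income data: 19 ordinary values plus one extreme outlier.
inc_01 = list(range(800, 1750, 50)) + [10_000_000]

winsorized, cap = winsorize_upper(inc_01, upper=0.95)

print(cap)              # 1700
print(len(winsorized))  # 20
print(max(winsorized))  # 1700
```

Unlike filtering, this keeps every observation and only pulls the extreme value in to the cap, so a histogram of the winsorized variable has a readable axis range without any rows being dropped.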

This is a great addition to the tool—thanks for all the work on it!

@m-visintini
Member Author

Good idea! Will fix.

Development

Successfully merging this pull request may close these issues.

Add graph to visualize Outliers checks
2 participants