TrojansQCB -- Vamsi Chris Jared #12
base: trojansqcb
All comments are marked in red in the notebook, but here is the full list:
1.) Remove all commented-out code from the notebook to improve readability. For example, where a line was commented out and replaced by the line after it, delete the commented-out line.
2.) Changed the outlier removal to drop all data with Audit Errors, since most of these errors are relevant when looking at expression. You may not need to remove every one of them, but y'all kept way too much of the flagged data. That put too many labels on many of your graphs, which hurts readability, and it probably also masked some real trends in the data behind the outliers.
3a.) Make sure every graph has labelled axes; many were missing labels.
3b.) Consider using the human-readable position formatter on your expression plots so the x-axis is easier to read.
4.) If your legend has a lot of labels, don't let it grow into a huge box that runs off the figure. Instead, pass the ncol parameter to plt.legend() to spread the labels over several columns so they stay on the page. We made this change in your Histone and TF ChIP-seq Expression vs. Position graphs.
5.) There is no point in plotting the TF ChIP-seq and Histone expression separately just to color each line by sample accession. It is more useful to combine them into one plot and color by assay (Histone vs. TF ChIP-seq) to see which assay shows different patterns.
6.) Your Experiment Targets/Cell Type PCA plot has far too many experiments to be readable. I would suggest filtering the data further to remove some of the outliers, and possibly grouping experiments on similar targets (e.g., histones or TFs) into a single category to simplify it.
7.) It would be more useful to make one PCA plot without StandardScaler and one with it, to show how preprocessing changes the clustering. The current plots mostly show the difference between including and excluding the chr10 centromere region. That changes the clustering a little, but it doesn't really demonstrate the point of preprocessing with StandardScaler.
8.) Simplified the hwutil function by removing an extra variable name.
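For item 2, here is a minimal sketch of filtering on audit errors. It assumes ENCODE-style metadata with an "Audit ERROR" column holding comma-separated audit categories, or empty when nothing is flagged. The helper name drop_audit_errors is hypothetical, and the toy accessions are made up. Passing categories lets you drop only selected error types instead of all of them.

```python
import pandas as pd


def drop_audit_errors(metadata, categories=None, column="Audit ERROR"):
    """Hypothetical helper: drop rows flagged in an ENCODE-style audit column.

    categories=None drops every row with any audit error; otherwise only rows
    whose comma-separated audit list contains one of `categories` are dropped.
    """
    errors = metadata[column].fillna("").astype(str)
    if categories is None:
        flagged = errors.str.strip() != ""
    else:
        wanted = set(categories)
        flagged = errors.map(
            lambda s: any(e.strip() in wanted for e in s.split(","))
        ).astype(bool)
    return metadata.loc[~flagged].copy()


# Toy metadata; accessions and audit strings are made up for illustration.
meta = pd.DataFrame({
    "File accession": ["F1", "F2", "F3"],
    "Audit ERROR": [None, "extremely low read depth", "missing control alignments"],
})
strict = drop_audit_errors(meta)  # drop everything with any audit error
lenient = drop_audit_errors(meta, categories=["extremely low read depth"])
print(strict["File accession"].tolist())   # ['F1']
print(lenient["File accession"].tolist())  # ['F1', 'F3']
```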
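For items 3a and 3b, here is a sketch of a labelled expression plot with a readable position axis. The course's own human-readable position formatter isn't reproduced here. human_readable_bp is a hypothetical stand-in built on matplotlib's FuncFormatter, and the signal is random toy data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter


def human_readable_bp(x, pos=None):
    """Hypothetical tick formatter: render a bp coordinate as kb/Mb/Gb."""
    for factor, suffix in ((1e9, "Gb"), (1e6, "Mb"), (1e3, "kb")):
        if abs(x) >= factor:
            return f"{x / factor:g} {suffix}"
    return f"{x:g} bp"


# Toy signal along 50 Mb; stands in for one of the expression tracks.
positions = np.arange(0, 50_000_000, 100_000)
signal = np.random.default_rng(0).random(positions.size)

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(positions, signal)
ax.xaxis.set_major_formatter(FuncFormatter(human_readable_bp))
ax.set_xlabel("Genomic position")  # 3a: label both axes
ax.set_ylabel("Signal")
ax.set_title("Signal vs. position")
fig.tight_layout()
```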
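For item 4, here is a sketch of a multi-column legend using toy lines. ax.legend takes the same ncol argument as plt.legend. The legend is anchored below the axes with bbox_to_anchor, and bbox_inches="tight" makes the saved figure grow to include it. The layout values (4 columns, the -0.15 offset) are just one reasonable choice.

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 50)
fig, ax = plt.subplots(figsize=(8, 4))
for i in range(24):  # 24 toy tracks -> 24 legend labels
    ax.plot(x, (i + 1) * x, label=f"sample {i + 1}")
ax.set_xlabel("Position (bp)")
ax.set_ylabel("Signal")
# Spread the labels over 4 columns and place the legend below the axes.
leg = ax.legend(ncol=4, fontsize="small", loc="upper center",
                bbox_to_anchor=(0.5, -0.15))
# bbox_inches="tight" enlarges the saved image to include the outside legend.
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
```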
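For item 5, here is a sketch of one combined plot where each sample still gets its own line but the color encodes only the assay. It assumes a long-format table with position, signal, sample and assay columns. Those column names and the toy data are assumptions, not the notebook's actual variables. Proxy Line2D handles give one legend entry per assay.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.lines import Line2D

# Toy long-format table: one row per (sample, position); column names assumed.
rng = np.random.default_rng(1)
positions = np.arange(0, 1_000_000, 10_000)
frames = []
for assay, n_samples in (("Histone ChIP-seq", 3), ("TF ChIP-seq", 4)):
    for k in range(n_samples):
        frames.append(pd.DataFrame({
            "position": positions,
            "signal": rng.random(positions.size),
            "sample": f"{assay.split()[0]}_{k}",
            "assay": assay,
        }))
tracks = pd.concat(frames, ignore_index=True)

colors = {"Histone ChIP-seq": "tab:blue", "TF ChIP-seq": "tab:orange"}
fig, ax = plt.subplots(figsize=(8, 3))
for (assay, sample), track in tracks.groupby(["assay", "sample"]):
    # one line per sample, colored by assay rather than by sample accession
    ax.plot(track["position"], track["signal"], color=colors[assay],
            alpha=0.6, lw=0.8)
# proxy handles: one legend entry per assay
handles = [Line2D([], [], color=c, label=a) for a, c in colors.items()]
leg = ax.legend(handles=handles, title="Assay")
ax.set_xlabel("Position (bp)")
ax.set_ylabel("Signal")
```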
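For item 6, here is a sketch of collapsing experiment targets into coarser groups before coloring the PCA. The rule that histone targets start with H2A, H2B, H3 or H4 is a heuristic assumption, not an ENCODE definition. Everything else lands in a catch-all "TF/other" group, and the toy targets and cell types are illustrative only.

```python
import re

import pandas as pd

# Heuristic (assumed): histone marks/variants start with H2A, H2B, H3 or H4.
HISTONE_RE = re.compile(r"^H(2A|2B|3|4)")


def target_group(target):
    """Hypothetical helper: collapse a ChIP-seq target into a coarse class."""
    return "Histone" if HISTONE_RE.match(target) else "TF/other"


# Toy experiment table; real targets/cell types would come from the metadata.
experiments = pd.DataFrame({
    "target": ["H3K27ac", "H3K4me3", "CTCF", "HNF4A", "H4K20me1", "POLR2A"],
    "cell_type": ["K562", "K562", "HepG2", "HepG2", "GM12878", "K562"],
})
experiments["target_group"] = experiments["target"].map(target_group)
# Color the PCA by this coarser label instead of one entry per experiment.
experiments["label"] = experiments["target_group"] + " / " + experiments["cell_type"]
print(experiments["label"].nunique())  # 4 labels instead of 6 experiments
```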
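For item 7, here is a sketch of the suggested comparison: PCA on the raw matrix next to StandardScaler followed by PCA. The toy matrix stands in for an experiments-by-bins matrix and does not reproduce the notebook's chr10 data. One bin is deliberately put on a much larger scale. Without scaling, that bin dominates PC1; after scaling, every bin contributes on equal footing.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy matrix: rows = experiments, columns = genomic bins.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
X[:, 0] *= 100  # one bin on a much larger scale than the rest

raw = PCA(n_components=2).fit(X)
scaled = make_pipeline(StandardScaler(), PCA(n_components=2)).fit(X)

fig, axes = plt.subplots(1, 2, figsize=(9, 4))
for ax, model, title in ((axes[0], raw, "PCA, no scaling"),
                         (axes[1], scaled, "StandardScaler + PCA")):
    coords = model.transform(X)  # the pipeline applies scaling, then PCA
    ax.scatter(coords[:, 0], coords[:, 1], s=12)
    ax.set_title(title)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
fig.tight_layout()

# Fraction of variance on PC1: near 1 without scaling, much lower with it.
print(raw.explained_variance_ratio_[0])
print(scaled.named_steps["pca"].explained_variance_ratio_[0])
```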
@gfudenberg