TrojansQCB -- Vamsi Chris Jared #12

cjsimons5 · 2022-11-01T04:08:21Z

No description provided.

ramayyala

All comments are marked in red in the notebook, and they are also listed below:
1.) Remove all commented-out code from the notebook to improve readability.
2.) We changed the outlier removal to drop all data that has Audit Errors, since most of these errors are relevant when looking at expression. Maybe not all of them need to be removed, but y'all kept way too much of the data with errors, which gave many of your graphs too many labels and hurt readability. Moreover, the outliers probably masked some trends that are in the data.
3a.) Make sure all the axes are labelled in the graphs, as many were missing labels.
3b.) Maybe use the human-readable position formatter on your expression plots so the x-axis is easier to read.
4.) If you have a lot of labels in your legend, use the ncol parameter in the plt.legend() call to spread them over more columns, rather than letting them run off the figure. We modified this in your Histone and TF ChIP-seq Expression vs Position graphs.
5.) There is no point in plotting the TF ChIP-seq and Histone expression separately just to color each line by sample accession. It is more useful to combine them into one plot and color by whether the assay is Histone or TF ChIP-seq, to see which assay shows different patterns.
6.) Your Experiment Targets/Cell Type PCA plot has way too many experiments to be readable. I would suggest filtering the data a bit more to remove some of the outliers, and potentially grouping experiments on similar targets, like Histones or TFs, into a single group to simplify it.
7.) It would be more useful to make one PCA plot without the standard scaler and one with it, to see how preprocessing changes the clustering. The plots above really just show the difference between including and excluding the chr10 region with the centromere, which modifies the clustering a little but isn't the most useful way to demonstrate the point of preprocessing with the standard scaler.

8.) Simplified the hwutil function by removing an extra variable name.
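
For items 3a, 3b, 4 and 5, here is a minimal sketch of what a combined plot could look like. It is not the code from the notebook: the sample accessions, signal values and the `plot_signal_by_assay` helper are made up for illustration, and I am assuming matplotlib's `EngFormatter` is an acceptable stand-in for the human-readable position formatter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import EngFormatter

rng = np.random.default_rng(0)
positions = np.arange(0, 50_000_000, 1_000_000)  # hypothetical bin starts, in bp

# Hypothetical tracks: accession -> (assay, signal per bin). Not real data.
tracks = {
    f"SAMPLE{i:02d}": (
        "Histone ChIP-seq" if i % 2 == 0 else "TF ChIP-seq",
        rng.random(positions.size),
    )
    for i in range(12)
}
assay_colors = {"Histone ChIP-seq": "tab:blue", "TF ChIP-seq": "tab:orange"}


def plot_signal_by_assay(positions, tracks, assay_colors, legend_ncol=2):
    """Draw every track on one axis, colored by assay instead of by accession."""
    fig, ax = plt.subplots(figsize=(8, 4))
    seen = set()
    for accession, (assay, signal) in tracks.items():
        # Only the first line of each assay gets a legend entry.
        label = assay if assay not in seen else "_nolegend_"
        seen.add(assay)
        ax.plot(positions, signal, color=assay_colors[assay], alpha=0.6, label=label)
    # Show ticks in engineering notation with a base-pair unit (item 3b).
    ax.xaxis.set_major_formatter(EngFormatter(unit="b"))
    # Label both axes (item 3a).
    ax.set_xlabel("Genomic position")
    ax.set_ylabel("Signal")
    # Spread legend entries over several columns, below the axes (item 4).
    ax.legend(ncol=legend_ncol, loc="upper center", bbox_to_anchor=(0.5, -0.2))
    fig.tight_layout()
    return fig, ax


fig, ax = plot_signal_by_assay(positions, tracks, assay_colors)
```

If you do still want one legend entry per accession, raising `legend_ncol` is what keeps the legend from running off the figure.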
@gfudenberg
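
For item 7, a rough sketch of the comparison I have in mind, assuming scikit-learn's `PCA` and `StandardScaler`. The matrix here is random data with deliberately uneven feature scales, and `pca_embedding` is a hypothetical helper, not something from the notebook or hwutil.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical matrix: 40 samples x 200 bins, with very different feature scales.
scales = rng.uniform(0.1, 100.0, size=200)
X = rng.normal(size=(40, 200)) * scales


def pca_embedding(X, scale):
    """Return 2D PCA coordinates and explained variance ratios.

    If scale is True, each feature is standardized to zero mean and unit
    variance before PCA; otherwise PCA runs on the raw values.
    """
    if scale:
        X = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(X)
    return coords, pca.explained_variance_ratio_


raw_coords, raw_var = pca_embedding(X, scale=False)
scaled_coords, scaled_var = pca_embedding(X, scale=True)
```

Plotting `raw_coords` and `scaled_coords` side by side, with the same coloring, would show how much the clustering depends on the preprocessing rather than on which regions are included.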

vbchavali and others added 8 commits October 17, 2022 12:58

Vamsi through initial questions

63c57c4

Vamsi through initial questions

4155069

Chris Update

23c6dd4

plots by column

143f0d1

Friday Group Work

6371572

Almost Done

d680de0

MDS question update

a1a3d9e

Final Update

4d16741

vbchavali changed the base branch from main to trojansqcb November 1, 2022 22:46

vbchavali requested review from gfudenberg, vbchavali, ashleyjk, jared-khan and ramayyala November 1, 2022 22:46

ashleyjk added 2 commits November 15, 2022 15:35

partial review

ec5d320

Review done and hwutils done

3520f2f

ramayyala reviewed Nov 16, 2022
