Call out disconnected components in network visualizations #1
Comments
Sorry, didn't see this notification until now. Turns out I wasn't getting notifications... switched to "Watch" mode. Let me look into the number of connected components in STRING. I think a heatmap of any of the following would help distinguish protein clusters: affinity / common neighbors / random walk distance / shortest path. I've been playing around with plotly. Would be really cool if we could get the heatmap and scatterplot to sync up, such that selecting on the scatterplot would create a synced heatmap.
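To make the shortest-path option concrete, here is a minimal sketch of the matrix such a heatmap could be drawn from. It assumes an unweighted, undirected edge list; the function name `shortest_path_matrix` and the toy protein names are made up for illustration and are not from the repo. Pairs in different connected components come out as `inf`, which is what would make disconnected clusters stand out as blocks.

```python
import math
from collections import deque


def shortest_path_matrix(nodes, edges):
    """Return an all-pairs hop-count matrix; unreachable pairs stay at inf."""
    index = {node: i for i, node in enumerate(nodes)}
    neighbors = {node: set() for node in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    matrix = [[math.inf] * len(nodes) for _ in nodes]
    for source in nodes:
        row = matrix[index[source]]
        row[index[source]] = 0
        queue = deque([source])
        # Breadth-first search from each source gives hop counts.
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if row[index[nxt]] == math.inf:
                    row[index[nxt]] = row[index[node]] + 1
                    queue.append(nxt)
    return matrix


# Toy example (hypothetical names): D is disconnected from A-B-C.
proteins = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "C")]
distances = shortest_path_matrix(proteins, links)
```

This is quadratic in the number of nodes, so for something the size of STRING it would probably only be practical on a selected subset of proteins, e.g. whatever is lassoed in the scatterplot.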
Ah yea that would be very cool. If you go the dash route, this could be a helpful resource on linking scatter plot clicks or lasso/range selections to events in other figures. It would also be cool if there was a way to link groups of scatter points so that clicking on one of them does something corresponding to the whole group, like highlight all other points in the same connected component. I think Plot.ly could do that if you made sure all the points for one connected component are in the same trace, but I'm not sure if they provide the trace name on click (I think you can assign a point id though and do a reverse-lookup).
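A rough sketch of the one-trace-per-component idea, written as plain dicts in the shape of Plotly's figure JSON so it runs without any dependencies. The helper names (`traces_by_component`, `component_from_click`) and the toy coordinates are hypothetical. The click payload shape is an assumption based on how Dash's `clickData` is usually described (a `points` list whose entries carry `curveNumber`, `pointNumber` and `customdata`), so it is worth checking against the Dash docs before relying on it.

```python
from collections import defaultdict


def traces_by_component(points):
    """Group scatter points into one Plotly-style trace dict per component.

    `points` is an iterable of (gene, x, y, component_id) tuples.
    """
    grouped = defaultdict(list)
    for gene, x, y, component in points:
        grouped[component].append((gene, x, y))

    traces = []
    for component in sorted(grouped):
        members = grouped[component]
        traces.append({
            "type": "scatter",
            "mode": "markers",
            "name": f"component {component}",
            "x": [x for _, x, _ in members],
            "y": [y for _, _, y in members],
            "text": [gene for gene, _, _ in members],
            # customdata travels with each point, so a click handler can
            # recover the component without needing the trace name.
            "customdata": [component] * len(members),
        })
    return traces


def component_from_click(click_data):
    """Reverse-lookup the component id from an (assumed) Dash-style payload."""
    return click_data["points"][0]["customdata"]


# Toy embedding with made-up genes and coordinates.
embedding = [("A", 0.1, 0.2, 0), ("B", 0.3, 0.1, 0), ("C", 2.0, 2.1, 1)]
figure = {"data": traces_by_component(embedding), "layout": {}}

# Hand-written stand-in for what a click on gene C might deliver.
fake_click = {"points": [{"curveNumber": 1, "pointNumber": 0, "customdata": 1}]}
clicked_component = component_from_click(fake_click)
```

As far as I know the `figure` dict can be handed to `plotly.graph_objects.Figure` or a `dcc.Graph` as-is, and once the component id is known, highlighting the group is a matter of restyling the matching trace.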
I looked into connected components in I plotted the cumulative coverage of components ranked by size (interactive version in notebook): For the combined score, the largest component contains 98.9% of all genes. This obviously could change if we applied a score threshold of 500. Although in general, it's probably best if we use edge weights rather than binarization as much as possible.
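For anyone who wants to reproduce the coverage curve without opening the notebook, here is a minimal sketch of the calculation. It is not the notebook's code: the function names and the toy gene list are made up, and it treats the network as unweighted (any edge counts), which matches the binarized view rather than the edge-weighted one.

```python
from collections import defaultdict, deque


def component_sizes(nodes, edges):
    """Return connected-component sizes, largest first (plain BFS)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    sizes = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def cumulative_coverage(sizes):
    """Fraction of all nodes covered by the k largest components, for each k."""
    total = sum(sizes)
    covered = 0
    coverage = []
    for size in sizes:
        covered += size
        coverage.append(covered / total)
    return coverage


# Toy graph (hypothetical genes): one 4-gene component, one pair, one singleton.
genes = ["A", "B", "C", "D", "E", "F", "G"]
interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]

sizes = component_sizes(genes, interactions)
coverage = cumulative_coverage(sizes)
```

The first entry of `coverage` is the share of genes in the largest component, i.e. the number quoted above as 98.9% for the combined score.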
Yeah, I exported the component assignments in e3998c5, so we can always add this to the embedding. Maybe it'd be helpful to differentiate all genes not in the giant connected component. Actually, upon further investigation, all genes for combined_score that are not in the giant connected component have a component size of 1, i.e. are entirely disconnected. I think these genes actually drop out during the node2vec embedding stage, so they aren't in the visualization.
Wow that Explorer UI is really cool. Will check with you before proceeding with any dashboarding solution using
Yeah! Definitely. I'll look into migrating the Bokeh scatterplot to Plotly, which seems a bit more powerful, intuitive, and compatible with voila / notebooks.
I think it would be good to have that for a few combined score thresholds (Jack was simply ignoring all below 900, which I saw in a publication or two as well)
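A small sketch of what the threshold comparison could look like, assuming edges arrive as `(gene_a, gene_b, combined_score)` tuples and that "ignoring all below 900" means keeping edges with a score of at least 900. The function names and toy scores are hypothetical and are not taken from the repo.

```python
def count_components(nodes, edges):
    """Count connected components with a small union-find."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    return len({find(node) for node in nodes})


def components_by_threshold(nodes, scored_edges, thresholds):
    """Map each score threshold to the component count after filtering."""
    result = {}
    for threshold in thresholds:
        # Keep only edges at or above the threshold (assumed convention).
        kept = [(u, v) for u, v, score in scored_edges if score >= threshold]
        result[threshold] = count_components(nodes, kept)
    return result


# Toy data with made-up genes and scores.
genes = ["A", "B", "C", "D"]
scored = [("A", "B", 950), ("B", "C", 600), ("C", "D", 300)]
summary = components_by_threshold(genes, scored, [0, 500, 900])
```

On the toy data the graph goes from one component with no threshold to three at 900, which is the kind of fragmentation worth checking for on the real network before settling on a cutoff.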
I'm always happy to riff about Dash! But I don't mean to bias you too much towards it. I'd love to know what other solutions can do. I default to that simply b/c I like Plot.ly and assume, probably incorrectly, that other libs don't do useful/interesting things beyond what it can. I definitely agree that the API is more intuitive though.
I'm fairly certain a good number of these exist in STRING and I think it would be a very helpful annotation on embedding visualizations (i.e. it would be good context to know which clusters are different graphs entirely vs less related groups of proteins in the same graph).