Create connectivities matrix #46

Closed
shntnu opened this issue May 24, 2020 · 3 comments

@shntnu
Collaborator

shntnu commented May 24, 2020

Create 1571x1571 matrix of connectivities between compounds. Details tbd.

@shntnu
Collaborator Author

shntnu commented May 28, 2020

@shntnu shntnu self-assigned this May 28, 2020
@gwaybio
Member

gwaybio commented May 28, 2020

Noting pycytominer.write_gct() here in case it is at all helpful

@shntnu shntnu added the wontfix This will not be worked on label May 28, 2020
@shntnu
Collaborator Author

shntnu commented May 28, 2020

We decided to drop the View Connectivities link in clue.io/morphology given the overhead required for creating that file. We will instead make the consensus/2016_04_01_a549_48hr_batch1/2016_04_01_a549_48hr_batch1_consensus_modz.csv.gz available as a GCT file that can be directly loaded in morpheus.
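
For reference, here is a minimal sketch of the GCT v1.3 layout that such a file would follow. It uses only the standard library and is not the `pycytominer.write_gct()` implementation mentioned above. The function name `write_minimal_gct`, the feature names, and the sample values are hypothetical; `pert_iname` and `moa` are the annotation fields used in the Morpheus query string quoted in this thread.

```python
import csv
import io


def write_minimal_gct(row_ids, col_ids, matrix, col_meta, out):
    """Write a matrix in GCT v1.3 layout (features as rows, samples as columns).

    col_meta maps a column-metadata field name to one value per column.
    This sketch writes no row metadata, so that count is always 0.
    """
    if len(matrix) != len(row_ids) or any(len(r) != len(col_ids) for r in matrix):
        raise ValueError("matrix shape must match len(row_ids) x len(col_ids)")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["#1.3"])  # format version line
    # rows, columns, row-metadata field count, column-metadata field count
    writer.writerow([len(row_ids), len(col_ids), 0, len(col_meta)])
    writer.writerow(["id"] + list(col_ids))  # header: column ids
    for field, values in col_meta.items():
        writer.writerow([field] + list(values))  # one line per metadata field
    for row_id, values in zip(row_ids, matrix):
        writer.writerow([row_id] + list(values))  # one line per feature


# Hypothetical toy data, only to show the layout.
buf = io.StringIO()
write_minimal_gct(
    row_ids=["feature_a", "feature_b"],
    col_ids=["sample_1", "sample_2", "sample_3"],
    matrix=[[0.1, -0.4, 1.2], [2.0, 0.0, -0.7]],
    col_meta={
        "pert_iname": ["cpd_x", "cpd_y", "cpd_z"],
        "moa": ["m1", "m2", "m1"],
    },
    out=buf,
)
gct_text = buf.getvalue()
```

In practice the real consensus CSV would be read first and its metadata columns split from its feature columns; that step is omitted here.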


From: Shantanu Singh
Date: Sat, May 23, 2020 at 7:03 PM
To: Jacob Asiedu
Cc: Ted Natoli, IPLINCS, Gregory Way

I realized we also need to update the link for the "View Connectivities" button on https://clue.io/morphology

Currently, it points to s3://data.clue.io/cell-painting/introspect_aggregate_maxq_n1571x1571.gct

  1. What would you need from us to update that? Just the URL alone?
  2. Looks like Morpheus now accepts tab-delimited text files https://clue.io/morpheus. Does that mean we can generate a TSV for the connectivities?

-Shantanu

query string, for our future reference:

{
   "dataset": "//s3.amazonaws.com/data.clue.io/cell-painting/introspect_aggregate_maxq_n1571x1571.gct",
   "columns": [
      {
         "field": "name",
         "display": [
            "text"
         ]
      }
   ],
   "rows": [
      {
         "field": "pert_iname",
         "display": [
            "text"
         ]
      },
      {
         "field": "moa",
         "display": [
            "text"
         ]
      }
   ],
   "rowSortBy": [
      {
         "field": "moa",
         "order": 0,
         "type": "annotation"
      },
      {
         "field": "pert_iname",
         "order": 0,
         "type": "annotation"
      }
   ],
   "columnSortBy": [
      {
         "field": "moa",
         "order": 0,
         "type": "annotation"
      },
      {
         "field": "name",
         "order": 0,
         "type": "annotation"
      }
   ]
}
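
A sketch of how a configuration like this could be turned into a link, assuming the viewer reads the URL-encoded JSON from a query parameter. The parameter name `json` is an assumption and is not confirmed anywhere in this thread; the helper name `morpheus_url` is hypothetical, and the config below is abbreviated.

```python
import json
from urllib.parse import quote

MORPHEUS_BASE = "https://clue.io/morpheus"  # URL mentioned in this thread
QUERY_PARAM = "json"  # ASSUMPTION: parameter name not confirmed in this thread


def morpheus_url(config, base=MORPHEUS_BASE, param=QUERY_PARAM):
    """Serialize the config compactly and percent-encode it into a URL."""
    payload = json.dumps(config, separators=(",", ":"))
    return f"{base}?{param}={quote(payload, safe='')}"


# Abbreviated version of the configuration above.
config = {
    "dataset": "//s3.amazonaws.com/data.clue.io/cell-painting/"
    "introspect_aggregate_maxq_n1571x1571.gct",
    "rows": [
        {"field": "pert_iname", "display": ["text"]},
        {"field": "moa", "display": ["text"]},
    ],
    "rowSortBy": [{"field": "moa", "order": 0, "type": "annotation"}],
}
url = morpheus_url(config)
```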

From: Jacob Asiedu
Date: Sat, May 23, 2020 at 7:11 PM
To: Shantanu Singh
Cc: Ted Natoli, IPLINCS, Gregory Way

We would prefer a gct file. So I suggest we generate a new file and mark it as latest on the page and deprecate the old one. We could make the old one still available as reference.
See https://clue.io/proteomics for an example of what I mean


From: Shantanu Singh
Date: Thu, May 28, 2020 at 8:59 AM
To: Jacob Asiedu
Cc: Ted Natoli, IPLINCS, Gregory Way

Jacob – Will do. Going forward, we will version each release of the data (corresponding to any future updates we make to the data processing) via git tags + GitHub releases. 
Ted – I was trying to look up code that we used to generate introspect_aggregate_maxq_n1571x1571.gct
I think it is this: https://github.com/broadinstitute/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/blob/090787c46b44c6fbb7960915b4ab77b9c31c2344/analysis_log.sh#L227-L239

Does that sound right to you? I have forgotten the cmapM API :)


From: Shantanu Singh
Date: Thu, May 28, 2020 at 11:06 AM
To: Jacob Asiedu
Cc: Ted Natoli, IPLINCS, Gregory Way

Thinking about this again: 
Could we deprecate the "View Connectivities" link and instead provide a link to load the Level 5 GCT file for the whole experiment? The user can always compute the similarities in Morpheus (and aggregate) as needed.
This will save us a TON of effort because we'd need to run several steps of this code each time we update our data processing pipeline, and the only way to do that is set up things on the Broad cluster to run cmapM, something we are not intimately familiar with.
Let me know if that works for you.
Shantanu
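
To make "compute the similarities" concrete, here is a minimal sketch of a pairwise similarity matrix over profiles using plain Pearson correlation. This is not the cmapM introspect / maxq aggregation code linked earlier in the thread, and it does not produce percentile scores; the compound ids and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return float("nan")  # undefined for a constant profile
    return sum(a * b for a, b in zip(dx, dy)) / denom


def similarity_matrix(profiles):
    """Return a nested dict: sim[a][b] is the correlation of profiles a and b."""
    ids = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in ids} for a in ids}


# Hypothetical toy profiles (one feature vector per compound).
profiles = {
    "cpd_x": [1.0, 2.0, 3.0],
    "cpd_y": [2.0, 4.0, 6.0],
    "cpd_z": [3.0, 2.0, 1.0],
}
sim = similarity_matrix(profiles)
```

For the full set of 1571 compounds the same idea yields a 1571x1571 matrix, though a vectorized library would be a better fit than nested loops at that size.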


From: Ted Natoli
Date: Thu, May 28, 2020 at 2:14 PM
To: Shantanu Singh
Cc: Jacob Asiedu, IPLINCS, Gregory Way

Hi Shantanu,
Yes that code for generating the maxq aggregated introspect matrix looks right to me. I also think it would be fine to deprecate the View Connectivities link and instead provide a link to load the entire level 5 matrix. The only minor caveat is that morpheus does not have the ability to compute percentile scores (aka tau values), so it will not be possible to reproduce the values that are currently in introspect_aggregate_maxq_n1571x1571.gct directly within morpheus. But this may be a worthwhile tradeoff to avoid having to recompute and aggregate connectivities whenever you reprocess the data.
Best,
Ted


From: Shantanu Singh
Date: Sat, May 30, 2020 at 1:09 AM
To: Ted Natoli
Cc: Jacob Asiedu, IPLINCS, Gregory Way

Hi Ted and Jacob

Thanks for accommodating that.
We are figuring out some other details related to the consensus profiles that might take a while to sort out. For now, is it possible to go ahead and deprecate the connectivities link, and only have the Download data option available (mockup below)? In the next version, we will make more data available that can be easily explored and you can bring back the Explore section.
If that works for you, then I think you have everything you need (i.e. the Manifest file) to make the data available via clue.io/morphology. Please LMK if that's not the case.
Best,
Shantanu


From: Ted Natoli
Date: Sat, May 30, 2020 at 8:41 AM
To: Shantanu Singh
Cc: Jacob Asiedu, IPLINCS, Gregory Way

Hi Shantanu,
That sounds fine to me. Jacob what do you think?
Best,
Ted


From: Jacob Asiedu
Date: Sat, May 30, 2020 at 8:48 AM
To: Ted Natoli
Cc: Shantanu Singh, IPLINCS, Gregory Way

Sounds good to me. I will go ahead and implement the suggestions.


From: Jacob Asiedu
Date: Tue, Jun 2, 2020 at 12:32 PM
To: Shantanu Singh
Cc: IPLINCS, Gregory Way, Ted Natoli

Hello Shantanu,
The downloads should be all set now. In our next release, I will deprecate the connectivity link. Please take a look and let me know what you think.
Thanks


From: Shantanu Singh
Date: Tue, Jun 2, 2020 at 1:25 PM
To: Jacob Asiedu
Cc: IPLINCS, Gregory Way, Ted Natoli

Hi Jacob,
Thanks for getting this done so quickly! This looks great for now. Once Greg and I have chatted about this, we will make some suggestions which you could consider for your next release.
Best,
Shantanu
