Describe the problem
Though self.adata exists, there is no obvious getter method. Also, the splits don't provide an anndata option.
Describe the solution you'd like
Getter method(s); also implement the splits for anndata.
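A minimal sketch of what such a getter could look like. The class name AnnDataLoaderSketch and the method name get_adata are hypothetical and not part of the existing TDC API; only the self.adata attribute comes from the code quoted in the Slack thread.

```python
class AnnDataLoaderSketch:
    """Hypothetical stand-in for the loader; not the actual TDC class."""

    def __init__(self, adata):
        # Mirrors the `self.adata = self.df` assignment quoted from
        # anndata_dataset.py in the thread.
        self.adata = adata

    def get_adata(self, copy=False):
        """Return the stored AnnData object (hypothetical getter name).

        With copy=True the caller gets an independent copy, so edits
        do not leak back into the loader.
        """
        if self.adata is None:
            raise ValueError("no AnnData object has been loaded")
        return self.adata.copy() if copy else self.adata
```

The copy flag is a design choice for illustration: returning the stored object avoids duplicating a large dataset, while copy=True is safer when the caller plans to modify it.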
Additional context
from slack
Oh, is there a function to load that already? I checked, and when we download the raw file it is in the adata format
11:12 AM
yes
11:12
https://github.com/mims-harvard/TDC/blob/main/tdc/multi_pred/anndata_dataset.py#L10
anndata_dataset.py
self.adata = self.df # this is in AnnData format
11:12
self.adata will contain the anndata object
11:12
apologies, i should expose that better via a getter function or something
11:14
The existing loader for perturboutcome inherits from the anndata loader
11:14
https://github.com/mims-harvard/TDC/blob/main/tdc/multi_pred/single_cell.py#L11
single_cell.py
class CellXGeneTemplate(DataLoader):
11:14
https://github.com/mims-harvard/TDC/blob/main/tdc/multi_pred/perturboutcome.py#L16
perturboutcome.py
class PerturbOutcome(CellXGeneTemplate):
11:15
so self.adata will be anndata 🙂
11:17
though i suppose for the benchmark, the splits are not implemented for anndata
11:18 AM
Ah I see, I can take the data split from the split function and then return a dictionary of train/val/test adata
11:18
Do you think it makes sense to set this as a default for the benchmark? I believe most method developers are using adata for model training
11:18 AM
I see
11:19
Ok. Well, let’s make a flag for use_anndata and set it to True by default?
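A rough sketch of how the flag could work, assuming the existing split function can hand back the observation names belonging to each split. The function name subset_by_split and the split_obs_names argument are hypothetical; the only AnnData features relied on are indexing by a list of observation names, .copy(), and .to_df(). The pandas branch here is a placeholder and does not reproduce what the current TDC pandas code returns.

```python
def subset_by_split(adata, split_obs_names, use_anndata=True):
    """Return one subset per split, keyed by split name.

    adata: an AnnData-like object that supports adata[list_of_obs_names],
        .copy() and .to_df().
    split_obs_names: dict mapping a split name (e.g. "train", "val",
        "test") to the observation names in that split. Assumed to be
        derived from the existing split function.
    use_anndata: if True (the default proposed in the thread), return
        AnnData subsets; otherwise return the expression matrix of each
        subset as a DataFrame via .to_df().
    """
    subsets = {}
    for split_name, obs_names in split_obs_names.items():
        # Indexing an AnnData object by observation names yields a view.
        view = adata[list(obs_names)]
        # .copy() detaches the view from the parent object.
        subsets[split_name] = view.copy() if use_anndata else view.to_df()
    return subsets
```

Keeping the non-anndata branch means the pandas path stays available while adata becomes the default.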
11:19 AM
Sounds good
11:19
I will do that
11:19 AM
I’d rather not get rid of the pandas code
11:19
cool!
11:20 AM
Sounds good
11:20 AM
Sorry for these discrepancies, the lab has been moving away from anndata, so I forget we still currently have some dependencies on it
11:24 AM
I see, no worries, but I do want to point out that for most single-cell analysis/ML models people still use adata, because there is a lot of cell observation metadata (e.g. perturbation) and gene metadata that needs to be stored. For ease of use, I feel like we can still provide an adata flag if people need it!
11:25 AM
absolutely. i’ll add an action item to better expose the getters for anndata