SigPrimedNet: a Signaling-informed Neural Network for scRNA-seq Annotation of Known and Unknown Cell Types
Single-cell RNA sequencing is increasing our understanding of the behavior of complex tissues or organs by providing unprecedented detail on the cell type landscape at the level of individual cells. Cell type definition and functional annotation are key steps toward understanding the molecular processes behind the underlying cellular communication machinery. However, the exponential growth of scRNA-seq data has made manual cell annotation unfeasible, due not only to the unparalleled resolution of the technology but also to the ever-increasing heterogeneity of the data. Many supervised and unsupervised methods have been proposed to annotate cells automatically. Supervised approaches to cell-type annotation outperform unsupervised methods except when new (unknown) cell types are present. Here, we introduce SigPrimedNet, an artificial neural network approach that leverages i) efficient training by means of a sparsity-inducing, signaling-circuit-informed layer, ii) feature representation learning through supervised training, and iii) unknown cell-type identification by fitting an anomaly detection method on the learned representation. We show that SigPrimedNet can efficiently annotate known cell types while keeping a low false-positive rate for unseen cells across a set of publicly available datasets. In addition, the learned representation acts as a proxy for signaling circuit activity measurements, which provide useful estimates of cell functionality.
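The abstract above names three ingredients. The NumPy sketch below illustrates two of them under explicitly assumed details: (1) a dense layer whose gene-to-circuit connections are restricted by a binary prior-knowledge mask, so each output unit plays the role of one signaling circuit, and (2) flagging unknown cells by fitting a detector on the learned representation. All names (`masked_dense`, `fit_detector`, `predict`), the `tanh` activation, and the toy mask are hypothetical. The distance-to-centroid detector with a quantile threshold is a simple stand-in chosen for illustration, not necessarily the anomaly detection method SigPrimedNet uses, and nearest-centroid assignment stands in for the trained classification head. Training is omitted.

```python
import numpy as np


def masked_dense(x, weights, bias, mask):
    """Hypothetical signaling-informed layer: a dense layer whose gene-to-circuit
    connections are restricted by a binary prior-knowledge mask.

    x: (n_cells, n_genes); weights, mask: (n_genes, n_circuits); bias: (n_circuits,).
    Entries where mask == 0 are zeroed, so each output unit (one per circuit)
    only sees the genes the prior assigns to it.
    """
    return np.tanh(x @ (weights * mask) + bias)


def _distances(z, centroids):
    # Euclidean distance from every cell to every class centroid: (n_cells, n_classes).
    return np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)


def fit_detector(z, labels, quantile=0.95):
    """Stand-in anomaly detector fitted on the learned representation z:
    one centroid per known class plus a distance threshold equal to the given
    quantile of training distances to the nearest centroid."""
    classes = np.unique(labels)
    centroids = np.stack([z[labels == c].mean(axis=0) for c in classes])
    threshold = np.quantile(_distances(z, centroids).min(axis=1), quantile)
    return classes, centroids, threshold


def predict(z, classes, centroids, threshold):
    """Assign the nearest known class, or 'unknown' when a cell lies farther
    than the threshold from every class centroid."""
    d = _distances(z, centroids)
    pred = classes[d.argmin(axis=1)].astype(object)
    pred[d.min(axis=1) > threshold] = "unknown"
    return pred


if __name__ == "__main__":
    # Toy prior (assumption): genes 0-1 feed circuit 0, genes 2-3 feed circuit 1.
    mask = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
    weights, bias = np.ones((4, 2)), np.zeros(2)

    x_train = np.array([
        [1.0, 1.0, 0.0, 0.0], [1.2, 1.0, 0.0, 0.0], [0.8, 1.0, 0.0, 0.0],  # type A
        [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.2, 1.0], [0.0, 0.0, 0.8, 1.0],  # type B
    ])
    y_train = np.array(["A", "A", "A", "B", "B", "B"])

    model = fit_detector(masked_dense(x_train, weights, bias, mask), y_train)
    x_new = np.array([[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
    print(predict(masked_dense(x_new, weights, bias, mask), *model))
```

In this toy run, the second new cell activates both circuits at once, a pattern unlike either training class, so it falls outside the threshold and is reported as unknown rather than forced into a known type.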
- The steps for running all experiments and exporting the prior biological knowledge are in the 'project_bash.sh' file.
- The figures shown in the paper and supplementary files can be reproduced with the notebooks in the 'SigPrimedNet_figures' folder.
- Experiments run for the revision with the full SigPrimedNet model are tagged with "REVIEW".
├── environment.yml        <- The environment file.
│
├── README.md              <- The top-level README for developers using this project.
│
├── project_bash.sh        <- All experiments and the steps for exporting the prior biological knowledge.
│
├── data
│   ├── external           <- Data from third party sources.
│   ├── processed          <- The final, canonical data sets for modeling.
│   └── raw                <- The original, immutable data dump.
│
├── notebooks              <- Jupyter notebooks. Naming convention is a number (for ordering),
│                             the creator's initials, and a short hyphen-delimited description,
│                             e.g. 1.0-jqp-initial-data-exploration.
│
├── references             <- Data dictionaries, manuals, and all other explanatory materials.
│
├── SigPrimedNet_figures   <- Jupyter notebooks that create all figures in the paper and supplementary file.
│
└── scripts                <- Helper scripts.
Project based on the cookiecutter data science project template. #cookiecutterdatascience
This work is supported by grants PID2020-117979RB-I00 and PID2020-117954RB-C22 from the Spanish Ministry of Science and Innovation and IMP/0019 from the Instituto de Salud Carlos III (ISCIII), co-funded with European Regional Development Funds (ERDF), and by the European Union's H2020 Programme through the Marie Curie Innovative Training Network “Machine Learning Frontiers in Precision Medicine” (MLFPM) (GA 813533). The authors also acknowledge the Junta de Andalucía for the postdoctoral contract of Carlos Loucera (PAIDI2020-DOC_00350), co-funded by the European Social Fund (FSE) 2014-2020.