diff --git a/README.md b/README.md
index 2cbea12..b456edd 100644
--- a/README.md
+++ b/README.md
@@ -1,22 +1,24 @@
+[![DOI](https://zenodo.org/badge/624947909.svg)](https://zenodo.org/doi/10.5281/zenodo.13352211)
+
 # NF1 Schwann cell data analysis

-We aim to characterize cell morphology signatures of neurofibromin in Schwann cells.
+We aim to characterize cell morphology signatures of neurofibromin in Schwann cells.
 We applied a modified Cell Painting assay on two isogenic Schwann cell lines, one of the wildtype genotype (*NF1+/+*) and one of the null genotype (*NF1-/-*), both from the same patient.
-The modified assay stains for four organelles: nuclei, endoplasmic reticulum, mitochondria, and actin.
-We applied CellProfiler pipelines to perform quality control, illumination correction, segmentation, and feature extraction.
+The modified assay stains for four organelles: nuclei, endoplasmic reticulum, mitochondria, and actin.
+We applied CellProfiler pipelines to perform quality control, illumination correction, segmentation, and feature extraction.

 ![Figure 1](./3.figures/figures/main_figure_1_workflow.png)
 > *Image montage of dataset and analysis workflow.* (A) Example image montages of the Cell Painting channels and composite image (all channels overlaid) for each NF1 genotype. The scale bar represents 25 μm. (B) The workflow of our analysis pipeline demonstrates the steps taken from image analysis to machine learning.

-We segmented 22,585 wild-type (WT) and null cells across three plates and utilized 907 significant morphology features representing various organelle shapes and intensity patterns.
+We segmented 22,585 wild-type (WT) and null cells across three plates and utilized 907 significant morphology features representing various organelle shapes and intensity patterns.

-We trained a logistic regression binary classifier to predict the NF1 genotype of single cells.
+We trained a logistic regression binary classifier to predict the NF1 genotype of single cells.
 The model shows high performance, with accuracies of 0.85 and 0.80 for the training and testing data splits, respectively.

 ![Figure 3](./3.figures/figures/main_figure_3_model_eval.png)
 > *Logistic regression model predicts genotype with high performance.* (A) Precision-recall curves comparing the final model applied to shuffled (dashed line) and non-shuffled data (solid line). Applying the model to a shuffled dataset performed worse than the non-shuffled data, demonstrating a biological signal between genotypes. (B) Confusion matrices from the training and testing data splits show higher performance across genotypes in non-shuffled data compared to the shuffled data. (C) Accuracy scores show high performance classifying cells with both genotypes from the training and testing data splits compared to shuffled data. Both panels B and C visualize the results from the optimized model.

-We look to improve upon this preliminary model in the future.
+We look to improve upon this preliminary model in the future.
 We aim to generate further data which includes the heterozygous genotype (*NF1+/-*).
 As well, we plan to apply an improved model to large-scale drug screens to capture candidate drugs that make NF1 patient cells look healthy.