diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 00000000..7f5067ed --- /dev/null +++ b/docs/README.md @@ -0,0 +1,204 @@ + + +# Data processing for image-based profiling + +[![Build Status](https://github.com/cytomining/pycytominer/actions/workflows/integration-test.yml/badge.svg?branch=main)](https://github.com/cytomining/pycytominer/actions/workflows/integration-test.yml?query=branch%3Amain) +[![Coverage Status](https://codecov.io/gh/cytomining/pycytominer/branch/main/graph/badge.svg)](https://codecov.io/github/cytomining/pycytominer?branch=main) +[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff) +[![RTD](https://readthedocs.org/projects/pycytominer/badge/?version=latest&style=flat)](https://pycytominer.readthedocs.io/) +[![DOI](https://img.shields.io/badge/DOI-10.48550/arXiv.2311.13417-blue)](https://doi.org/10.48550/arXiv.2311.13417) + +Pycytominer is a suite of common functions used to process high dimensional readouts from high-throughput cell experiments. +The tool is most often used for processing data through the following pipeline: + +Description of the pycytominer pipeline. Images flow from feature extraction and are processed with a series of steps + +[Click here for high resolution pipeline image](https://github.com/cytomining/pycytominer/blob/main/media/pipeline.png) + +Image data flow from a microscope to cell segmentation and feature extraction tools (e.g. CellProfiler or DeepProfiler). +From here, additional single cell processing tools curate the single cell readouts into a form manageable for pycytominer input. +For CellProfiler, we use [cytominer-database](https://github.com/cytomining/cytominer-database) or [CytoTable](https://github.com/cytomining/CytoTable). +For DeepProfiler, we include single cell processing tools in [pycytominer.cyto_utils](cyto_utils.md). 
+ +From the single cell output, pycytominer performs five steps using a simple API (described below), before passing along data to [cytominer-eval](https://github.com/cytomining/cytominer-eval) for quality and perturbation strength evaluation. + +## Installation + +You can install pycytominer via pip: + +```bash +pip install pycytominer +``` + +or conda: + +```bash +conda install -c conda-forge pycytominer +``` + +## Frameworks + +Pycytominer is primarily built on top of [pandas](https://pandas.pydata.org/docs/index.html), and also uses parts of SQLAlchemy, scikit-learn, and pyarrow. + +Pycytominer currently supports [parquet](https://parquet.apache.org/) and compressed text file (e.g. `.csv.gz`) I/O. + +## API + +Pycytominer has five major processing functions: + +1. Aggregate - Average single-cell profiles based on metadata information (most often "well"). +2. Annotate - Append metadata (most often from the platemap file) to the feature profile. +3. Normalize - Transform input feature data into consistent distributions. +4. Feature select - Exclude non-informative or redundant features. +5. Consensus - Average aggregated profiles by replicates to form a "consensus signature". + +The API is consistent for each of these functions: + +```python +# Each function takes as input a pandas DataFrame or file path +# and transforms the input data based on the provided options and methods +df = function( + profiles_or_path, + features, + samples, + method, + output_file, + additional_options... +) +``` + +Each processing function has unique arguments; see our [documentation](https://pycytominer.readthedocs.io/) for more details. + +## Usage + +The default way to use pycytominer is within Python scripts, and using pycytominer is simple and fun.
+ +```python +# Real world example +import pandas as pd +import pycytominer + +commit = "da8ae6a3bc103346095d61b4ee02f08fc85a5d98" +url = f"https://media.githubusercontent.com/media/broadinstitute/lincs-cell-painting/{commit}/profiles/2016_04_01_a549_48hr_batch1/SQ00014812/SQ00014812_augmented.csv.gz" + +df = pd.read_csv(url) + +normalized_df = pycytominer.normalize( + profiles=df, + method="standardize", + samples="Metadata_broad_sample == 'DMSO'" +) +``` + +### Pipeline orchestration + +Pycytominer is a collection of different functions with no explicit link between steps. +However, some options exist to use pycytominer within a pipeline framework. + +| Project | Format | Environment | pycytominer usage | +| :------------------------------------------------------------------------------- | :-------- | :------------------- | :---------------------- | +| [Profiling-recipe](https://github.com/cytomining/profiling-recipe) | yaml | agnostic | full pipeline support | +| [CellProfiler-on-Terra](https://github.com/broadinstitute/cellprofiler-on-Terra) | WDL | google cloud / Terra | single-cell aggregation | +| [CytoSnake](https://github.com/WayScience/CytoSnake) | snakemake | agnostic | full pipeline support | + +A separate project called [AuSPICES](https://github.com/broadinstitute/AuSPICEs) offers pipeline support up to image feature extraction. + +## Other functionality + +Pycytominer was written with a goal of processing any high-throughput image-based profiling data. +However, the initial use case was developed for processing image-based profiling experiments specifically. +And, more specifically than that, image-based profiling readouts from [CellProfiler](https://github.com/CellProfiler) measurements from [Cell Painting](https://www.nature.com/articles/nprot.2016.105) data. 
+ +Therefore, we have included some custom tools in `pycytominer/cyto_utils` that provide other functionality: + +Note, [`pycytominer.cyto_utils.cells.SingleCells()`](cyto_utils.md##pycytominer.cyto_utils.cells) contains code to interact with single-cell SQLite files, which are output from CellProfiler. +Processing capabilities for SQLite files depend on SQLite file size and your available computational resources (e.g. memory and cores). + +### CellProfiler CSV collation + +If you run your images on a cluster without a MySQL or similar large database set up, you will likely end up with many folders from the different cluster runs (often one per well or one per site), each containing an `Image.csv`, `Nuclei.csv`, etc. +To look at full plates, we therefore first need to collate all of these CSVs into a single file (currently SQLite) per plate. +We currently do this with a library called [cytominer-database](https://github.com/cytomining/cytominer-database). + +If you want to perform this data collation inside pycytominer using the `cyto_utils` function `collate` (and/or you want to be able to run the tests and have them all pass!), you will need `cytominer-database==0.3.4`; this will change your installation commands slightly: + +```bash +# Example for general case commit: +pip install "pycytominer[collate]" + +# Example for specific commit: +pip install "pycytominer[collate] @ git+https://github.com/cytomining/pycytominer@77d93a3a551a438799a97ba57d49b19de0a293ab" +``` + +If you use `pycytominer` in a conda environment and want to run `collate.py`, also add `cytominer-database=0.3.4` to your list of dependencies. + +### Creating a cell locations lookup table + +The `CellLocation` class offers a convenient way to augment a [LoadData](https://cellprofiler-manual.s3.amazonaws.com/CPmanual/LoadData.html) file with X,Y locations of cells in each image.
+The locations information is obtained from a single cell SQLite file. + +To use this functionality, you will need to modify your installation command, similar to above: + +```bash +# Example for general case commit: +pip install "pycytominer[cell_locations]" +``` + +Example using this functionality: + +```bash +metadata_input="s3://cellpainting-gallery/test-cpg0016-jump/source_4/workspace/load_data_csv/2021_08_23_Batch12/BR00126114/test_BR00126114_load_data_with_illum.parquet" +single_single_cell_input="s3://cellpainting-gallery/test-cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126114/test_BR00126114.sqlite" +augmented_metadata_output="~/Desktop/load_data_with_illum_and_cell_location_subset.parquet" + +python \ + -m pycytominer.cyto_utils.cell_locations_cmd \ + --metadata_input ${metadata_input} \ + --single_cell_input ${single_single_cell_input} \ + --augmented_metadata_output ${augmented_metadata_output} \ + add_cell_location + +# Check the output + +python -c "import pandas as pd; print(pd.read_parquet('${augmented_metadata_output}').head())" + +# It should look something like this (depends on the width of your terminal): + +# Metadata_Plate Metadata_Well Metadata_Site ... PathName_OrigRNA ImageNumber CellCenters +# 0 BR00126114 A01 1 ... s3://cellpainting-gallery/cpg0016-jump/source_... 1 [{'Nuclei_Location_Center_X': 943.512129380054... +# 1 BR00126114 A01 2 ... s3://cellpainting-gallery/cpg0016-jump/source_... 2 [{'Nuclei_Location_Center_X': 29.9516027655562... +``` + +### Generating a GCT file for morpheus + +The software [morpheus](https://software.broadinstitute.org/morpheus/) enables profile visualization in the form of interactive heatmaps. +Pycytominer can convert profiles into a `.gct` file for drag-and-drop input into morpheus. 
+ +```python +# Real world example +import pandas as pd +import pycytominer + +commit = "da8ae6a3bc103346095d61b4ee02f08fc85a5d98" +plate = "SQ00014812" +url = f"https://media.githubusercontent.com/media/broadinstitute/lincs-cell-painting/{commit}/profiles/2016_04_01_a549_48hr_batch1/{plate}/{plate}_normalized_feature_select.csv.gz" + +df = pd.read_csv(url) +output_file = f"{plate}.gct" + +pycytominer.cyto_utils.write_gct( + profiles=df, + output_file=output_file +) +``` + +## Citing pycytominer + +If you have used `pycytominer` in your project, please use the citation below. +You can also find the citation in the 'cite this repository' link at the top right under the `About` section. + +APA: + +```text +Serrano, E., Chandrasekaran, N., Bunten, D., Brewer, K., Tomkinson, J., Kern, R., Bornholdt, M., Fleming, S., Pei, R., Arevalo, J., Tsang, H., Rubinetti, V., Tromans-Coia, C., Becker, T., Weisbart, E., Bunne, C., Kalinin, A. A., Senft, R., Taylor, S. J., Jamali, N., Adeboye, A., Abbasi, H. S., Goodman, A., Caicedo, J., Carpenter, A. E., Cimini, B. A., Singh, S., & Way, G. P. Reproducible image-based profiling with Pycytominer. https://doi.org/10.48550/arXiv.2311.13417 +``` diff --git a/docs/index.md b/docs/index.md deleted file mode 100644 index 4129e67e..00000000 --- a/docs/index.md +++ /dev/null @@ -1,3 +0,0 @@ -{% - include-markdown "../README.md" -%} diff --git a/docs/tutorial.md b/docs/tutorial.md index 6947de2f..492e0008 100644 --- a/docs/tutorial.md +++ b/docs/tutorial.md @@ -1,13 +1,13 @@ # Tutorials -`This `\_ tutorial shows how to run a image-based profiling pipeline using pycytominer. Using IPython notebooks, it walks through the following steps: +This [tutorial](https://github.com/cytomining/pipeline-examples#readme) shows how to run an image-based profiling pipeline using pycytominer. Using IPython notebooks, it walks through the following steps: -#. Downloading a dataset of single cell `CellProfiler `_ profiles. -#.
Processing the profiles using PyCytominer. This includes the following steps: -#. Data initialization -#. Single cell aggregation to create well-level profiles -#. Addition of experiment metadata to the well-level profiles -#. Profile normalization -#. Feature selection -#. Forming consensus signatures -#. Evaluating the profile quality using `cytominer-eval `_. +- Downloading a dataset of single cell [CellProfiler](https://cellprofiler.org/) profiles. +- Processing the profiles using PyCytominer. This includes the following steps: +- Data initialization +- Single cell aggregation to create well-level profiles +- Addition of experiment metadata to the well-level profiles +- Profile normalization +- Feature selection +- Forming consensus signatures +- Evaluating the profile quality using [cytominer-eval](https://github.com/cytomining/cytominer-eval). diff --git a/docs/walkthrough.md b/docs/walkthrough.md deleted file mode 100644 index 448c60b8..00000000 --- a/docs/walkthrough.md +++ /dev/null @@ -1,6 +0,0 @@ -# Walkthroughs - -..
toctree:: -:maxdepth: 1 - -walkthroughs/single_cell_usage.ipynb diff --git a/mkdocs.yml b/mkdocs.yml index 60360f8d..0c264885 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -7,16 +7,17 @@ edit_uri: edit/main/docs/ repo_name: cytomining/pycytominer nav: - - Home: index.md + - Home: README.md - Installation: install.md - Main Functions: functions.md - Cyto Utilities: cyto_utils.md - Operations: operations.md - Tutorial: tutorial.md - - Walkthrough: walkthrough.md + - Walkthroughs: + - Single Cell Usage: walkthroughs/single_cell_usage.ipynb plugins: - search - - include-markdown + - mkdocs-jupyter - mkdocstrings: handlers: python: @@ -37,15 +38,15 @@ theme: palette: - media: "(prefers-color-scheme: light)" scheme: default - primary: white - accent: deep orange + primary: deep purple + accent: purple toggle: icon: material/brightness-7 name: Switch to dark mode - media: "(prefers-color-scheme: dark)" scheme: slate - primary: black - accent: deep orange + primary: deep purple + accent: purple toggle: icon: material/brightness-4 name: Switch to light mode diff --git a/poetry.lock b/poetry.lock index 2c11f29f..2048fed6 100644 --- a/poetry.lock +++ b/poetry.lock @@ -692,6 +692,23 @@ files = [ {file = "colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44"}, ] +[[package]] +name = "comm" +version = "0.2.2" +description = "Jupyter Python Comm implementation, for usage in ipykernel, xeus-python etc." 
+optional = false +python-versions = ">=3.8" +files = [ + {file = "comm-0.2.2-py3-none-any.whl", hash = "sha256:e6fb86cb70ff661ee8c9c14e7d36d6de3b4066f1441be4063df9c5009f0a64d3"}, + {file = "comm-0.2.2.tar.gz", hash = "sha256:3fd7a84065306e07bea1773df6eb8282de51ba82f77c72f9c85716ab11fe980e"}, +] + +[package.dependencies] +traitlets = ">=4" + +[package.extras] +test = ["pytest"] + [[package]] name = "commitizen" version = "3.24.0" @@ -852,6 +869,37 @@ files = [ {file = "dbfread-2.0.7.tar.gz", hash = "sha256:07c8a9af06ffad3f6f03e8fe91ad7d2733e31a26d2b72c4dd4cfbae07ee3b73d"}, ] +[[package]] +name = "debugpy" +version = "1.8.1" +description = "An implementation of the Debug Adapter Protocol for Python" +optional = false +python-versions = ">=3.8" +files = [ + {file = "debugpy-1.8.1-cp310-cp310-macosx_11_0_x86_64.whl", hash = "sha256:3bda0f1e943d386cc7a0e71bfa59f4137909e2ed947fb3946c506e113000f741"}, + {file = "debugpy-1.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dda73bf69ea479c8577a0448f8c707691152e6c4de7f0c4dec5a4bc11dee516e"}, + {file = "debugpy-1.8.1-cp310-cp310-win32.whl", hash = "sha256:3a79c6f62adef994b2dbe9fc2cc9cc3864a23575b6e387339ab739873bea53d0"}, + {file = "debugpy-1.8.1-cp310-cp310-win_amd64.whl", hash = "sha256:7eb7bd2b56ea3bedb009616d9e2f64aab8fc7000d481faec3cd26c98a964bcdd"}, + {file = "debugpy-1.8.1-cp311-cp311-macosx_11_0_universal2.whl", hash = "sha256:016a9fcfc2c6b57f939673c874310d8581d51a0fe0858e7fac4e240c5eb743cb"}, + {file = "debugpy-1.8.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fd97ed11a4c7f6d042d320ce03d83b20c3fb40da892f994bc041bbc415d7a099"}, + {file = "debugpy-1.8.1-cp311-cp311-win32.whl", hash = "sha256:0de56aba8249c28a300bdb0672a9b94785074eb82eb672db66c8144fff673146"}, + {file = "debugpy-1.8.1-cp311-cp311-win_amd64.whl", hash = "sha256:1a9fe0829c2b854757b4fd0a338d93bc17249a3bf69ecf765c61d4c522bb92a8"}, + {file = "debugpy-1.8.1-cp312-cp312-macosx_11_0_universal2.whl", 
hash = "sha256:3ebb70ba1a6524d19fa7bb122f44b74170c447d5746a503e36adc244a20ac539"}, + {file = "debugpy-1.8.1-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a2e658a9630f27534e63922ebf655a6ab60c370f4d2fc5c02a5b19baf4410ace"}, + {file = "debugpy-1.8.1-cp312-cp312-win32.whl", hash = "sha256:caad2846e21188797a1f17fc09c31b84c7c3c23baf2516fed5b40b378515bbf0"}, + {file = "debugpy-1.8.1-cp312-cp312-win_amd64.whl", hash = "sha256:edcc9f58ec0fd121a25bc950d4578df47428d72e1a0d66c07403b04eb93bcf98"}, + {file = "debugpy-1.8.1-cp38-cp38-macosx_11_0_x86_64.whl", hash = "sha256:7a3afa222f6fd3d9dfecd52729bc2e12c93e22a7491405a0ecbf9e1d32d45b39"}, + {file = "debugpy-1.8.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d915a18f0597ef685e88bb35e5d7ab968964b7befefe1aaea1eb5b2640b586c7"}, + {file = "debugpy-1.8.1-cp38-cp38-win32.whl", hash = "sha256:92116039b5500633cc8d44ecc187abe2dfa9b90f7a82bbf81d079fcdd506bae9"}, + {file = "debugpy-1.8.1-cp38-cp38-win_amd64.whl", hash = "sha256:e38beb7992b5afd9d5244e96ad5fa9135e94993b0c551ceebf3fe1a5d9beb234"}, + {file = "debugpy-1.8.1-cp39-cp39-macosx_11_0_x86_64.whl", hash = "sha256:bfb20cb57486c8e4793d41996652e5a6a885b4d9175dd369045dad59eaacea42"}, + {file = "debugpy-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:efd3fdd3f67a7e576dd869c184c5dd71d9aaa36ded271939da352880c012e703"}, + {file = "debugpy-1.8.1-cp39-cp39-win32.whl", hash = "sha256:58911e8521ca0c785ac7a0539f1e77e0ce2df753f786188f382229278b4cdf23"}, + {file = "debugpy-1.8.1-cp39-cp39-win_amd64.whl", hash = "sha256:6df9aa9599eb05ca179fb0b810282255202a66835c6efb1d112d21ecb830ddd3"}, + {file = "debugpy-1.8.1-py2.py3-none-any.whl", hash = "sha256:28acbe2241222b87e255260c76741e1fbf04fdc3b6d094fcf57b6c6f75ce1242"}, + {file = "debugpy-1.8.1.zip", hash = "sha256:f696d6be15be87aef621917585f9bb94b1dc9e8aced570db1b8a6fc14e8f9b42"}, +] + [[package]] name = "decli" version = 
"0.6.1" @@ -1312,6 +1360,39 @@ files = [ {file = "iniconfig-2.0.0.tar.gz", hash = "sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3"}, ] +[[package]] +name = "ipykernel" +version = "6.29.4" +description = "IPython Kernel for Jupyter" +optional = false +python-versions = ">=3.8" +files = [ + {file = "ipykernel-6.29.4-py3-none-any.whl", hash = "sha256:1181e653d95c6808039c509ef8e67c4126b3b3af7781496c7cbfb5ed938a27da"}, + {file = "ipykernel-6.29.4.tar.gz", hash = "sha256:3d44070060f9475ac2092b760123fadf105d2e2493c24848b6691a7c4f42af5c"}, +] + +[package.dependencies] +appnope = {version = "*", markers = "platform_system == \"Darwin\""} +comm = ">=0.1.1" +debugpy = ">=1.6.5" +ipython = ">=7.23.1" +jupyter-client = ">=6.1.12" +jupyter-core = ">=4.12,<5.0.dev0 || >=5.1.dev0" +matplotlib-inline = ">=0.1" +nest-asyncio = "*" +packaging = "*" +psutil = "*" +pyzmq = ">=24" +tornado = ">=6.1" +traitlets = ">=5.4.0" + +[package.extras] +cov = ["coverage[toml]", "curio", "matplotlib", "pytest-cov", "trio"] +docs = ["myst-parser", "pydata-sphinx-theme", "sphinx", "sphinx-autodoc-typehints", "sphinxcontrib-github-alt", "sphinxcontrib-spelling", "trio"] +pyqt5 = ["pyqt5"] +pyside6 = ["pyside6"] +test = ["flaky", "ipyparallel", "pre-commit", "pytest (>=7.0)", "pytest-asyncio (>=0.23.5)", "pytest-cov", "pytest-timeout"] + [[package]] name = "ipython" version = "8.12.3" @@ -1515,6 +1596,35 @@ files = [ {file = "jupyterlab_pygments-0.3.0.tar.gz", hash = "sha256:721aca4d9029252b11cfa9d185e5b5af4d54772bb8072f9b7036f4170054d35d"}, ] +[[package]] +name = "jupytext" +version = "1.16.1" +description = "Jupyter notebooks as Markdown documents, Julia, Python or R scripts" +optional = false +python-versions = ">=3.8" +files = [ + {file = "jupytext-1.16.1-py3-none-any.whl", hash = "sha256:796ec4f68ada663569e5d38d4ef03738a01284bfe21c943c485bc36433898bd0"}, + {file = "jupytext-1.16.1.tar.gz", hash = "sha256:68c7b68685e870e80e60fda8286fbd6269e9c74dc1df4316df6fe46eabc94c99"}, 
+] + +[package.dependencies] +markdown-it-py = ">=1.0" +mdit-py-plugins = "*" +nbformat = "*" +packaging = "*" +pyyaml = "*" +toml = "*" + +[package.extras] +dev = ["jupytext[test-cov,test-external]"] +docs = ["myst-parser", "sphinx", "sphinx-copybutton", "sphinx-rtd-theme"] +test = ["pytest", "pytest-randomly", "pytest-xdist"] +test-cov = ["jupytext[test-integration]", "pytest-cov (>=2.6.1)"] +test-external = ["autopep8", "black", "flake8", "gitpython", "isort", "jupyter-fs (<0.4.0)", "jupytext[test-integration]", "pre-commit", "sphinx-gallery (<0.8)"] +test-functional = ["jupytext[test]"] +test-integration = ["ipykernel", "jupyter-server (!=2.11)", "jupytext[test-functional]", "nbconvert"] +test-ui = ["calysto-bash"] + [[package]] name = "leather" version = "0.4.0" @@ -1547,6 +1657,30 @@ importlib-metadata = {version = ">=4.4", markers = "python_version < \"3.10\""} docs = ["mdx-gh-links (>=0.2)", "mkdocs (>=1.5)", "mkdocs-gen-files", "mkdocs-literate-nav", "mkdocs-nature (>=0.6)", "mkdocs-section-index", "mkdocstrings[python]"] testing = ["coverage", "pyyaml"] +[[package]] +name = "markdown-it-py" +version = "3.0.0" +description = "Python port of markdown-it. Markdown parsing, done right!" 
+optional = false +python-versions = ">=3.8" +files = [ + {file = "markdown-it-py-3.0.0.tar.gz", hash = "sha256:e3f60a94fa066dc52ec76661e37c851cb232d92f9886b15cb560aaada2df8feb"}, + {file = "markdown_it_py-3.0.0-py3-none-any.whl", hash = "sha256:355216845c60bd96232cd8d8c40e8f9765cc86f46880e43a8fd22dc1a1a8cab1"}, +] + +[package.dependencies] +mdurl = ">=0.1,<1.0" + +[package.extras] +benchmarking = ["psutil", "pytest", "pytest-benchmark"] +code-style = ["pre-commit (>=3.0,<4.0)"] +compare = ["commonmark (>=0.9,<1.0)", "markdown (>=3.4,<4.0)", "mistletoe (>=1.0,<2.0)", "mistune (>=2.0,<3.0)", "panflute (>=2.3,<3.0)"] +linkify = ["linkify-it-py (>=1,<3)"] +plugins = ["mdit-py-plugins"] +profiling = ["gprof2dot"] +rtd = ["jupyter_sphinx", "mdit-py-plugins", "myst-parser", "pyyaml", "sphinx", "sphinx-copybutton", "sphinx-design", "sphinx_book_theme"] +testing = ["coverage", "pytest", "pytest-cov", "pytest-regressions"] + [[package]] name = "markupsafe" version = "2.1.5" @@ -1630,6 +1764,36 @@ files = [ [package.dependencies] traitlets = "*" +[[package]] +name = "mdit-py-plugins" +version = "0.4.0" +description = "Collection of plugins for markdown-it-py" +optional = false +python-versions = ">=3.8" +files = [ + {file = "mdit_py_plugins-0.4.0-py3-none-any.whl", hash = "sha256:b51b3bb70691f57f974e257e367107857a93b36f322a9e6d44ca5bf28ec2def9"}, + {file = "mdit_py_plugins-0.4.0.tar.gz", hash = "sha256:d8ab27e9aed6c38aa716819fedfde15ca275715955f8a185a8e1cf90fb1d2c1b"}, +] + +[package.dependencies] +markdown-it-py = ">=1.0.0,<4.0.0" + +[package.extras] +code-style = ["pre-commit"] +rtd = ["myst-parser", "sphinx-book-theme"] +testing = ["coverage", "pytest", "pytest-cov", "pytest-regressions"] + +[[package]] +name = "mdurl" +version = "0.1.2" +description = "Markdown URL utilities" +optional = false +python-versions = ">=3.7" +files = [ + {file = "mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8"}, + {file = 
"mdurl-0.1.2.tar.gz", hash = "sha256:bb413d29f5eea38f31dd4754dd7377d4465116fb207585f97bf925588687c1ba"}, +] + [[package]] name = "mergedeep" version = "1.3.4" @@ -1734,6 +1898,24 @@ wcmatch = ">=8,<9" [package.extras] cache = ["platformdirs"] +[[package]] +name = "mkdocs-jupyter" +version = "0.24.7" +description = "Use Jupyter in mkdocs websites" +optional = false +python-versions = ">=3.8" +files = [ + {file = "mkdocs_jupyter-0.24.7-py3-none-any.whl", hash = "sha256:893d04bea1e007479a46e4e72852cd4d280c4d358ce4a0445250f3f80c639723"}, +] + +[package.dependencies] +ipykernel = ">6.0.0,<7.0.0" +jupytext = ">1.13.8,<2" +mkdocs = ">=1.4.0,<2" +mkdocs-material = ">9.0.0" +nbconvert = ">=7.2.9,<8" +pygments = ">2.12.0" + [[package]] name = "mkdocs-material" version = "9.5.19" @@ -2031,6 +2213,17 @@ nbformat = "*" sphinx = ">=1.8" traitlets = ">=5" +[[package]] +name = "nest-asyncio" +version = "1.6.0" +description = "Patch asyncio to allow nested event loops" +optional = false +python-versions = ">=3.5" +files = [ + {file = "nest_asyncio-1.6.0-py3-none-any.whl", hash = "sha256:87af6efd6b5e897c81050477ef65c62e2b2f35d51703cae01aff2905b1852e1c"}, + {file = "nest_asyncio-1.6.0.tar.gz", hash = "sha256:6f172d5449aca15afd6c646851f4e31e02c598d553a667e38cafa997cfec55fe"}, +] + [[package]] name = "nodeenv" version = "1.8.0" @@ -2168,8 +2361,8 @@ files = [ [package.dependencies] numpy = [ {version = ">=1.20.3", markers = "python_version < \"3.10\""}, - {version = ">=1.21.0", markers = "python_version >= \"3.10\" and python_version < \"3.11\""}, {version = ">=1.23.2", markers = "python_version >= \"3.11\""}, + {version = ">=1.21.0", markers = "python_version >= \"3.10\" and python_version < \"3.11\""}, ] python-dateutil = ">=2.8.2" pytz = ">=2020.1" @@ -2345,6 +2538,34 @@ files = [ [package.dependencies] wcwidth = "*" +[[package]] +name = "psutil" +version = "5.9.8" +description = "Cross-platform lib for process and system monitoring in Python." 
+optional = false +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*" +files = [ + {file = "psutil-5.9.8-cp27-cp27m-macosx_10_9_x86_64.whl", hash = "sha256:26bd09967ae00920df88e0352a91cff1a78f8d69b3ecabbfe733610c0af486c8"}, + {file = "psutil-5.9.8-cp27-cp27m-manylinux2010_i686.whl", hash = "sha256:05806de88103b25903dff19bb6692bd2e714ccf9e668d050d144012055cbca73"}, + {file = "psutil-5.9.8-cp27-cp27m-manylinux2010_x86_64.whl", hash = "sha256:611052c4bc70432ec770d5d54f64206aa7203a101ec273a0cd82418c86503bb7"}, + {file = "psutil-5.9.8-cp27-cp27mu-manylinux2010_i686.whl", hash = "sha256:50187900d73c1381ba1454cf40308c2bf6f34268518b3f36a9b663ca87e65e36"}, + {file = "psutil-5.9.8-cp27-cp27mu-manylinux2010_x86_64.whl", hash = "sha256:02615ed8c5ea222323408ceba16c60e99c3f91639b07da6373fb7e6539abc56d"}, + {file = "psutil-5.9.8-cp27-none-win32.whl", hash = "sha256:36f435891adb138ed3c9e58c6af3e2e6ca9ac2f365efe1f9cfef2794e6c93b4e"}, + {file = "psutil-5.9.8-cp27-none-win_amd64.whl", hash = "sha256:bd1184ceb3f87651a67b2708d4c3338e9b10c5df903f2e3776b62303b26cb631"}, + {file = "psutil-5.9.8-cp36-abi3-macosx_10_9_x86_64.whl", hash = "sha256:aee678c8720623dc456fa20659af736241f575d79429a0e5e9cf88ae0605cc81"}, + {file = "psutil-5.9.8-cp36-abi3-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8cb6403ce6d8e047495a701dc7c5bd788add903f8986d523e3e20b98b733e421"}, + {file = "psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d06016f7f8625a1825ba3732081d77c94589dca78b7a3fc072194851e88461a4"}, + {file = "psutil-5.9.8-cp36-cp36m-win32.whl", hash = "sha256:7d79560ad97af658a0f6adfef8b834b53f64746d45b403f225b85c5c2c140eee"}, + {file = "psutil-5.9.8-cp36-cp36m-win_amd64.whl", hash = "sha256:27cc40c3493bb10de1be4b3f07cae4c010ce715290a5be22b98493509c6299e2"}, + {file = "psutil-5.9.8-cp37-abi3-win32.whl", hash = 
"sha256:bc56c2a1b0d15aa3eaa5a60c9f3f8e3e565303b465dbf57a1b730e7a2b9844e0"}, + {file = "psutil-5.9.8-cp37-abi3-win_amd64.whl", hash = "sha256:8db4c1b57507eef143a15a6884ca10f7c73876cdf5d51e713151c1236a0e68cf"}, + {file = "psutil-5.9.8-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:d16bbddf0693323b8c6123dd804100241da461e41d6e332fb0ba6058f630f8c8"}, + {file = "psutil-5.9.8.tar.gz", hash = "sha256:6be126e3225486dff286a8fb9a06246a5253f4c7c53b475ea5f5ac934e64194c"}, +] + +[package.extras] +test = ["enum34", "ipaddress", "mock", "pywin32", "wmi"] + [[package]] name = "ptyprocess" version = "0.7.0" @@ -3485,6 +3706,17 @@ webencodings = ">=0.4" doc = ["sphinx", "sphinx_rtd_theme"] test = ["pytest", "ruff"] +[[package]] +name = "toml" +version = "0.10.2" +description = "Python Library for Tom's Obvious, Minimal Language" +optional = false +python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*" +files = [ + {file = "toml-0.10.2-py2.py3-none-any.whl", hash = "sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b"}, + {file = "toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f"}, +] + [[package]] name = "tomli" version = "2.0.1" @@ -3928,4 +4160,4 @@ collate = ["cytominer-database"] [metadata] lock-version = "2.0" python-versions = ">=3.8,<4.0" -content-hash = "dd279d4593b91fec84eff8a726b32a4b9fd8b3e80d21b0f2b3d179c8b2803d68" +content-hash = "70a56ee15e4d18ef704d2dde4ac560fd4089a00a75df0ef9650126aec0e75f27" diff --git a/pyproject.toml b/pyproject.toml index a431d8ae..7452a30b 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -88,6 +88,7 @@ dunamai = "^1.19.0" mkdocs-material = "^9.5.19" mkdocstrings = {extras = ["python"], version = "^0.23.0"} mkdocs-include-markdown-plugin = "^6.0.5" +mkdocs-jupyter = "^0.24.7" [tool.poetry-dynamic-versioning] enable = false diff --git a/walkthroughs/nbconverted/single_cell_usage.py b/walkthroughs/nbconverted/single_cell_usage.py deleted file mode 100644 index 
c21c8691..00000000 --- a/walkthroughs/nbconverted/single_cell_usage.py +++ /dev/null @@ -1,242 +0,0 @@ -#!/usr/bin/env python - -# # Single-cell Profiling Walkthrough -# -# Welcome to this walkthrough where we will guide you through the process of extracting single cell morphology features using the [pycytominer](https://github.com/cytomining/pycytominer) API. -# -# For this walkthrough, we will be working with the NF1-Schwann cell morphology dataset. -# If you would like the more information about this dataset, you can refer to this [repository](https://github.com/WayScience/nf1_cellpainting_data) -# -# From the mentioned repo, we specifically used this [dataset](https://github.com/WayScience/nf1_cellpainting_data/tree/main/2.cellprofiler_analysis/analysis_output) and the associated [metadata](https://github.com/WayScience/nf1_cellpainting_data/tree/main/0.download_data/metadata) to generate the walkthrough. -# -# -# Let's get started with the walkthrough below! - -# In[1]: - - -import pathlib - -# ignore mix type warnings from pandas -import warnings - -import pandas as pd - -from pycytominer import annotate, feature_select, normalize - -# pycytominer imports -from pycytominer.cyto_utils.cells import SingleCells - -warnings.filterwarnings("ignore") - - -# ## About the inputs -# -# -# In this section, we will set up the expected input and output paths that will be generated throughout this walkthrough. Let's take a look at the explanation of these inputs and outputs. -# -# For this workflow, we have two main inputs: -# -# - **plate_data** (SQLite file): This contains the quantified single-cell morphology features that we'll be working with. -# - **plate_map** (CSV file): This contains additional information related to the cells, providing valuable context of our single-cell morphology dataset. -# -# Now, let's explore the outputs generated in this workflow. 
In this walkthrough, we will be generating four profiles: -# -# - **sc_profiles_path**: This refers to the single-cell morphology profile that will be generated. -# - **anno_profiles_path**: This corresponds to the annotated single-cell morphology profile, meaning the information from the **plate_map** is added to each single-cell -# - **norm_profiles_path**: This represents the normalized single-cell morphology profile. -# - **feat_profiles_path**: Lastly, this refers to the selected features from the single-cell morphology profile. -# -# **Note**: All profiles are outputted as `.csv` files, however, users can output these files as `parquet` or other file file formats. -# For more information about output formats, please refer the documentation [here](https://pycytominer.readthedocs.io/en/latest/pycytominer.cyto_utils.html#pycytominer.cyto_utils.output.output) -# -# These profiles will serve as important outputs that will help us analyze and interpret the single-cell morphology data effectively. Now that we have a clear understanding of the inputs and outputs, let's proceed further in our walkthrough. - -# In[2]: - - -# Setting file paths -data_dir = pathlib.Path("./data/").resolve(strict=True) -metadata_dir = (data_dir / "metadata").resolve(strict=True) -out_dir = pathlib.Path("results") -out_dir.mkdir(exist_ok=True) - -# input file paths -plate_data = pathlib.Path("./data/nf1_data.sqlite").resolve(strict=True) -plate_map = (metadata_dir / "platemap_NF1_CP.csv").resolve(strict=True) - -# setting output paths -sc_profiles_path = out_dir / "nf1_single_cell_profile.csv.gz" -anno_profiles_path = out_dir / "nf1_annotated_profile.csv.gz" -norm_profiles_path = out_dir / "nf1_normalized_profile.csv.gz" -feat_profiles_path = out_dir / "nf1_features_profile.csv.gz" - - -# ## Generating Merged Single-cell Morphology Dataset -# -# In this section of the walkthrough, our goal is to load the NF1 dataset and create a merged single-cell morphology dataset. 
-# -# The NF1 morphology `SQLite` dataset was generated using [CellProfiler's](https://github.com/CellProfiler/CellProfiler) [ExportToDatabase](https://cellprofiler-manual.s3.amazonaws.com/CellProfiler-4.2.5/modules/fileprocessing.html?highlight=exporttodatabase#module-cellprofiler.modules.exporttodatabase) function, where each table represents a different compartment, such as Image, Cell, Nucleus, and Cytoplasm. -# -# To achieve this, we will utilize the `SingleCells` class, which offers a range of functionalities specifically designed for single-cell analysis. You can find detailed documentation on these functionalities [here](https://pycytominer.readthedocs.io/en/latest/pycytominer.cyto_utils.html#pycytominer.cyto_utils.cells.SingleCells). -# -# However, for our purpose in this walkthrough, we will focus on using the `SingleCells` class to merge all the tables within the NF1 sqlite file into a merged single-cell morphology dataset. -# -# ### Updating defaults -# Before we proceed further, it is important to update the default parameters in the `SingleCells` class to accommodate the table name changes in our NF1 dataset. -# -# Since the table names in our NF1 dataset differ from the default table names recognized by the `SingleCells` class, we need to make adjustments to ensure proper recognition of these table name changes.
- -# In[3]: - - -# update compartment names and strata -strata = ["Image_Metadata_Well", "Image_Metadata_Plate"] -compartments = ["Per_Cells", "Per_Cytoplasm", "Per_Nuclei"] - -# Updating linking columns for merging all compartments -linking_cols = { - "Per_Cytoplasm": { - "Per_Cells": "Cytoplasm_Parent_Cells", - "Per_Nuclei": "Cytoplasm_Parent_Nuclei", - }, - "Per_Cells": {"Per_Cytoplasm": "Cells_Number_Object_Number"}, - "Per_Nuclei": {"Per_Cytoplasm": "Nuclei_Number_Object_Number"}, -} - - -# Now that we have stored the updated parameters, we can use them as inputs for the `SingleCells` class to merge all the NF1 sqlite tables into a single consolidated dataset. -# -# This is done through the `merge_single_cells` method. For more information about `merge_single_cells`, please refer to the documentation [here](https://pycytominer.readthedocs.io/en/latest/pycytominer.cyto_utils.html#pycytominer.cyto_utils.cells.SingleCells.merge_single_cells). - -# In[4]: - - -# setting up sqlite address -sqlite_address = f"sqlite:///{plate_data!s}" - -# loading single cell morphology data into pycytominer's SingleCells object -single_cell_profile = SingleCells( - sql_file=sqlite_address, - compartments=compartments, - compartment_linking_cols=linking_cols, - image_table_name="Per_Image", - strata=strata, - merge_cols=["ImageNumber"], - image_cols="ImageNumber", - load_image_data=True, -) - -# merging all sqlite tables into a single tabular dataset (csv) and saving it as a -# compressed csv file -single_cell_profile.merge_single_cells( - sc_output_file=sc_profiles_path, compression_options="gzip" -) - - -# Now that we have created our merged single-cell profile, let's move on to the next step: loading our `platemaps`. -# -# `Platemaps` provide us with additional information that is crucial for our analysis. They contain details such as well positions, genotypes, gene names, perturbation types, and more.
In other words, platemaps serve as a valuable source of metadata for our single-cell morphology profile. -# - -# In[5]: - - -# loading the plate map -platemap_df = pd.read_csv(plate_map) - -# displaying the platemap column names -print(platemap_df.columns.tolist()) - - -# ## Annotation -# -# In this step of the walkthrough, we will combine the metadata with the merged single-cell morphology dataset. To accomplish this, we will utilize the `annotate` function provided by `pycytominer`. -# -# The `annotate` function takes two main inputs: the merged single-cell morphology dataset and its associated plate map. By combining these two datasets, we will generate an annotated_profile that contains enriched information. -# -# More information about the `annotate` function can be found [here](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#module-pycytominer.annotate). -# - -# In[6]: - - -# annotating merged single-cell profile with metadata -annotate( - profiles=sc_profiles_path, - platemap=platemap_df, - join_on=["Metadata_well_position", "Image_Metadata_Well"], - output_file=anno_profiles_path, - compression_options="gzip", -) - -# save message display -print(f"Annotated profile saved in: {anno_profiles_path}") - - -# ## Normalization Step -# -# The next step is to normalize our dataset using the `normalize` function provided by `pycytominer`. -# More information regarding `pycytominer`'s `normalize` function can be found [here](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#module-pycytominer.normalize). -# -# Normalization is a critical preprocessing step that improves the quality of our dataset. It addresses two key challenges: mitigating the impact of outliers and handling variations in value scales. By normalizing the data, we ensure that our downstream analysis is not heavily influenced by these factors. -# -# Additionally, normalization plays a crucial role in determining feature importance (which is crucial for our last step).
By bringing all features to a comparable scale, it enables the identification of important features without biases caused by outliers or widely scaled values. -# -# To normalize our annotated single-cell morphology profile, we will utilize the `normalize` function from `pycytominer`. This function is specifically designed to handle the normalization process for image-based profiling data. - -# In[7]: - - -# normalize dataset -normalize( - profiles=anno_profiles_path, - features="infer", - image_features=False, - meta_features="infer", - samples="all", - method="standardize", - output_file=norm_profiles_path, - compression_options="gzip", -) - -# save message display -print(f"Normalized profile saved in: {norm_profiles_path}") - - -# ## Feature Selection -# -# -# In the final section of our walkthrough, we will utilize the normalized dataset to extract important morphological features and generate a selected features profile. -# -# To accomplish this, we will make use of the `feature_select` function provided by `pycytominer`. -# By applying `pycytominer`'s `feature_select` function to our dataset, we can identify the most informative morphological features that contribute significantly to the variations observed in our data. These selected features will be utilized to create our feature profile. -# -# For more detailed information about the `feature_select` function, its parameters, and its capabilities, please refer to the documentation available [here](https://pycytominer.readthedocs.io/en/latest/pycytominer.html#module-pycytominer.feature_select). -# - -# In[8]: - - -# creating selected features profile -feature_select( - profiles=norm_profiles_path, - features="infer", - image_features=False, - samples="all", - operation=["variance_threshold", "correlation_threshold", "blocklist"], - output_file=feat_profiles_path, - compression_options="gzip", -) - -# save message display -print(f"Selected features profile saved in: {feat_profiles_path}") - - -# Congratulations!
You have successfully completed our walkthrough. We hope that it has provided you with a basic understanding of how to analyze cell morphology features using `pycytominer`. -# -# By following the steps outlined in this walkthrough, you have gained valuable insights into processing high-dimensional single-cell morphology data with ease using `pycytominer`. However, please keep in mind that `pycytominer` offers a wide range of functionalities beyond what we covered here. We encourage you to explore the documentation to discover more advanced features and techniques. -# -# If you have any questions or need further assistance, don't hesitate to visit the `pycytominer` repository and post your question in the [issues](https://github.com/cytomining/pycytominer/issues) section. The community is there to support you and provide guidance. -# -# Now that you have the knowledge and tools to analyze cell morphology features, have fun exploring and mining your data!