From 05f971a0617f886e09daa0a7eaf42e219d4dbad5 Mon Sep 17 00:00:00 2001
From: Vladimir Shitov <35199218+VladimirShitov@users.noreply.github.com>
Date: Fri, 25 Aug 2023 17:25:39 +0200
Subject: [PATCH] Fix typos (#48)

* Correct indexing of modalities
* Fix order of words and articles
* Fix typo: involes -> involves
* Add space
* Fix typo: howver -> however
* Fix typo: mehods -> methods
---
 fundamentals/architecture.qmd | 8 ++++----
 fundamentals/concepts.qmd     | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fundamentals/architecture.qmd b/fundamentals/architecture.qmd
index 743b6914..85923d87 100644
--- a/fundamentals/architecture.qmd
+++ b/fundamentals/architecture.qmd
@@ -14,7 +14,7 @@ flowchart TD
 
 1. [Ingestion](#ingestion): Convert raw sequencing data or count tables into MuData data for further processing.
 2. [Splitting modalities](#sec-splitting): Creating several MuData objects, one per modality, out of a multimodal input sample.
-3. [Unimodal Single Sample Processing](#sec-single-sample): tools applied to each modality of samples individually. Mostly involes the selection of true from false cells.
+3. [Unimodal Single Sample Processing](#sec-single-sample): tools applied to each modality of samples individually. Mostly involves the selection of true from false cells.
 4. [Unimodal Multi Sample Processing](#sec-multisample-processing): steps that require information from all samples together. Processing is still performed per-modality.
 5. [Merging](#sec-merging): Creating one MuData object from several unimodal MuData input files.
 6. [Initializing Integration](#sec-initializing-integration): Performs dimensionality reduction and cell type clustering on non-integrated samples. These are popular steps that would otherwise be executed manually or they provide input for downstream integration methods.
@@ -341,7 +341,7 @@ In order to perform demultiplexing, several tools have been made available in th
 * [BCL Convert](../components/modules/demux/bcl_convert.qmd): general demultiplexing software by Illumina.
 * Cellranger's [mkfastq](../components/modules/demux/cellranger_mkfastq.qmd): a wrapper around BCL Convert that provides extra convenience features for the processing of 10X single-cell data.
 
-The alignment of reads from the FASTQ files to an appropriate genome reference is called mapping. The result of the mapping process are tables that count the number of times a read has been mapped to a certain feature and metadata information for the cells (observations) and features. There are different format that can be used to store this information together. Since OpenPipeline uses [MuData](./concepts.qmd#sec-common-file-format) as a common file format throughout its pipelines, a conversion to MuData is included in the mapping pipelines.The choice between workflows for mapping is dependant on your single-cell library provider and technology:
+The alignment of reads from the FASTQ files to an appropriate genome reference is called mapping. The result of the mapping process are tables that count the number of times a read has been mapped to a certain feature and metadata information for the cells (observations) and features. There are different format that can be used to store this information together. Since OpenPipeline uses [MuData](./concepts.qmd#sec-common-file-format) as a common file format throughout its pipelines, a conversion to MuData is included in the mapping pipelines. The choice between workflows for mapping is dependant on your single-cell library provider and technology:
 
 * For DB Genomics libraries, the [BD Rhapsody](../components/workflows/ingestion/bd_rhapsody.qmd) pipeline can be used.
 * For 10X based libraries, either [cellranger count](../components/workflows/ingestion/cellranger_mapping.qmd) or [cellranger multi](../components/workflows/ingestion/cellranger_multi.qmd) is provided. For more information about the differences between the two and when to use which mapping software, please consult the [10X genomics website](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/multi#when-to-use-multi).
@@ -409,7 +409,7 @@ The removal of cells based on basic count statistics is split up into two parts:
 
 Flagging cells for removal involved adding a boolean column to the `.obs` dataframe. After the cells have been flagged for removal, the cells are actually filtered using [do_filter](../components/modules/filter/do_filter.qmd), which reads the values in `.obs` and removed the cells labeled `True`. This applies the general phylosophy of "separation of concerns": one component is responsible for labeling the cells, another for removing them. This keeps the codebase for a single component small and its functionality testable.
 
-The next and final step in the single-sample gene expression processing is doublet detection using [filter_with_scrublet](../components/modules/filter/filter_with_scrublet.qmd). Like `filter_with_counts`, it will not remove cells but add a column to `.obs` (which have the name `filter_with_scrublet` by default). The single-sample GEX workflow will not remove not be removed during the processing (hence no `do_filter`). Howver, you can choose to remove them yourself before doing your analyses by applying a filter with the column in `.obs` yourself.
+The next and final step in the single-sample gene expression processing is doublet detection using [filter_with_scrublet](../components/modules/filter/filter_with_scrublet.qmd). Like `filter_with_counts`, it will not remove cells but add a column to `.obs` (which have the name `filter_with_scrublet` by default). The single-sample GEX workflow will not remove not be removed during the processing (hence no `do_filter`). However, you can choose to remove them yourself before doing your analyses by applying a filter with the column in `.obs` yourself.
 
 ~~~{.d2 layout=elk}
 direction: right
@@ -687,7 +687,7 @@ style: {
 ~~~
 
 
-## Integration Mehods {#sec-integration-methods}
+## Integration Methods {#sec-integration-methods}
 Integration is the alignment of cell types across samples. There exist three different types of integration methods, based on the degree of integration across modalities:
 
 1. Unimodal integration across batches. For example: [scVI](../components/modules/integrate/scvi.qmd), [scanorama](../components/modules/integrate/scanorama.qmd), [harmony](../components/modules/integrate/harmonypy.qmd)
diff --git a/fundamentals/concepts.qmd b/fundamentals/concepts.qmd
index c72d3e3e..d9602c0b 100644
--- a/fundamentals/concepts.qmd
+++ b/fundamentals/concepts.qmd
@@ -51,7 +51,7 @@ MuData
 │ ├─ .obsm
 │ ├─ .varm
 │ ├─ .uns
-│ ├─ modality_1 (AnnData Object)
+│ ├─ modality_2 (AnnData Object)
 ├─ .var
 ├─ .obs
 ├─ .obms
@@ -63,8 +63,8 @@ MuData
 * `.X` and `.layers`: matrices storing the measurements with the columns being the variables measured and the rows being the observations (cells in most cases).
 * `.var`: metadata for the variables (i.e. annotation for the columns of `.X` or any matrix in `.layers`). The number of rows in the .var datafame (or the length of each entry in the dictionairy) is equal to the number of columns in the measurement matrices.
 * `.obs`: metadata for the observations (i.e. annotation for the rows of `.X` or any matrix in `.layers`). The number of rows in the .obs datafame (or the length of each entry in the dictionairy) is equal to the number of rows in the measurement matrices.
-* `varm`: multi-dimensional the variable annotation. A key-dataframe mapping where the number of rows in each dataframe is equal to the number of columns in the measurement matrices.
-* `obsm`: multi-dimensional the observation annotation. A key-dataframe mapping where the number of rows in each dataframe is equal to the number of rows in the measurement matrices.
+* `varm`: the multi-dimensional variable annotation. A key-dataframe mapping where the number of rows in each dataframe is equal to the number of columns in the measurement matrices.
+* `obsm`: the multi-dimensional observation annotation. A key-dataframe mapping where the number of rows in each dataframe is equal to the number of rows in the measurement matrices.
 * `.uns`: A mapping where no restrictions are enforced on the dimensions of the data.
 
 # Modularity and a language independent framework 🔳
@@ -73,4 +73,4 @@ TODO
 
 # A graphical interface 📺
 
-TODO
\ No newline at end of file
+TODO