diff --git a/components/modules/annotate/popv.qmd b/components/modules/annotate/popv.qmd
index 17d028e7..341c6d24 100644
--- a/components/modules/annotate/popv.qmd
+++ b/components/modules/annotate/popv.qmd
@@ -127,9 +127,9 @@ Output arguments.
 Other arguments.
-|Name |Description |Attributes |
-|:-----------|:-----------------------------------------------------------------------|:-----------------------------------------------------------------|
-|`--methods` |Methods to call cell types. By default, runs to knn_on_scvi and scanvi. |`string`, required, example: `"knn_on_scvi"`, example: `"scanvi"` |
+|Name |Description |Attributes |
+|:-----------|:-----------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+|`--methods` |Methods to call cell types. By default, runs to knn_on_scvi and scanvi. |List of `string`, required, example: `"knn_on_scvi", "scanvi"`, multiple_sep: `":"` |
 ## Authors
diff --git a/components/modules/cluster/leiden.qmd b/components/modules/cluster/leiden.qmd
index 9b37a1ed..361f7aaf 100644
--- a/components/modules/cluster/leiden.qmd
+++ b/components/modules/cluster/leiden.qmd
@@ -80,15 +80,15 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen
 ### Arguments
-|Name |Description |Attributes |
-|:-----------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------|
-|`--input` |Input file. |`file`, required, example: `"input.h5mu"` |
-|`--modality` | |`string`, default: `"rna"` |
-|`--obsp_connectivities` |In which .obsp slot the neighbor connectivities can be found. |`string`, default: `"connectivities"` |
-|`--output` |Output file. 
|`file`, required, example: `"output.h5mu"` |
-|`--output_compression` | |`string`, example: `"gzip"` |
-|`--obsm_name` |Name of the .obsm key under which to add the cluster labels. The name of the columns in the matrix will correspond to the resolutions. |`string`, default: `"leiden"` |
-|`--resolution` |A parameter value controlling the coarseness of the clustering. Higher values lead to more clusters. Multiple values will result in clustering being performed multiple times. |`double`, default: `1` |
+|Name |Description |Attributes |
+|:-----------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------|
+|`--input` |Input file. |`file`, required, example: `"input.h5mu"` |
+|`--modality` | |`string`, default: `"rna"` |
+|`--obsp_connectivities` |In which .obsp slot the neighbor connectivities can be found. |`string`, default: `"connectivities"` |
+|`--output` |Output file. |`file`, required, example: `"output.h5mu"` |
+|`--output_compression` | |`string`, example: `"gzip"` |
+|`--obsm_name` |Name of the .obsm key under which to add the cluster labels. The name of the columns in the matrix will correspond to the resolutions. |`string`, default: `"leiden"` |
+|`--resolution` |A parameter value controlling the coarseness of the clustering. Higher values lead to more clusters. Multiple values will result in clustering being performed multiple times. 
|List of `double`, default: `1`, multiple_sep: `":"` |
 ## Authors
diff --git a/components/modules/convert/from_h5ad_to_h5mu.qmd b/components/modules/convert/from_h5ad_to_h5mu.qmd
index 472ced4a..ea6b4e9d 100644
--- a/components/modules/convert/from_h5ad_to_h5mu.qmd
+++ b/components/modules/convert/from_h5ad_to_h5mu.qmd
@@ -68,12 +68,12 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen
 ### Arguments
-|Name |Description |Attributes |
-|:----------------------|:-------------------|:-----------------------------------------|
-|`--input` |Input h5ad files |`file`, required, default: `"input.h5ad"` |
-|`--modality` | |`string`, default: `"rna"` |
-|`--output` |Output MuData file. |`file`, default: `"output.h5mu"` |
-|`--output_compression` | |`string`, example: `"gzip"` |
+|Name |Description |Attributes |
+|:----------------------|:-------------------|:----------------------------------------------------------------------|
+|`--input` |Input h5ad files |List of `file`, required, default: `"input.h5ad"`, multiple_sep: `":"` |
+|`--modality` | |List of `string`, default: `"rna"`, multiple_sep: `":"` |
+|`--output` |Output MuData file. 
|`file`, default: `"output.h5mu"` | +|`--output_compression` | |`string`, example: `"gzip"` | ## Authors diff --git a/components/modules/correction/cellbender_remove_background.qmd b/components/modules/correction/cellbender_remove_background.qmd index 74832e42..0b452614 100644 --- a/components/modules/correction/cellbender_remove_background.qmd +++ b/components/modules/correction/cellbender_remove_background.qmd @@ -139,35 +139,35 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Arguments -|Name |Description |Attributes | -|:-------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------| -|`--expected_cells_from_qc` |Will use the Cell Ranger QC to determine the estimated number of cells |`boolean`, default: `FALSE` | -|`--expected_cells` |Number of cells expected in the dataset (a rough estimate within a factor of 2 is sufficient). |`integer`, example: `1000` | -|`--total_droplets_included` |The number of droplets from the rank-ordered UMI plot that will have their cell probabilities inferred as an output. Include the droplets which might contain cells. Droplets beyond TOTAL_DROPLETS_INCLUDED should be 'surely empty' droplets. |`integer`, example: `25000` | -|`--force_cell_umi_prior` |Ignore CellBender's heuristic prior estimation, and use this prior for UMI counts in cells. |`integer` | -|`--force_empty_umi_prior` |Ignore CellBender's heuristic prior estimation, and use this prior for UMI counts in empty droplets. 
|`integer` | -|`--model` |Which model is being used for count data. * 'naive' subtracts the estimated ambient profile. * 'simple' does not model either ambient RNA or random barcode swapping (for debugging purposes -- not recommended). * 'ambient' assumes background RNA is incorporated into droplets. * 'swapping' assumes background RNA comes from random barcode swapping (via PCR chimeras). * 'full' uses a combined ambient and swapping model. |`string`, default: `"full"` | -|`--epochs` |Number of epochs to train. |`integer`, default: `150` | -|`--low_count_threshold` |Droplets with UMI counts below this number are completely excluded from the analysis. This can help identify the correct prior for empty droplet counts in the rare case where empty counts are extremely high (over 200). |`integer`, default: `5` | -|`--z_dim` |Dimension of latent variable z. |`integer`, default: `64` | -|`--z_layers` |Dimension of hidden layers in the encoder for z. |`integer`, default: `512` | -|`--training_fraction` |Training detail: the fraction of the data used for training. The rest is never seen by the inference algorithm. Speeds up learning. |`double`, default: `0.9` | -|`--empty_drop_training_fraction` |Training detail: the fraction of the training data each epoch that is drawn (randomly sampled) from surely empty droplets. |`double`, default: `0.2` | -|`--ignore_features` |Integer indices of features to ignore entirely. In the output count matrix, the counts for these features will be unchanged. |`integer` | -|`--fpr` |Target 'delta' false positive rate in [0, 1). Use 0 for a cohort of samples which will be jointly analyzed for differential expression. A false positive is a true signal count that is erroneously removed. More background removal is accompanied by more signal removal at high values of FPR. You can specify multiple values, which will create multiple output files. |`double`, default: `0.01` | -|`--exclude_feature_types` |Feature types to ignore during the analysis. 
These features will be left unchanged in the output file. |`string` | -|`--projected_ambient_count_threshold` |Controls how many features are included in the analysis, which can lead to a large speedup. If a feature is expected to have less than PROJECTED_AMBIENT_COUNT_THRESHOLD counts total in all cells (summed), then that gene is excluded, and it will be unchanged in the output count matrix. For example, PROJECTED_AMBIENT_COUNT_THRESHOLD = 0 will include all features which have even a single count in any empty droplet. |`double`, default: `0.1` | -|`--learning_rate` |Training detail: lower learning rate for inference. A OneCycle learning rate schedule is used, where the upper learning rate is ten times this value. (For this value, probably do not exceed 1e-3). |`double`, default: `1e-04` | -|`--final_elbo_fail_fraction` |Training is considered to have failed if (best_test_ELBO - final_test_ELBO)/(best_test_ELBO - initial_test_ELBO) > FINAL_ELBO_FAIL_FRACTION. Training will automatically re-run if --num-training-tries > 1. By default, will not fail training based on final_training_ELBO. |`double` | -|`--epoch_elbo_fail_fraction` |Training is considered to have failed if (previous_epoch_test_ELBO - current_epoch_test_ELBO)/(previous_epoch_test_ELBO - initial_train_ELBO) > EPOCH_ELBO_FAIL_FRACTION. Training will automatically re-run if --num-training-tries > 1. By default, will not fail training based on epoch_training_ELBO. |`double` | -|`--num_training_tries` |Number of times to attempt to train the model. At each subsequent attempt, the learning rate is multiplied by LEARNING_RATE_RETRY_MULT. |`integer`, default: `1` | -|`--learning_rate_retry_mult` |Learning rate is multiplied by this amount each time a new training attempt is made. (This parameter is only used if training fails based on EPOCH_ELBO_FAIL_FRACTION or FINAL_ELBO_FAIL_FRACTION and NUM_TRAINING_TRIES is > 1.) 
|`double`, default: `0.2` | -|`--posterior_batch_size` |Training detail: size of batches when creating the posterior. Reduce this to avoid running out of GPU memory creating the posterior (will be slower). |`integer`, default: `128` | -|`--posterior_regulation` |Posterior regularization method. (For experts: not required for normal usage, see documentation). * PRq is approximate quantile-targeting. * PRmu is approximate mean-targeting aggregated over genes (behavior of v0.2.0). * PRmu_gene is approximate mean-targeting per gene. |`string` | -|`--alpha` |Tunable parameter alpha for the PRq posterior regularization method (not normally used: see documentation). |`double` | -|`--q` |Tunable parameter q for the CDF threshold estimation method (not normally used: see documentation). |`double` | -|`--estimator` |Output denoised count estimation method. (For experts: not required for normal usage, see documentation). |`string`, default: `"mckp"` | -|`--estimator_multiple_cpu` |Including the flag --estimator-multiple-cpu will use more than one CPU to compute the MCKP output count estimator in parallel (does nothing for other estimators). |`boolean_true` | -|`--constant_learning_rate` |Including the flag --constant-learning-rate will use the ClippedAdam optimizer instead of the OneCycleLR learning rate schedule, which is the default. Learning is faster with the OneCycleLR schedule. However, training can easily be continued from a checkpoint for more epochs than the initial command specified when using ClippedAdam. On the other hand, if using the OneCycleLR schedule with 150 epochs specified, it is not possible to pick up from that final checkpoint and continue training until 250 epochs. |`boolean` | -|`--debug` |Including the flag --debug will log extra messages useful for debugging. |`boolean_true` | -|`--cuda` |Including the flag --cuda will run the inference on a GPU. 
|`boolean_true` | +|Name |Description |Attributes | +|:-------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------| +|`--expected_cells_from_qc` |Will use the Cell Ranger QC to determine the estimated number of cells |`boolean`, default: `FALSE` | +|`--expected_cells` |Number of cells expected in the dataset (a rough estimate within a factor of 2 is sufficient). |`integer`, example: `1000` | +|`--total_droplets_included` |The number of droplets from the rank-ordered UMI plot that will have their cell probabilities inferred as an output. Include the droplets which might contain cells. Droplets beyond TOTAL_DROPLETS_INCLUDED should be 'surely empty' droplets. |`integer`, example: `25000` | +|`--force_cell_umi_prior` |Ignore CellBender's heuristic prior estimation, and use this prior for UMI counts in cells. |`integer` | +|`--force_empty_umi_prior` |Ignore CellBender's heuristic prior estimation, and use this prior for UMI counts in empty droplets. |`integer` | +|`--model` |Which model is being used for count data. * 'naive' subtracts the estimated ambient profile. * 'simple' does not model either ambient RNA or random barcode swapping (for debugging purposes -- not recommended). * 'ambient' assumes background RNA is incorporated into droplets. * 'swapping' assumes background RNA comes from random barcode swapping (via PCR chimeras). * 'full' uses a combined ambient and swapping model. 
|`string`, default: `"full"` | +|`--epochs` |Number of epochs to train. |`integer`, default: `150` | +|`--low_count_threshold` |Droplets with UMI counts below this number are completely excluded from the analysis. This can help identify the correct prior for empty droplet counts in the rare case where empty counts are extremely high (over 200). |`integer`, default: `5` | +|`--z_dim` |Dimension of latent variable z. |`integer`, default: `64` | +|`--z_layers` |Dimension of hidden layers in the encoder for z. |List of `integer`, default: `512`, multiple_sep: `":"` | +|`--training_fraction` |Training detail: the fraction of the data used for training. The rest is never seen by the inference algorithm. Speeds up learning. |`double`, default: `0.9` | +|`--empty_drop_training_fraction` |Training detail: the fraction of the training data each epoch that is drawn (randomly sampled) from surely empty droplets. |`double`, default: `0.2` | +|`--ignore_features` |Integer indices of features to ignore entirely. In the output count matrix, the counts for these features will be unchanged. |List of `integer`, multiple_sep: `":"` | +|`--fpr` |Target 'delta' false positive rate in [0, 1). Use 0 for a cohort of samples which will be jointly analyzed for differential expression. A false positive is a true signal count that is erroneously removed. More background removal is accompanied by more signal removal at high values of FPR. You can specify multiple values, which will create multiple output files. |List of `double`, default: `0.01`, multiple_sep: `":"` | +|`--exclude_feature_types` |Feature types to ignore during the analysis. These features will be left unchanged in the output file. |List of `string`, multiple_sep: `":"` | +|`--projected_ambient_count_threshold` |Controls how many features are included in the analysis, which can lead to a large speedup. 
If a feature is expected to have less than PROJECTED_AMBIENT_COUNT_THRESHOLD counts total in all cells (summed), then that gene is excluded, and it will be unchanged in the output count matrix. For example, PROJECTED_AMBIENT_COUNT_THRESHOLD = 0 will include all features which have even a single count in any empty droplet. |`double`, default: `0.1` | +|`--learning_rate` |Training detail: lower learning rate for inference. A OneCycle learning rate schedule is used, where the upper learning rate is ten times this value. (For this value, probably do not exceed 1e-3). |`double`, default: `1e-04` | +|`--final_elbo_fail_fraction` |Training is considered to have failed if (best_test_ELBO - final_test_ELBO)/(best_test_ELBO - initial_test_ELBO) > FINAL_ELBO_FAIL_FRACTION. Training will automatically re-run if --num-training-tries > 1. By default, will not fail training based on final_training_ELBO. |`double` | +|`--epoch_elbo_fail_fraction` |Training is considered to have failed if (previous_epoch_test_ELBO - current_epoch_test_ELBO)/(previous_epoch_test_ELBO - initial_train_ELBO) > EPOCH_ELBO_FAIL_FRACTION. Training will automatically re-run if --num-training-tries > 1. By default, will not fail training based on epoch_training_ELBO. |`double` | +|`--num_training_tries` |Number of times to attempt to train the model. At each subsequent attempt, the learning rate is multiplied by LEARNING_RATE_RETRY_MULT. |`integer`, default: `1` | +|`--learning_rate_retry_mult` |Learning rate is multiplied by this amount each time a new training attempt is made. (This parameter is only used if training fails based on EPOCH_ELBO_FAIL_FRACTION or FINAL_ELBO_FAIL_FRACTION and NUM_TRAINING_TRIES is > 1.) |`double`, default: `0.2` | +|`--posterior_batch_size` |Training detail: size of batches when creating the posterior. Reduce this to avoid running out of GPU memory creating the posterior (will be slower). |`integer`, default: `128` | +|`--posterior_regulation` |Posterior regularization method. 
(For experts: not required for normal usage, see documentation). * PRq is approximate quantile-targeting. * PRmu is approximate mean-targeting aggregated over genes (behavior of v0.2.0). * PRmu_gene is approximate mean-targeting per gene. |`string` | +|`--alpha` |Tunable parameter alpha for the PRq posterior regularization method (not normally used: see documentation). |`double` | +|`--q` |Tunable parameter q for the CDF threshold estimation method (not normally used: see documentation). |`double` | +|`--estimator` |Output denoised count estimation method. (For experts: not required for normal usage, see documentation). |`string`, default: `"mckp"` | +|`--estimator_multiple_cpu` |Including the flag --estimator-multiple-cpu will use more than one CPU to compute the MCKP output count estimator in parallel (does nothing for other estimators). |`boolean_true` | +|`--constant_learning_rate` |Including the flag --constant-learning-rate will use the ClippedAdam optimizer instead of the OneCycleLR learning rate schedule, which is the default. Learning is faster with the OneCycleLR schedule. However, training can easily be continued from a checkpoint for more epochs than the initial command specified when using ClippedAdam. On the other hand, if using the OneCycleLR schedule with 150 epochs specified, it is not possible to pick up from that final checkpoint and continue training until 250 epochs. |`boolean` | +|`--debug` |Including the flag --debug will log extra messages useful for debugging. |`boolean_true` | +|`--cuda` |Including the flag --cuda will run the inference on a GPU. 
|`boolean_true` | diff --git a/components/modules/correction/cellbender_remove_background_v0_2.qmd b/components/modules/correction/cellbender_remove_background_v0_2.qmd index c8e658c9..1d117f4f 100644 --- a/components/modules/correction/cellbender_remove_background_v0_2.qmd +++ b/components/modules/correction/cellbender_remove_background_v0_2.qmd @@ -119,19 +119,19 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Arguments -|Name |Description |Attributes | -|:--------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------| -|`--expected_cells` |Number of cells expected in the dataset (a rough estimate within a factor of 2 is sufficient). |`integer`, example: `1000` | -|`--total_droplets_included` |The number of droplets from the rank-ordered UMI plot that will be analyzed. The largest 'total_droplets' droplets will have their cell probabilities inferred as an output. |`integer`, example: `25000` | -|`--expected_cells_from_qc` |Will use the Cell Ranger QC to determine the estimated number of cells |`boolean`, default: `TRUE` | -|`--model` |Which model is being used for count data. 'simple' does not model either ambient RNA or random barcode swapping (for debugging purposes -- not recommended). 'ambient' assumes background RNA is incorporated into droplets. 'swapping' assumes background RNA comes from random barcode swapping. 'full' uses a combined ambient and swapping model. |`string`, default: `"full"` | -|`--epochs` |Number of epochs to train. |`integer`, default: `150` | -|`--low_count_threshold` |Droplets with UMI counts below this number are completely excluded from the analysis. 
This can help identify the correct prior for empty droplet counts in the rare case where empty counts are extremely high (over 200). |`integer`, default: `15` | -|`--z_dim` |Dimension of latent variable z. |`integer`, default: `100` | -|`--z_layers` |Dimension of hidden layers in the encoder for z. |`integer`, default: `500` | -|`--training_fraction` |Training detail: the fraction of the data used for training. The rest is never seen by the inference algorithm. Speeds up learning. |`double`, default: `0.9` | -|`--empty_drop_training_fraction` |Training detail: the fraction of the training data each epoch that is drawn (randomly sampled) from surely empty droplets. |`double`, default: `0.5` | -|`--fpr` |Target false positive rate in (0, 1). A false positive is a true signal count that is erroneously removed. More background removal is accompanied by more signal removal at high values of FPR. You can specify multiple values, which will create multiple output files. |`double`, default: `0.01` | -|`--exclude_antibody_capture` |Including the flag --exclude-antibody-capture will cause remove-background to operate on gene counts only, ignoring other features. |`boolean_true` | -|`--learning_rate` |Training detail: lower learning rate for inference. A OneCycle learning rate schedule is used, where the upper learning rate is ten times this value. (For this value, probably do not exceed 1e-3). |`double`, example: `1e-04` | -|`--cuda` |Including the flag --cuda will run the inference on a GPU. 
|`boolean_true` | +|Name |Description |Attributes | +|:--------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------| +|`--expected_cells` |Number of cells expected in the dataset (a rough estimate within a factor of 2 is sufficient). |`integer`, example: `1000` | +|`--total_droplets_included` |The number of droplets from the rank-ordered UMI plot that will be analyzed. The largest 'total_droplets' droplets will have their cell probabilities inferred as an output. |`integer`, example: `25000` | +|`--expected_cells_from_qc` |Will use the Cell Ranger QC to determine the estimated number of cells |`boolean`, default: `TRUE` | +|`--model` |Which model is being used for count data. 'simple' does not model either ambient RNA or random barcode swapping (for debugging purposes -- not recommended). 'ambient' assumes background RNA is incorporated into droplets. 'swapping' assumes background RNA comes from random barcode swapping. 'full' uses a combined ambient and swapping model. |`string`, default: `"full"` | +|`--epochs` |Number of epochs to train. |`integer`, default: `150` | +|`--low_count_threshold` |Droplets with UMI counts below this number are completely excluded from the analysis. This can help identify the correct prior for empty droplet counts in the rare case where empty counts are extremely high (over 200). |`integer`, default: `15` | +|`--z_dim` |Dimension of latent variable z. |`integer`, default: `100` | +|`--z_layers` |Dimension of hidden layers in the encoder for z. 
|List of `integer`, default: `500`, multiple_sep: `":"` | +|`--training_fraction` |Training detail: the fraction of the data used for training. The rest is never seen by the inference algorithm. Speeds up learning. |`double`, default: `0.9` | +|`--empty_drop_training_fraction` |Training detail: the fraction of the training data each epoch that is drawn (randomly sampled) from surely empty droplets. |`double`, default: `0.5` | +|`--fpr` |Target false positive rate in (0, 1). A false positive is a true signal count that is erroneously removed. More background removal is accompanied by more signal removal at high values of FPR. You can specify multiple values, which will create multiple output files. |List of `double`, default: `0.01`, multiple_sep: `":"` | +|`--exclude_antibody_capture` |Including the flag --exclude-antibody-capture will cause remove-background to operate on gene counts only, ignoring other features. |`boolean_true` | +|`--learning_rate` |Training detail: lower learning rate for inference. A OneCycle learning rate schedule is used, where the upper learning rate is ten times this value. (For this value, probably do not exceed 1e-3). |`double`, example: `1e-04` | +|`--cuda` |Including the flag --cuda will run the inference on a GPU. 
|`boolean_true` | diff --git a/components/modules/dataflow/concat.qmd b/components/modules/dataflow/concat.qmd index a708274c..77890716 100644 --- a/components/modules/dataflow/concat.qmd +++ b/components/modules/dataflow/concat.qmd @@ -70,14 +70,14 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Arguments -|Name |Description |Attributes | -|:----------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------| -|`--input` |Paths to the different samples to be concatenated. |`file`, required, example: `"sample_paths"` | -|`--input_id` |Names of the different samples that have to be concatenated. Must be specified when using '--mode move'. In this case, the ids will be used for the columns names of the dataframes registring the conflicts. If specified, must be of same length as `--input`. |`string` | -|`--output` | |`file`, example: `"output.h5mu"` | -|`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | -|`--obs_sample_name` |Name of the .obs key under which to add the sample names. |`string`, default: `"sample_id"` | -|`--other_axis_mode` |How to handle the merging of other axis (var, obs, ...). 
- None: keep no data - same: only keep elements of the matrices which are the same in each of the samples - unique: only keep elements for which there is only 1 possible value (1 value that can occur in multiple samples) - first: keep the annotation from the first sample - only: keep elements that show up in only one of the objects (1 unique element in only 1 sample) - move: identical to 'same', but moving the conflicting values to .varm or .obsm |`string`, default: `"move"` | +|Name |Description |Attributes | +|:----------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------| +|`--input` |Paths to the different samples to be concatenated. |List of `file`, required, example: `"sample_paths"`, multiple_sep: `","` | +|`--input_id` |Names of the different samples that have to be concatenated. Must be specified when using '--mode move'. In this case, the ids will be used for the columns names of the dataframes registring the conflicts. If specified, must be of same length as `--input`. |List of `string`, multiple_sep: `","` | +|`--output` | |`file`, example: `"output.h5mu"` | +|`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | +|`--obs_sample_name` |Name of the .obs key under which to add the sample names. |`string`, default: `"sample_id"` | +|`--other_axis_mode` |How to handle the merging of other axis (var, obs, ...). 
- None: keep no data - same: only keep elements of the matrices which are the same in each of the samples - unique: only keep elements for which there is only 1 possible value (1 value that can occur in multiple samples) - first: keep the annotation from the first sample - only: keep elements that show up in only one of the objects (1 unique element in only 1 sample) - move: identical to 'same', but moving the conflicting values to .varm or .obsm |`string`, default: `"move"` | ## Authors diff --git a/components/modules/dataflow/merge.qmd b/components/modules/dataflow/merge.qmd index e2778171..9dd5c31b 100644 --- a/components/modules/dataflow/merge.qmd +++ b/components/modules/dataflow/merge.qmd @@ -67,11 +67,11 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Arguments -|Name |Description |Attributes | -|:----------------------|:-----------------------------------------------------------------|:-------------------------------------------| -|`--input` |Paths to the single-modality .h5mu files that need to be combined |`file`, required, default: `"sample_paths"` | -|`--output` |Path to the output file. |`file`, default: `"output.h5mu"` | -|`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | +|Name |Description |Attributes | +|:----------------------|:-----------------------------------------------------------------|:------------------------------------------------------------------------| +|`--input` |Paths to the single-modality .h5mu files that need to be combined |List of `file`, required, default: `"sample_paths"`, multiple_sep: `","` | +|`--output` |Path to the output file. |`file`, default: `"output.h5mu"` | +|`--output_compression` |The compression format to be used on the output h5mu object. 
|`string`, example: `"gzip"` | ## Authors diff --git a/components/modules/download/sync_test_resources.qmd b/components/modules/download/sync_test_resources.qmd index 9fe210f6..38eb7a7a 100644 --- a/components/modules/download/sync_test_resources.qmd +++ b/components/modules/download/sync_test_resources.qmd @@ -77,7 +77,7 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen |`--quiet` |Displays the operations that would be performed using the specified command without actually running them. |`boolean_true` | |`--dryrun` |Does not display the operations performed from the specified command. |`boolean_true` | |`--delete` |Files that exist in the destination but not in the source are deleted during sync. |`boolean_true` | -|`--exclude` |Exclude all files or objects from the command that matches the specified pattern. |`string` | +|`--exclude` |Exclude all files or objects from the command that matches the specified pattern. |List of `string`, multiple_sep: `":"` | ## Authors diff --git a/components/modules/filter/do_filter.qmd b/components/modules/filter/do_filter.qmd index b7abc636..779b4e8f 100644 --- a/components/modules/filter/do_filter.qmd +++ b/components/modules/filter/do_filter.qmd @@ -70,14 +70,14 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Arguments -|Name |Description |Attributes | -|:----------------------|:------------------------------------------------------------|:-----------------------------------------| -|`--input` |Input h5mu file |`file`, required, example: `"input.h5mu"` | -|`--modality` | |`string`, default: `"rna"` | -|`--obs_filter` |Which .obs columns to use to filter the observations by. |`string`, example: `"filter_with_x"` | -|`--var_filter` |Which .var columns to use to filter the observations by. |`string`, example: `"filter_with_x"` | -|`--output` |Output h5mu file. 
|`file`, example: `"output.h5mu"` | -|`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | +|Name |Description |Attributes | +|:----------------------|:------------------------------------------------------------|:-----------------------------------------------------------------| +|`--input` |Input h5mu file |`file`, required, example: `"input.h5mu"` | +|`--modality` | |`string`, default: `"rna"` | +|`--obs_filter` |Which .obs columns to use to filter the observations by. |List of `string`, example: `"filter_with_x"`, multiple_sep: `":"` | +|`--var_filter` |Which .var columns to use to filter the observations by. |List of `string`, example: `"filter_with_x"`, multiple_sep: `":"` | +|`--output` |Output h5mu file. |`file`, example: `"output.h5mu"` | +|`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | ## Authors diff --git a/components/modules/filter/remove_modality.qmd b/components/modules/filter/remove_modality.qmd index f6aa8cca..aaccac30 100644 --- a/components/modules/filter/remove_modality.qmd +++ b/components/modules/filter/remove_modality.qmd @@ -68,12 +68,12 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Arguments -|Name |Description |Attributes | -|:----------------------|:------------------------------------------------------------|:-----------------------------------------| -|`--input` |Input h5mu file |`file`, required, example: `"input.h5mu"` | -|`--modality` | |`string`, required | -|`--output` |Output h5mu file. |`file`, example: `"output.h5mu"` | -|`--output_compression` |The compression format to be used on the output h5mu object. 
|`string`, example: `"gzip"` | +|Name |Description |Attributes | +|:----------------------|:------------------------------------------------------------|:-----------------------------------------------| +|`--input` |Input h5mu file |`file`, required, example: `"input.h5mu"` | +|`--modality` | |List of `string`, required, multiple_sep: `":"` | +|`--output` |Output h5mu file. |`file`, example: `"output.h5mu"` | +|`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | ## Authors diff --git a/components/modules/integrate/harmonypy.qmd b/components/modules/integrate/harmonypy.qmd index 073b8f47..2a164381 100644 --- a/components/modules/integrate/harmonypy.qmd +++ b/components/modules/integrate/harmonypy.qmd @@ -72,16 +72,16 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Arguments -|Name |Description |Attributes | -|:----------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------| -|`--input` |Input h5mu file |`file`, required | -|`--output` |Output h5mu file. |`file`, required | -|`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | -|`--modality` | |`string`, default: `"rna"` | -|`--obsm_input` |Which .obsm slot to use as a starting PCA embedding. |`string`, default: `"X_pca"` | -|`--obsm_output` |In which .obsm slot to store the resulting integrated embedding. |`string`, default: `"X_pca_integrated"` | -|`--theta` |Diversity clustering penalty parameter. Specify for each variable in group.by.vars. theta=0 does not encourage any diversity. Larger values of theta result in more diverse clusters. |`double`, default: `2` | -|`--obs_covariates` |The .obs field(s) that define the covariate(s) to regress out. 
|`string`, required, example: `"batch"`, example: `"sample"` | +|Name |Description |Attributes | +|:----------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------| +|`--input` |Input h5mu file |`file`, required | +|`--output` |Output h5mu file. |`file`, required | +|`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | +|`--modality` | |`string`, default: `"rna"` | +|`--obsm_input` |Which .obsm slot to use as a starting PCA embedding. |`string`, default: `"X_pca"` | +|`--obsm_output` |In which .obsm slot to store the resulting integrated embedding. |`string`, default: `"X_pca_integrated"` | +|`--theta` |Diversity clustering penalty parameter. Specify for each variable in group.by.vars. theta=0 does not encourage any diversity. Larger values of theta result in more diverse clusters. |List of `double`, default: `2`, multiple_sep: `":"` | +|`--obs_covariates` |The .obs field(s) that define the covariate(s) to regress out. 
|List of `string`, required, example: `"batch", "sample"`, multiple_sep: `":"` | ## Authors diff --git a/components/modules/integrate/scvi.qmd b/components/modules/integrate/scvi.qmd index 4114713c..575c47e7 100644 --- a/components/modules/integrate/scvi.qmd +++ b/components/modules/integrate/scvi.qmd @@ -110,17 +110,17 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Inputs -|Name |Description |Attributes | -|:-----------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------| -|`--input` |Input h5mu file |`file`, required | -|`--modality` | |`string`, default: `"rna"` | -|`--input_layer` |Input layer to use. If None, X is used |`string` | -|`--obs_batch` |Column name discriminating between your batches. |`string`, default: `"sample_id"` | -|`--var_input` |.var column containing highly variable genes. By default, do not subset genes. |`string` | -|`--obs_labels` |Key in adata.obs for label information. Categories will automatically be converted into integer categories and saved to adata.obs['_scvi_labels']. If None, assigns the same label to all the data. |`string` | -|`--obs_size_factor` |Key in adata.obs for size factor information. Instead of using library size as a size factor, the provided size factor column will be used as offset in the mean of the likelihood. Assumed to be on linear scale. |`string` | -|`--obs_categorical_covariate` |Keys in adata.obs that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space). 
Thus, these should not be used for biologically-relevant factors that you do _not_ want to correct for. |`string` | -|`--obs_continuous_covariate` |Keys in adata.obs that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space). Thus, these should not be used for biologically-relevant factors that you do _not_ want to correct for. |`string` | +|Name |Description |Attributes | +|:-----------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------| +|`--input` |Input h5mu file |`file`, required | +|`--modality` | |`string`, default: `"rna"` | +|`--input_layer` |Input layer to use. If None, X is used |`string` | +|`--obs_batch` |Column name discriminating between your batches. |`string`, default: `"sample_id"` | +|`--var_input` |.var column containing highly variable genes. By default, do not subset genes. |`string` | +|`--obs_labels` |Key in adata.obs for label information. Categories will automatically be converted into integer categories and saved to adata.obs['_scvi_labels']. If None, assigns the same label to all the data. |`string` | +|`--obs_size_factor` |Key in adata.obs for size factor information. Instead of using library size as a size factor, the provided size factor column will be used as offset in the mean of the likelihood. Assumed to be on linear scale. |`string` | +|`--obs_categorical_covariate` |Keys in adata.obs that correspond to categorical data. 
These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space). Thus, these should not be used for biologically-relevant factors that you do _not_ want to correct for. |List of `string`, multiple_sep: `":"` | +|`--obs_continuous_covariate` |Keys in adata.obs that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space). Thus, these should not be used for biologically-relevant factors that you do _not_ want to correct for. |List of `string`, multiple_sep: `":"` | ### Outputs diff --git a/components/modules/labels_transfer/knn.qmd b/components/modules/labels_transfer/knn.qmd index 463daf84..bc54b45d 100644 --- a/components/modules/labels_transfer/knn.qmd +++ b/components/modules/labels_transfer/knn.qmd @@ -74,11 +74,11 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Reference dataset arguments -|Name |Description |Attributes | -|:---------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -|`--reference` |The reference data to train classifiers on. |`file`, example: `"https:/zenodo.org/record/6337966/files/HLCA_emb_and_metadata.h5ad"` | -|`--reference_obsm_features` |The `.obsm` key of the embedding to use for the classifier's training. Make sure that embedding was obtained in the same way as the query embedding (e.g. by the same model or preprocessing). 
|`string`, required, default: `"X_integrated_scanvi"` | -|`--reference_obs_targets` |The `.obs` key of the target labels to tranfer. |`string`, default: `"ann_level_1"`, default: `"ann_level_2"`, default: `"ann_level_3"`, default: `"ann_level_4"`, default: `"ann_level_5"`, default: `"ann_finest_level"` | +|Name |Description |Attributes | +|:---------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------| +|`--reference` |The reference data to train classifiers on. |`file`, example: `"https:/zenodo.org/record/6337966/files/HLCA_emb_and_metadata.h5ad"` | +|`--reference_obsm_features` |The `.obsm` key of the embedding to use for the classifier's training. Make sure that embedding was obtained in the same way as the query embedding (e.g. by the same model or preprocessing). |`string`, required, default: `"X_integrated_scanvi"` | +|`--reference_obs_targets` |The `.obs` key of the target labels to tranfer. |List of `string`, default: `"ann_level_1", "ann_level_2", "ann_level_3", "ann_level_4", "ann_level_5", "ann_finest_level"`, multiple_sep: `","` | ### Outputs @@ -86,8 +86,8 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen |Name |Description |Attributes | |:--------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------| |`--output` |The query data in .h5mu format with predicted labels transfered from the reference. 
|`file`, required | -|`--output_obs_predictions` |In which `.obs` slots to store the predicted information. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_pred"` suffix. |`string` | -|`--output_obs_uncertainty` |In which `.obs` slots to store the uncertainty of the predictions. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_uncertainty"` suffix. |`string` | +|`--output_obs_predictions` |In which `.obs` slots to store the predicted information. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_pred"` suffix. |List of `string`, multiple_sep: `":"` | +|`--output_obs_uncertainty` |In which `.obs` slots to store the uncertainty of the predictions. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_uncertainty"` suffix. |List of `string`, multiple_sep: `":"` | |`--output_uns_parameters` |The `.uns` key to store additional information about the parameters used for the label transfer. 
|`string`, default: `"labels_transfer"` | diff --git a/components/modules/labels_transfer/xgboost.qmd b/components/modules/labels_transfer/xgboost.qmd index 1d4984ab..0d9cf99b 100644 --- a/components/modules/labels_transfer/xgboost.qmd +++ b/components/modules/labels_transfer/xgboost.qmd @@ -92,11 +92,11 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Reference dataset arguments -|Name |Description |Attributes | -|:---------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -|`--reference` |The reference data to train classifiers on. |`file`, example: `"https:/zenodo.org/record/6337966/files/HLCA_emb_and_metadata.h5ad"` | -|`--reference_obsm_features` |The `.obsm` key of the embedding to use for the classifier's training. Make sure that embedding was obtained in the same way as the query embedding (e.g. by the same model or preprocessing). |`string`, required, default: `"X_integrated_scanvi"` | -|`--reference_obs_targets` |The `.obs` key of the target labels to tranfer. |`string`, default: `"ann_level_1"`, default: `"ann_level_2"`, default: `"ann_level_3"`, default: `"ann_level_4"`, default: `"ann_level_5"`, default: `"ann_finest_level"` | +|Name |Description |Attributes | +|:---------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------| +|`--reference` |The reference data to train classifiers on. 
|`file`, example: `"https:/zenodo.org/record/6337966/files/HLCA_emb_and_metadata.h5ad"` | +|`--reference_obsm_features` |The `.obsm` key of the embedding to use for the classifier's training. Make sure that embedding was obtained in the same way as the query embedding (e.g. by the same model or preprocessing). |`string`, required, default: `"X_integrated_scanvi"` | +|`--reference_obs_targets` |The `.obs` key of the target labels to tranfer. |List of `string`, default: `"ann_level_1", "ann_level_2", "ann_level_3", "ann_level_4", "ann_level_5", "ann_finest_level"`, multiple_sep: `","` | ### Outputs @@ -104,8 +104,8 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen |Name |Description |Attributes | |:--------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------| |`--output` |The query data in .h5mu format with predicted labels transfered from the reference. |`file`, required | -|`--output_obs_predictions` |In which `.obs` slots to store the predicted information. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_pred"` suffix. |`string` | -|`--output_obs_uncertainty` |In which `.obs` slots to store the uncertainty of the predictions. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_uncertainty"` suffix. |`string` | +|`--output_obs_predictions` |In which `.obs` slots to store the predicted information. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_pred"` suffix. 
|List of `string`, multiple_sep: `":"` | +|`--output_obs_uncertainty` |In which `.obs` slots to store the uncertainty of the predictions. If provided, must have the same length as `--reference_obs_targets`. If empty, will default to the `reference_obs_targets` combined with the `"_uncertainty"` suffix. |List of `string`, multiple_sep: `":"` | |`--output_uns_parameters` |The `.uns` key to store additional information about the parameters used for the label transfer. |`string`, default: `"labels_transfer"` | diff --git a/components/modules/mapping/bd_rhapsody.qmd b/components/modules/mapping/bd_rhapsody.qmd index a7b8d730..cd527309 100644 --- a/components/modules/mapping/bd_rhapsody.qmd +++ b/components/modules/mapping/bd_rhapsody.qmd @@ -109,15 +109,15 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Inputs -|Name |Description |Attributes | -|:----------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------| -|`--mode` |Whether to run a whole transcriptome analysis (WTA) or a targeted analysis. |`string`, required, example: `"wta"` | -|`--input` |Path to your read files in the FASTQ.GZ format. You may specify as many R1/R2 read pairs as you want. |`file`, required, example: `"input.fastq.gz"` | -|`--reference` |Refence to map to. For `--mode wta`, this is the path to STAR index as a tar.gz file. For `--mode targeted`, this is the path to mRNA reference file for pre-designed, supplemental, or custom panel, in FASTA format |`file`, required, example: `"reference_genome.tar.gz|reference.fasta"` | -|`--transcriptome_annotation` |Path to GTF annotation file (only for `--mode wta`). 
|`file`, example: `"transcriptome.gtf"` | -|`--abseq_reference` |Path to the AbSeq reference file in FASTA format. Only needed if BD AbSeq Ab-Oligos are used. |`file`, example: `"abseq_reference.fasta"` | -|`--supplemental_reference` |Path to the supplemental reference file in FASTA format. Only needed if there are additional transgene sequences used in the experiment (only for `--mode wta`). |`file`, example: `"supplemental_reference.fasta"` | -|`--sample_prefix` |Specify a run name to use as the output file base name. Use only letters, numbers, or hyphens. Do not use special characters or spaces. |`string`, default: `"sample"` | +|Name |Description |Attributes | +|:----------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------| +|`--mode` |Whether to run a whole transcriptome analysis (WTA) or a targeted analysis. |`string`, required, example: `"wta"` | +|`--input` |Path to your read files in the FASTQ.GZ format. You may specify as many R1/R2 read pairs as you want. |List of `file`, required, example: `"input.fastq.gz"`, multiple_sep: `";"` | +|`--reference` |Refence to map to. For `--mode wta`, this is the path to STAR index as a tar.gz file. For `--mode targeted`, this is the path to mRNA reference file for pre-designed, supplemental, or custom panel, in FASTA format |List of `file`, required, example: `"reference_genome.tar.gz|reference.fasta"`, multiple_sep: `";"` | +|`--transcriptome_annotation` |Path to GTF annotation file (only for `--mode wta`). |`file`, example: `"transcriptome.gtf"` | +|`--abseq_reference` |Path to the AbSeq reference file in FASTA format. Only needed if BD AbSeq Ab-Oligos are used. 
|List of `file`, example: `"abseq_reference.fasta"`, multiple_sep: `";"` | +|`--supplemental_reference` |Path to the supplemental reference file in FASTA format. Only needed if there are additional transgene sequences used in the experiment (only for `--mode wta`). |List of `file`, example: `"supplemental_reference.fasta"`, multiple_sep: `";"` | +|`--sample_prefix` |Specify a run name to use as the output file base name. Use only letters, numbers, or hyphens. Do not use special characters or spaces. |`string`, default: `"sample"` | ### Outputs @@ -146,10 +146,10 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Multiplex arguments -|Name |Description |Attributes | -|:-----------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------| -|`--sample_tags_version` |Specify if multiplexed run. |`string`, example: `"human"` | -|`--tag_names` |Tag_Names (optional) - Specify the tag number followed by '-' and the desired sample name to appear in Sample_Tag_Metrics.csv. Do not use the special characters: &, (), [], {}, <>, ?, | |`string`, example: `"4-mySample"`, example: `"9-myOtherSample"`, example: `"6-alsoThisSample"` | +|Name |Description |Attributes | +|:-----------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------| +|`--sample_tags_version` |Specify if multiplexed run. 
|`string`, example: `"human"` | +|`--tag_names` |Tag_Names (optional) - Specify the tag number followed by '-' and the desired sample name to appear in Sample_Tag_Metrics.csv. Do not use the special characters: &, (), [], {}, <>, ?, | |List of `string`, example: `"4-mySample", "9-myOtherSample", "6-alsoThisSample"`, multiple_sep: `":"` | ### VDJ arguments diff --git a/components/modules/mapping/cellranger_count.qmd b/components/modules/mapping/cellranger_count.qmd index b1ca8468..1b7f81b3 100644 --- a/components/modules/mapping/cellranger_count.qmd +++ b/components/modules/mapping/cellranger_count.qmd @@ -76,10 +76,10 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Inputs -|Name |Description |Attributes | -|:-------------|:--------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------| -|`--input` |The fastq.gz files to align. Can also be a single directory containing fastq.gz files. |`file`, required, example: `"sample_S1_L001_R1_001.fastq.gz"`, example: `"sample_S1_L001_R2_001.fastq.gz"` | -|`--reference` |The path to Cell Ranger reference tar.gz file. Can also be a directory. |`file`, required, example: `"reference.tar.gz"` | +|Name |Description |Attributes | +|:-------------|:--------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------| +|`--input` |The fastq.gz files to align. Can also be a single directory containing fastq.gz files. |List of `file`, required, example: `"sample_S1_L001_R1_001.fastq.gz", "sample_S1_L001_R2_001.fastq.gz"`, multiple_sep: `";"` | +|`--reference` |The path to Cell Ranger reference tar.gz file. Can also be a directory. 
|`file`, required, example: `"reference.tar.gz"` | ### Outputs diff --git a/components/modules/mapping/cellranger_multi.qmd b/components/modules/mapping/cellranger_multi.qmd index 57e65fc8..504b54ae 100644 --- a/components/modules/mapping/cellranger_multi.qmd +++ b/components/modules/mapping/cellranger_multi.qmd @@ -93,23 +93,23 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Input files -|Name |Description |Attributes | -|:--------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------| -|`--input` |The FASTQ files to be analyzed. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: `[Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz` |`file`, required, example: `"mysample_S1_L001_R1_001.fastq.gz"`, example: `"mysample_S1_L001_R2_001.fastq.gz"` | -|`--gex_reference` |Genome refence index built by Cell Ranger mkref. |`file`, required, example: `"reference_genome.tar.gz"` | -|`--vdj_reference` |VDJ refence index built by Cell Ranger mkref. |`file`, example: `"reference_vdj.tar.gz"` | -|`--vdj_inner_enrichment_primers` |V(D)J Immune Profiling libraries: if inner enrichment primers other than those provided in the 10x Genomics kits are used, they need to be specified here as a text file with one primer per line. |`file`, example: `"enrichment_primers.txt"` | -|`--feature_reference` |Path to the Feature reference CSV file, declaring Feature Barcode constructs and associated barcodes. Required only for Antibody Capture or CRISPR Guide Capture libraries. 
See https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis#feature-ref for more information. |`file`, example: `"feature_reference.csv"` | +|Name |Description |Attributes | +|:--------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------| +|`--input` |The FASTQ files to be analyzed. FASTQ files should conform to the naming conventions of bcl2fastq and mkfastq: `[Sample Name]_S[Sample Index]_L00[Lane Number]_[Read Type]_001.fastq.gz` |List of `file`, required, example: `"mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"`, multiple_sep: `";"` | +|`--gex_reference` |Genome refence index built by Cell Ranger mkref. |`file`, required, example: `"reference_genome.tar.gz"` | +|`--vdj_reference` |VDJ refence index built by Cell Ranger mkref. |`file`, example: `"reference_vdj.tar.gz"` | +|`--vdj_inner_enrichment_primers` |V(D)J Immune Profiling libraries: if inner enrichment primers other than those provided in the 10x Genomics kits are used, they need to be specified here as a text file with one primer per line. |`file`, example: `"enrichment_primers.txt"` | +|`--feature_reference` |Path to the Feature reference CSV file, declaring Feature Barcode constructs and associated barcodes. Required only for Antibody Capture or CRISPR Guide Capture libraries. See https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis#feature-ref for more information. 
|`file`, example: `"feature_reference.csv"` | ### Library arguments -|Name |Description |Attributes | -|:---------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------| -|`--library_id` |The Illumina sample name to analyze. This must exactly match the 'Sample Name' part of the FASTQ files specified in the `--input` argument. |`string`, required, example: `"mysample1"` | -|`--library_type` |The underlying feature type of the library. Possible values: "Gene Expression", "VDJ", "VDJ-T", "VDJ-B", "Antibody Capture", "CRISPR Guide Capture", "Multiplexing Capture" |`string`, required, example: `"Gene Expression"` | -|`--library_subsample` |Optional. The rate at which reads from the provided FASTQ files are sampled. Must be strictly greater than 0 and less than or equal to 1. |`string`, example: `"0.5"` | -|`--library_lanes` |Lanes associated with this sample. Defaults to using all lanes. |`string`, example: `"1-4"` | +|Name |Description |Attributes | +|:---------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------| +|`--library_id` |The Illumina sample name to analyze. This must exactly match the 'Sample Name' part of the FASTQ files specified in the `--input` argument. |List of `string`, required, example: `"mysample1"`, multiple_sep: `";"` | +|`--library_type` |The underlying feature type of the library. Possible values: "Gene Expression", "VDJ", "VDJ-T", "VDJ-B", "Antibody Capture", "CRISPR Guide Capture", "Multiplexing Capture" |List of `string`, required, example: `"Gene Expression"`, multiple_sep: `";"` | +|`--library_subsample` |Optional. 
The rate at which reads from the provided FASTQ files are sampled. Must be strictly greater than 0 and less than or equal to 1. |List of `string`, example: `"0.5"`, multiple_sep: `";"` | +|`--library_lanes` |Lanes associated with this sample. Defaults to using all lanes. |List of `string`, example: `"1-4"`, multiple_sep: `";"` | ### Gene expression arguments diff --git a/components/modules/mapping/htseq_count.qmd b/components/modules/mapping/htseq_count.qmd index 99fb1c20..58a7f2bc 100644 --- a/components/modules/mapping/htseq_count.qmd +++ b/components/modules/mapping/htseq_count.qmd @@ -89,38 +89,38 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Input -|Name |Description |Attributes | -|:-------------|:------------------------------------------------------|:------------------------------------------------------------------------| -|`--input` |Path to the SAM/BAM files containing the mapped reads. |`file`, required, example: `"mysample1.BAM"`, example: `"mysample2.BAM"` | -|`--reference` |Path to the GTF file containing the features. |`file`, required, example: `"reference.gtf"` | +|Name |Description |Attributes | +|:-------------|:------------------------------------------------------|:------------------------------------------------------------------------------------------| +|`--input` |Path to the SAM/BAM files containing the mapped reads. |List of `file`, required, example: `"mysample1.BAM", "mysample2.BAM"`, multiple_sep: `";"` | +|`--reference` |Path to the GTF file containing the features. 
|`file`, required, example: `"reference.gtf"` | ### Output -|Name |Description |Attributes | -|:---------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------| -|`--output` |Filename to output the counts to. |`file`, required, example: `"htseq-count.tsv"` | -|`--output_delimiter` |Column delimiter in output. |`string`, example: `" "` | -|`--output_sam` |Write out all SAM alignment records into SAM/BAM files (one per input file needed), annotating each line with its feature assignment (as an optional field with tag 'XF'). See the -p option to use BAM instead of SAM. |`file`, example: `"mysample1_out.BAM"`, example: `"mysample2_out.BAM"` | -|`--output_sam_format` |Format to use with the --output_sam argument. |`string` | +|Name |Description |Attributes | +|:---------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------| +|`--output` |Filename to output the counts to. |`file`, required, example: `"htseq-count.tsv"` | +|`--output_delimiter` |Column delimiter in output. |`string`, example: `" "` | +|`--output_sam` |Write out all SAM alignment records into SAM/BAM files (one per input file needed), annotating each line with its feature assignment (as an optional field with tag 'XF'). See the -p option to use BAM instead of SAM. |List of `file`, example: `"mysample1_out.BAM", "mysample2_out.BAM"`, multiple_sep: `";"` | +|`--output_sam_format` |Format to use with the --output_sam argument. 
|`string` | ### Arguments -|Name |Description |Attributes | -|:-----------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------| -|`--order` |Sorting order of . Paired-end sequencing data must be sorted either by position or by read name, and the sorting order must be specified. Ignored for single-end data. |`string`, default: `"name"` | -|`--stranded` |Whether the data is from a strand-specific assay. 'reverse' means 'yes' with reversed strand interpretation. |`string`, default: `"yes"` | -|`--minimum_alignment_quality` |Skip all reads with MAPQ alignment quality lower than the given minimum value. MAPQ is the 5th column of a SAM/BAM file and its usage depends on the software used to map the reads. |`integer`, default: `10` | -|`--type` |Feature type (3rd column in GTF file) to be used, all features of other type are ignored (default, suitable for Ensembl GTF files: exon) |`string`, example: `"exon"` | -|`--id_attribute` |GTF attribute to be used as feature ID (default, suitable for Ensembl GTF files: gene_id). All feature of the right type (see -t option) within the same GTF attribute will be added together. The typical way of using this option is to count all exonic reads from each gene and add the exons but other uses are possible as well. You can call this option multiple times: in that case, the combination of all attributes separated by colons (:) will be used as a unique identifier, e.g. 
for exons you might use -i gene_id -i exon_number. |`string`, example: `"gene_id"` | -|`--additional_attributes` |Additional feature attributes (suitable for Ensembl GTF files: gene_name). Use multiple times for more than one additional attribute. These attributes are only used as annotations in the output, while the determination of how the counts are added together is done based on option -i. |`string`, example: `"gene_name"` | -|`--add_chromosome_info` |Store information about the chromosome of each feature as an additional attribute (e.g. colunm in the TSV output file). |`boolean_true` | -|`--mode` |Mode to handle reads overlapping more than one feature. |`string`, default: `"union"` | -|`--non_unique` |Whether and how to score reads that are not uniquely aligned or ambiguously assigned to features. |`string`, default: `"none"` | -|`--secondary_alignments` |Whether to score secondary alignments (0x100 flag). |`string` | -|`--supplementary_alignments` |Whether to score supplementary alignments (0x800 flag). |`string` | -|`--counts_output_sparse` |Store the counts as a sparse matrix (mtx, h5ad, loom). |`boolean_true` | +|Name |Description |Attributes | +|:-----------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------| +|`--order` |Sorting order of . Paired-end sequencing data must be sorted either by position or by read name, and the sorting order must be specified. Ignored for single-end data. 
|`string`, default: `"name"` | +|`--stranded` |Whether the data is from a strand-specific assay. 'reverse' means 'yes' with reversed strand interpretation. |`string`, default: `"yes"` | +|`--minimum_alignment_quality` |Skip all reads with MAPQ alignment quality lower than the given minimum value. MAPQ is the 5th column of a SAM/BAM file and its usage depends on the software used to map the reads. |`integer`, default: `10` | +|`--type` |Feature type (3rd column in GTF file) to be used, all features of other type are ignored (default, suitable for Ensembl GTF files: exon) |`string`, example: `"exon"` | +|`--id_attribute` |GTF attribute to be used as feature ID (default, suitable for Ensembl GTF files: gene_id). All feature of the right type (see -t option) within the same GTF attribute will be added together. The typical way of using this option is to count all exonic reads from each gene and add the exons but other uses are possible as well. You can call this option multiple times: in that case, the combination of all attributes separated by colons (:) will be used as a unique identifier, e.g. for exons you might use -i gene_id -i exon_number. |List of `string`, example: `"gene_id"`, multiple_sep: `":"` | +|`--additional_attributes` |Additional feature attributes (suitable for Ensembl GTF files: gene_name). Use multiple times for more than one additional attribute. These attributes are only used as annotations in the output, while the determination of how the counts are added together is done based on option -i. |List of `string`, example: `"gene_name"`, multiple_sep: `":"` | +|`--add_chromosome_info` |Store information about the chromosome of each feature as an additional attribute (e.g. colunm in the TSV output file). |`boolean_true` | +|`--mode` |Mode to handle reads overlapping more than one feature. |`string`, default: `"union"` | +|`--non_unique` |Whether and how to score reads that are not uniquely aligned or ambiguously assigned to features. 
|`string`, default: `"none"` | +|`--secondary_alignments` |Whether to score secondary alignments (0x100 flag). |`string` | +|`--supplementary_alignments` |Whether to score supplementary alignments (0x800 flag). |`string` | +|`--counts_output_sparse` |Store the counts as a sparse matrix (mtx, h5ad, loom). |`boolean_true` | ## Authors diff --git a/components/modules/mapping/htseq_count_to_h5mu.qmd b/components/modules/mapping/htseq_count_to_h5mu.qmd index 9ad52a51..adea5ae5 100644 --- a/components/modules/mapping/htseq_count_to_h5mu.qmd +++ b/components/modules/mapping/htseq_count_to_h5mu.qmd @@ -71,11 +71,11 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Input -|Name |Description |Attributes | -|:----------------|:--------------------------------------------|:-----------------------------------------------| -|`--input_id` |The obs index for the counts |`string`, required, example: `"foo"` | -|`--input_counts` |The counts as a TSV file as output by HTSeq. |`file`, required, example: `"counts.tsv"` | -|`--reference` |The GTF file. |`file`, required, example: `"gencode_v41_star"` | +|Name |Description |Attributes | +|:----------------|:--------------------------------------------|:----------------------------------------------------------------------| +|`--input_id` |The obs index for the counts |List of `string`, required, example: `"foo"`, multiple_sep: `";"` | +|`--input_counts` |The counts as a TSV file as output by HTSeq. |List of `file`, required, example: `"counts.tsv"`, multiple_sep: `";"` | +|`--reference` |The GTF file. 
|`file`, required, example: `"gencode_v41_star"` | ### Outputs diff --git a/components/modules/mapping/multi_star.qmd b/components/modules/mapping/multi_star.qmd index a3c52550..0f601dfa 100644 --- a/components/modules/mapping/multi_star.qmd +++ b/components/modules/mapping/multi_star.qmd @@ -75,14 +75,14 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Input/Output -|Name |Description |Attributes | -|:-------------------|:-----------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------| -|`--input_id` |The ID of the sample being processed. This vector should have the same length as the `--input_r1` argument. |`string`, required, example: `"mysample"`, example: `"mysample"` | -|`--input_r1` |Paths to the sequences to be mapped. If using Illumina paired-end reads, only the R1 files should be passed. |`file`, required, example: `"mysample_S1_L001_R1_001.fastq.gz"`, example: `"mysample_S1_L002_R1_001.fastq.gz"` | -|`--input_r2` |Paths to the sequences to be mapped. If using Illumina paired-end reads, only the R2 files should be passed. |`file`, example: `"mysample_S1_L001_R2_001.fastq.gz"`, example: `"mysample_S1_L002_R2_001.fastq.gz"` | -|`--reference_index` |Path to the reference built by star_build_reference. Corresponds to the --genomeDir argument in the STAR command. |`file`, required, example: `"/path/to/reference"` | -|`--reference_gtf` |Path to the gtf reference file. |`file`, required, example: `"genes.gtf"` | -|`--output` |Path to output directory. Corresponds to the --outFileNamePrefix argument in the STAR command. 
|`file`, required, example: `"/path/to/foo"` | +|Name |Description |Attributes | +|:-------------------|:-----------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------| +|`--input_id` |The ID of the sample being processed. This vector should have the same length as the `--input_r1` argument. |List of `string`, required, example: `"mysample", "mysample"`, multiple_sep: `";"` | +|`--input_r1` |Paths to the sequences to be mapped. If using Illumina paired-end reads, only the R1 files should be passed. |List of `file`, required, example: `"mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L002_R1_001.fastq.gz"`, multiple_sep: `";"` | +|`--input_r2` |Paths to the sequences to be mapped. If using Illumina paired-end reads, only the R2 files should be passed. |List of `file`, example: `"mysample_S1_L001_R2_001.fastq.gz", "mysample_S1_L002_R2_001.fastq.gz"`, multiple_sep: `";"` | +|`--reference_index` |Path to the reference built by star_build_reference. Corresponds to the --genomeDir argument in the STAR command. |`file`, required, example: `"/path/to/reference"` | +|`--reference_gtf` |Path to the gtf reference file. |`file`, required, example: `"genes.gtf"` | +|`--output` |Path to output directory. Corresponds to the --outFileNamePrefix argument in the STAR command. 
|`file`, required, example: `"/path/to/foo"` | ### Processing arguments @@ -103,26 +103,26 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Genome Parameters -|Name |Description |Attributes | -|:--------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------| -|`--genomeFastaFiles` |path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped. Required for the genome generation (--runMode genomeGenerate). Can also be used in the mapping (--runMode alignReads) to add extra (new) sequences to the genome (e.g. spike-ins). |`file` | +|Name |Description |Attributes | +|:--------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------| +|`--genomeFastaFiles` |path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped. Required for the genome generation (--runMode genomeGenerate). Can also be used in the mapping (--runMode alignReads) to add extra (new) sequences to the genome (e.g. spike-ins). 
|List of `file`, multiple_sep: `";"` | ### Splice Junctions Database -|Name |Description |Attributes | -|:----------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------| -|`--sjdbFileChrStartEnd` |path to the files with genomic coordinates (chr start end strand) for the splice junction introns. Multiple files can be supplied and will be concatenated. |`string` | -|`--sjdbGTFfile` |path to the GTF file with annotations |`file` | -|`--sjdbGTFchrPrefix` |prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSMEBL annotations with UCSC genomes) |`string` | -|`--sjdbGTFfeatureExon` |feature type in GTF file to be used as exons for building transcripts |`string`, example: `"exon"` | -|`--sjdbGTFtagExonParentTranscript` |GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files) |`string`, example: `"transcript_id"` | -|`--sjdbGTFtagExonParentGene` |GTF attribute name for parent gene ID (default "gene_id" works for GTF files) |`string`, example: `"gene_id"` | -|`--sjdbGTFtagExonParentGeneName` |GTF attribute name for parent gene name |`string`, example: `"gene_name"` | -|`--sjdbGTFtagExonParentGeneType` |GTF attribute name for parent gene type |`string`, example: `"gene_type"`, example: `"gene_biotype"` | -|`--sjdbOverhang` |length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1) |`integer`, example: `100` | -|`--sjdbScore` |extra alignment score for alignments that cross database junctions |`integer`, example: `2` | -|`--sjdbInsertSave` |which files to save when sjdb junctions are inserted on the fly at the mapping step - Basic ... only small junction / transcript files - All ... 
all files including big Genome, SA and SAindex - this will create a complete genome directory |`string`, example: `"Basic"` | +|Name |Description |Attributes | +|:----------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------| +|`--sjdbFileChrStartEnd` |path to the files with genomic coordinates (chr start end strand) for the splice junction introns. Multiple files can be supplied and will be concatenated. |List of `string`, multiple_sep: `";"` | +|`--sjdbGTFfile` |path to the GTF file with annotations |`file` | +|`--sjdbGTFchrPrefix` |prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSMEBL annotations with UCSC genomes) |`string` | +|`--sjdbGTFfeatureExon` |feature type in GTF file to be used as exons for building transcripts |`string`, example: `"exon"` | +|`--sjdbGTFtagExonParentTranscript` |GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files) |`string`, example: `"transcript_id"` | +|`--sjdbGTFtagExonParentGene` |GTF attribute name for parent gene ID (default "gene_id" works for GTF files) |`string`, example: `"gene_id"` | +|`--sjdbGTFtagExonParentGeneName` |GTF attribute name for parent gene name |List of `string`, example: `"gene_name"`, multiple_sep: `";"` | +|`--sjdbGTFtagExonParentGeneType` |GTF attribute name for parent gene type |List of `string`, example: `"gene_type", "gene_biotype"`, multiple_sep: `";"` | +|`--sjdbOverhang` |length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1) |`integer`, example: `100` | +|`--sjdbScore` |extra alignment score for alignments that cross database junctions |`integer`, example: `2` | +|`--sjdbInsertSave` |which files to save when sjdb 
junctions are inserted on the fly at the mapping step - Basic ... only small junction / transcript files - All ... all files including big Genome, SA and SAindex - this will create a complete genome directory |`string`, example: `"Basic"` | ### Variation parameters @@ -134,43 +134,43 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Read Parameters -|Name |Description |Attributes | -|:------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------| -|`--readFilesType` |format of input read files - Fastx ... FASTA or FASTQ - SAM SE ... SAM or BAM single-end reads; for BAM use --readFilesCommand samtools view - SAM PE ... SAM or BAM paired-end reads; for BAM use --readFilesCommand samtools view |`string`, example: `"Fastx"` | -|`--readFilesSAMattrKeep` |for --readFilesType SAM SE/PE, which SAM tags to keep in the output BAM, e.g.: --readFilesSAMtagsKeep RG PL - All ... keep all tags - None ... do not keep any tags |`string`, example: `"All"` | -|`--readFilesManifest` |path to the "manifest" file with the names of read files. The manifest file should contain 3 tab-separated columns: paired-end reads: read1_file_name $tab$ read2_file_name $tab$ read_group_line. single-end reads: read1_file_name $tab$ - $tab$ read_group_line. Spaces, but not tabs are allowed in file names. 
If read_group_line does not start with ID:, it can only contain one ID field, and ID: will be added to it. If read_group_line starts with ID:, it can contain several fields separated by $tab$, and all fields will be be copied verbatim into SAM @RG header line. |`file` | -|`--readFilesPrefix` |prefix for the read files names, i.e. it will be added in front of the strings in --readFilesIn |`string` | -|`--readFilesCommand` |command line to execute for each of the input file. This command should generate FASTA or FASTQ text and send it to stdout For example: zcat - to uncompress .gz files, bzcat - to uncompress .bz2 files, etc. |`string` | -|`--readMapNumber` |number of reads to map from the beginning of the file -1: map all reads |`integer`, example: `-1` | -|`--readMatesLengthsIn` |Equal/NotEqual - lengths of names,sequences,qualities for both mates are the same / not the same. NotEqual is safe in all situations. |`string`, example: `"NotEqual"` | -|`--readNameSeparator` |character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed) |`string`, example: `"/"` | -|`--readQualityScoreBase` |number to be subtracted from the ASCII code to get Phred quality score |`integer`, example: `33` | +|Name |Description |Attributes | +|:------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------| +|`--readFilesType` |format 
of input read files - Fastx ... FASTA or FASTQ - SAM SE ... SAM or BAM single-end reads; for BAM use --readFilesCommand samtools view - SAM PE ... SAM or BAM paired-end reads; for BAM use --readFilesCommand samtools view |`string`, example: `"Fastx"` | +|`--readFilesSAMattrKeep` |for --readFilesType SAM SE/PE, which SAM tags to keep in the output BAM, e.g.: --readFilesSAMtagsKeep RG PL - All ... keep all tags - None ... do not keep any tags |List of `string`, example: `"All"`, multiple_sep: `";"` | +|`--readFilesManifest` |path to the "manifest" file with the names of read files. The manifest file should contain 3 tab-separated columns: paired-end reads: read1_file_name $tab$ read2_file_name $tab$ read_group_line. single-end reads: read1_file_name $tab$ - $tab$ read_group_line. Spaces, but not tabs are allowed in file names. If read_group_line does not start with ID:, it can only contain one ID field, and ID: will be added to it. If read_group_line starts with ID:, it can contain several fields separated by $tab$, and all fields will be be copied verbatim into SAM @RG header line. |`file` | +|`--readFilesPrefix` |prefix for the read files names, i.e. it will be added in front of the strings in --readFilesIn |`string` | +|`--readFilesCommand` |command line to execute for each of the input file. This command should generate FASTA or FASTQ text and send it to stdout For example: zcat - to uncompress .gz files, bzcat - to uncompress .bz2 files, etc. |List of `string`, multiple_sep: `";"` | +|`--readMapNumber` |number of reads to map from the beginning of the file -1: map all reads |`integer`, example: `-1` | +|`--readMatesLengthsIn` |Equal/NotEqual - lengths of names,sequences,qualities for both mates are the same / not the same. NotEqual is safe in all situations. 
|`string`, example: `"NotEqual"` | +|`--readNameSeparator` |character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed) |List of `string`, example: `"/"`, multiple_sep: `";"` | +|`--readQualityScoreBase` |number to be subtracted from the ASCII code to get Phred quality score |`integer`, example: `33` | ### Read Clipping -|Name |Description |Attributes | -|:----------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------| -|`--clipAdapterType` |adapter clipping type - Hamming ... adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp - CellRanger4 ... 5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin Sosic: https://github.com/Martinsos/opal - None ... no adapter clipping, all other clip* parameters are disregarded |`string`, example: `"Hamming"` | -|`--clip3pNbases` |number(s) of bases to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. |`integer`, example: `0` | -|`--clip3pAdapterSeq` |adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. - polyA ... polyA sequence with the length equal to read length |`string` | -|`--clip3pAdapterMMp` |max proportion of mismatches for 3p adapter clipping for each mate. If one value is given, it will be assumed the same for both mates. |`double`, example: `0.1` | -|`--clip3pAfterAdapterNbases` |number of bases to clip from 3p of each mate after the adapter clipping. 
If one value is given, it will be assumed the same for both mates. |`integer`, example: `0` | -|`--clip5pNbases` |number(s) of bases to clip from 5p of each mate. If one value is given, it will be assumed the same for both mates. |`integer`, example: `0` | +|Name |Description |Attributes | +|:----------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------| +|`--clipAdapterType` |adapter clipping type - Hamming ... adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp - CellRanger4 ... 5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin Sosic: https://github.com/Martinsos/opal - None ... no adapter clipping, all other clip* parameters are disregarded |`string`, example: `"Hamming"` | +|`--clip3pNbases` |number(s) of bases to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. |List of `integer`, example: `0`, multiple_sep: `";"` | +|`--clip3pAdapterSeq` |adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. - polyA ... polyA sequence with the length equal to read length |List of `string`, multiple_sep: `";"` | +|`--clip3pAdapterMMp` |max proportion of mismatches for 3p adapter clipping for each mate. If one value is given, it will be assumed the same for both mates. |List of `double`, example: `0.1`, multiple_sep: `";"` | +|`--clip3pAfterAdapterNbases` |number of bases to clip from 3p of each mate after the adapter clipping. If one value is given, it will be assumed the same for both mates. 
|List of `integer`, example: `0`, multiple_sep: `";"` | +|`--clip5pNbases` |number(s) of bases to clip from 5p of each mate. If one value is given, it will be assumed the same for both mates. |List of `integer`, example: `0`, multiple_sep: `";"` | ### Limits -|Name |Description |Attributes | -|:---------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------| -|`--limitGenomeGenerateRAM` |maximum available RAM (bytes) for genome generation |`long`, example: `NA` | -|`--limitIObufferSize` |max available buffers size (bytes) for input/output, per thread |`long`, example: `30000000`, example: `50000000` | -|`--limitOutSAMoneReadBytes` |max size of the SAM record (bytes) for one read. Recommended value: >(2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax |`long`, example: `100000` | -|`--limitOutSJoneRead` |max number of junctions for one read (including all multi-mappers) |`integer`, example: `1000` | -|`--limitOutSJcollapsed` |max number of collapsed junctions |`integer`, example: `1000000` | -|`--limitBAMsortRAM` |maximum available RAM (bytes) for sorting BAM. If =0, it will be set to the genome index size. 0 value can only be used with --genomeLoad NoSharedMemory option. 
|`long`, example: `0` | -|`--limitSjdbInsertNsj` |maximum number of junctions to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run |`integer`, example: `1000000` | -|`--limitNreadsSoft` |soft limit on the number of reads |`integer`, example: `-1` | +|Name |Description |Attributes | +|:---------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------| +|`--limitGenomeGenerateRAM` |maximum available RAM (bytes) for genome generation |`long`, example: `NA` | +|`--limitIObufferSize` |max available buffers size (bytes) for input/output, per thread |List of `long`, example: `30000000, 50000000`, multiple_sep: `";"` | +|`--limitOutSAMoneReadBytes` |max size of the SAM record (bytes) for one read. Recommended value: >(2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax |`long`, example: `100000` | +|`--limitOutSJoneRead` |max number of junctions for one read (including all multi-mappers) |`integer`, example: `1000` | +|`--limitOutSJcollapsed` |max number of collapsed junctions |`integer`, example: `1000000` | +|`--limitBAMsortRAM` |maximum available RAM (bytes) for sorting BAM. If =0, it will be set to the genome index size. 0 value can only be used with --genomeLoad NoSharedMemory option. 
|`long`, example: `0` | +|`--limitSjdbInsertNsj` |maximum number of junctions to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run |`integer`, example: `1000000` | +|`--limitNreadsSoft` |soft limit on the number of reads |`integer`, example: `-1` | ### Output: general @@ -186,29 +186,29 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Output: SAM and BAM -|Name |Description |Attributes | -|:---------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------| -|`--outSAMmode` |mode of SAM output - None ... no SAM output - Full ... full SAM output - NoQS ... full SAM but without quality scores |`string`, example: `"Full"` | -|`--outSAMstrandField` |Cufflinks-like strand field flag - None ... not used - intronMotif ... strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out. |`string` | -|`--outSAMattributes` |a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. ***Presets: - None ... no attributes - Standard ... NH HI AS nM - All ... NH HI AS nM NM MD jM jI MC ch ***Alignment: - NH ... 
number of loci the read maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag. - HI ... multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag. - AS ... local alignment score, +1/-1 for matches/mismatches, score* penalties for indels and gaps. For PE reads, total score for two mates. Standard SAM tag. - nM ... number of mismatches. For PE reads, sum over two mates. - NM ... edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag. - MD ... string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag. - jM ... intron motifs for all junctions (i.e. N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value. - jI ... start and end of introns for all junctions (1-based). - XS ... alignment strand according to --outSAMstrandField. - MC ... mate's CIGAR string. Standard SAM tag. - ch ... marks all segments of all chimeric alignments for --chimOutType WithinBAM output. - cN ... number of bases clipped from the read ends: 5' and 3' ***Variation: - vA ... variant allele - vG ... genomic coordinate of the variant overlapped by the read. - vW ... 1 - alignment passes WASP filtering; 2,3,4,5,6,7 - alignment does not pass WASP filtering. Requires --waspOutputMode SAMtag. ***STARsolo: - CR CY UR UY ... sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing. - GX GN ... gene ID and gene name for unique-gene reads. - gx gn ... gene IDs and gene names for unique- and multi-gene reads. - CB UB ... error-corrected cell barcodes and UMIs for solo* demultiplexing. Requires --outSAMtype BAM SortedByCoordinate. - sM ... assessment of CB and UMI. - sS ... sequence of the entire barcode (CB,UMI,adapter). - sQ ... quality of the entire barcode. ***Unsupported/undocumented: - ha ...
haplotype (1/2) when mapping to the diploid genome. Requires genome generated with --genomeTransformType Diploid . - rB ... alignment block read/genomic coordinates. - vR ... read coordinate of the variant. |`string`, example: `"Standard"` | -|`--outSAMattrIHstart` |start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie. |`integer`, example: `1` | -|`--outSAMunmapped` |output of unmapped reads in the SAM format 1st word: - None ... no output - Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam) 2nd word: - KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate. Only affects multi-mapping reads. |`string` | -|`--outSAMorder` |type of sorting for the SAM output Paired: one mate after the other for all paired alignments PairedKeepInputOrder: one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files |`string`, example: `"Paired"` | -|`--outSAMprimaryFlag` |which alignments are considered primary - all others will be marked with 0x100 bit in the FLAG - OneBestScore ... only one alignment with the best score is primary - AllBestScore ... all alignments with the best score are primary |`string`, example: `"OneBestScore"` | -|`--outSAMreadID` |read ID record type - Standard ... first word (until space) from the FASTx read ID line, removing /1,/2 from the end - Number ... read number (index) in the FASTx file |`string`, example: `"Standard"` | -|`--outSAMmapqUnique` |0 to 255: the MAPQ value for unique mappers |`integer`, example: `255` | -|`--outSAMflagOR` |0 to 65535: sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise. 
|`integer`, example: `0` | -|`--outSAMflagAND` |0 to 65535: sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagAND. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise. |`integer`, example: `65535` | -|`--outSAMattrRGline` |SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma separated RG lines correspond to different (comma separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy |`string` | -|`--outSAMheaderHD` |@HD (header) line of the SAM header |`string` | -|`--outSAMheaderPG` |extra @PG (software) line of the SAM header (in addition to STAR) |`string` | -|`--outSAMheaderCommentFile` |path to the file with @CO (comment) lines of the SAM header |`string` | -|`--outSAMfilter` |filter the output into main SAM/BAM files - KeepOnlyAddedReferences ... only keep the reads for which all alignments are to the extra reference sequences added with --genomeFastaFiles at the mapping stage. - KeepAllAddedReferences ... keep all alignments to the extra reference sequences added with --genomeFastaFiles at the mapping stage. |`string` | -|`--outSAMmultNmax` |max number of multiple alignments for a read that will be output to the SAM/BAM files. Note that if this value is not equal to -1, the top scoring alignment will be output first - -1 ... all alignments (up to --outFilterMultimapNmax) will be output |`integer`, example: `-1` | -|`--outSAMtlen` |calculation method for the TLEN field in the SAM/BAM files - 1 ... leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate - 2 ...
leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from 1 for overlapping mates with protruding ends |`integer`, example: `1` | -|`--outBAMcompression` |-1 to 10 BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression |`integer`, example: `1` | -|`--outBAMsortingThreadN` |>=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN). |`integer`, example: `0` | -|`--outBAMsortingBinsN` |>0: number of genome bins for coordinate-sorting |`integer`, example: `50` | +|Name |Description |Attributes | +|:---------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------| +|`--outSAMmode` |mode of SAM output - None ... no SAM output - Full ... full SAM output - NoQS ... full SAM but without quality scores |`string`, example: `"Full"` | +|`--outSAMstrandField` |Cufflinks-like strand field flag - None ... not used - intronMotif ... strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out. |`string` | +|`--outSAMattributes` |a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. ***Presets: - None ... 
no attributes - Standard ... NH HI AS nM - All ... NH HI AS nM NM MD jM jI MC ch ***Alignment: - NH ... number of loci the read maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag. - HI ... multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag. - AS ... local alignment score, +1/-1 for matches/mismatches, score* penalties for indels and gaps. For PE reads, total score for two mates. Standard SAM tag. - nM ... number of mismatches. For PE reads, sum over two mates. - NM ... edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag. - MD ... string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag. - jM ... intron motifs for all junctions (i.e. N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value. - jI ... start and end of introns for all junctions (1-based). - XS ... alignment strand according to --outSAMstrandField. - MC ... mate's CIGAR string. Standard SAM tag. - ch ... marks all segments of all chimeric alignments for --chimOutType WithinBAM output. - cN ... number of bases clipped from the read ends: 5' and 3' ***Variation: - vA ... variant allele - vG ... genomic coordinate of the variant overlapped by the read. - vW ... 1 - alignment passes WASP filtering; 2,3,4,5,6,7 - alignment does not pass WASP filtering. Requires --waspOutputMode SAMtag. ***STARsolo: - CR CY UR UY ... sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing. - GX GN ... gene ID and gene name for unique-gene reads. - gx gn ... gene IDs and gene names for unique- and multi-gene reads. - CB UB ... error-corrected cell barcodes and UMIs for solo* demultiplexing. Requires --outSAMtype BAM SortedByCoordinate. - sM ... assessment of CB and UMI. - sS ...
sequence of the entire barcode (CB,UMI,adapter). - sQ ... quality of the entire barcode. ***Unsupported/undocumented: - ha ... haplotype (1/2) when mapping to the diploid genome. Requires genome generated with --genomeTransformType Diploid . - rB ... alignment block read/genomic coordinates. - vR ... read coordinate of the variant. |List of `string`, example: `"Standard"`, multiple_sep: `";"` | +|`--outSAMattrIHstart` |start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie. |`integer`, example: `1` | +|`--outSAMunmapped` |output of unmapped reads in the SAM format 1st word: - None ... no output - Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam) 2nd word: - KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate. Only affects multi-mapping reads. |List of `string`, multiple_sep: `";"` | +|`--outSAMorder` |type of sorting for the SAM output Paired: one mate after the other for all paired alignments PairedKeepInputOrder: one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files |`string`, example: `"Paired"` | +|`--outSAMprimaryFlag` |which alignments are considered primary - all others will be marked with 0x100 bit in the FLAG - OneBestScore ... only one alignment with the best score is primary - AllBestScore ... all alignments with the best score are primary |`string`, example: `"OneBestScore"` | +|`--outSAMreadID` |read ID record type - Standard ... first word (until space) from the FASTx read ID line, removing /1,/2 from the end - Number ... read number (index) in the FASTx file |`string`, example: `"Standard"` | +|`--outSAMmapqUnique` |0 to 255: the MAPQ value for unique mappers |`integer`, example: `255` | +|`--outSAMflagOR` |0 to 65535: sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. 
This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise. |`integer`, example: `0` | +|`--outSAMflagAND` |0 to 65535: sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagAND. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise. |`integer`, example: `65535` | +|`--outSAMattrRGline` |SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma separated RG lines correspond to different (comma separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy |List of `string`, multiple_sep: `";"` | +|`--outSAMheaderHD` |@HD (header) line of the SAM header |List of `string`, multiple_sep: `";"` | +|`--outSAMheaderPG` |extra @PG (software) line of the SAM header (in addition to STAR) |List of `string`, multiple_sep: `";"` | +|`--outSAMheaderCommentFile` |path to the file with @CO (comment) lines of the SAM header |`string` | +|`--outSAMfilter` |filter the output into main SAM/BAM files - KeepOnlyAddedReferences ... only keep the reads for which all alignments are to the extra reference sequences added with --genomeFastaFiles at the mapping stage. - KeepAllAddedReferences ... keep all alignments to the extra reference sequences added with --genomeFastaFiles at the mapping stage. |List of `string`, multiple_sep: `";"` | +|`--outSAMmultNmax` |max number of multiple alignments for a read that will be output to the SAM/BAM files. Note that if this value is not equal to -1, the top scoring alignment will be output first - -1 ...
all alignments (up to --outFilterMultimapNmax) will be output |`integer`, example: `-1` | +|`--outSAMtlen` |calculation method for the TLEN field in the SAM/BAM files - 1 ... leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate - 2 ... leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from 1 for overlapping mates with protruding ends |`integer`, example: `1` | +|`--outBAMcompression` |-1 to 10 BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression |`integer`, example: `1` | +|`--outBAMsortingThreadN` |>=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN). |`integer`, example: `0` | +|`--outBAMsortingBinsN` |>0: number of genome bins for coordinate-sorting |`integer`, example: `50` | ### BAM processing @@ -221,12 +221,12 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Output Wiggle -|Name |Description |Attributes | -|:--------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------| -|`--outWigType` |type of signal output, e.g. "bedGraph" OR "bedGraph read1_5p". Requires sorted BAM: --outSAMtype BAM SortedByCoordinate . 1st word: - None ... no signal output - bedGraph ... bedGraph format - wiggle ... wiggle format 2nd word: - read1_5p ... signal from only 5' of the 1st read, useful for CAGE/RAMPAGE etc - read2 ... signal from only 2nd read |`string` | -|`--outWigStrand` |strandedness of wiggle/bedGraph output - Stranded ... separate strands, str1 and str2 - Unstranded ... 
collapsed strands |`string`, example: `"Stranded"` | -|`--outWigReferencesPrefix` |prefix matching reference names to include in the output wiggle file, e.g. "chr", default "-" - include all references |`string` | -|`--outWigNorm` |type of normalization for the signal - RPM ... reads per million of mapped reads - None ... no normalization, "raw" counts |`string`, example: `"RPM"` | +|Name |Description |Attributes | +|:--------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------| +|`--outWigType` |type of signal output, e.g. "bedGraph" OR "bedGraph read1_5p". Requires sorted BAM: --outSAMtype BAM SortedByCoordinate . 1st word: - None ... no signal output - bedGraph ... bedGraph format - wiggle ... wiggle format 2nd word: - read1_5p ... signal from only 5' of the 1st read, useful for CAGE/RAMPAGE etc - read2 ... signal from only 2nd read |List of `string`, multiple_sep: `";"` | +|`--outWigStrand` |strandedness of wiggle/bedGraph output - Stranded ... separate strands, str1 and str2 - Unstranded ... collapsed strands |`string`, example: `"Stranded"` | +|`--outWigReferencesPrefix` |prefix matching reference names to include in the output wiggle file, e.g. "chr", default "-" - include all references |`string` | +|`--outWigNorm` |type of normalization for the signal - RPM ... reads per million of mapped reads - None ... 
no normalization, "raw" counts |`string`, example: `"RPM"` | ### Output Filtering @@ -256,14 +256,14 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Output Filtering: Splice Junctions -|Name |Description |Attributes | -|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------| -|`--outSJfilterReads` |which reads to consider for collapsed splice junctions output - All ... all reads, unique- and multi-mappers - Unique ... uniquely mapping reads only |`string`, example: `"All"` | -|`--outSJfilterOverhangMin` |minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif does not apply to annotated junctions |`integer`, example: `30`, example: `12`, example: `12`, example: `12` | -|`--outSJfilterCountUniqueMin` |minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |`integer`, example: `3`, example: `1`, example: `1`, example: `1` | -|`--outSJfilterCountTotalMin` |minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. 
-1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |`integer`, example: `3`, example: `1`, example: `1`, example: `1` | -|`--outSJfilterDistToOtherSJmin` |minimum allowed distance to other junctions' donor/acceptor does not apply to annotated junctions |`integer`, example: `10`, example: `0`, example: `5`, example: `10` | -|`--outSJfilterIntronMaxVsReadN` |maximum gap allowed for junctions supported by 1,2,3,,,N reads i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000. by >=4 reads any gap <=alignIntronMax does not apply to annotated junctions |`integer`, example: `50000`, example: `100000`, example: `200000` | +|Name |Description |Attributes | +|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------| +|`--outSJfilterReads` |which reads to consider for collapsed splice junctions output - All ... all reads, unique- and multi-mappers - Unique ... uniquely mapping reads only |`string`, example: `"All"` | +|`--outSJfilterOverhangMin` |minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. 
-1 means no output for that motif does not apply to annotated junctions |List of `integer`, example: `30, 12, 12, 12`, multiple_sep: `";"` | +|`--outSJfilterCountUniqueMin` |minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |List of `integer`, example: `3, 1, 1, 1`, multiple_sep: `";"` | +|`--outSJfilterCountTotalMin` |minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |List of `integer`, example: `3, 1, 1, 1`, multiple_sep: `";"` | +|`--outSJfilterDistToOtherSJmin` |minimum allowed distance to other junctions' donor/acceptor does not apply to annotated junctions |List of `integer`, example: `10, 0, 5, 10`, multiple_sep: `";"` | +|`--outSJfilterIntronMaxVsReadN` |maximum gap allowed for junctions supported by 1,2,3,,,N reads i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000. 
by >=4 reads any gap <=alignIntronMax does not apply to annotated junctions |List of `integer`, example: `50000, 100000, 200000`, multiple_sep: `";"` | ### Scoring @@ -284,32 +284,32 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Alignments and Seeding -|Name |Description |Attributes | -|:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------| -|`--seedSearchStartLmax` |defines the search start point through the read - the read is split into pieces no longer than this value |`integer`, example: `50` | -|`--seedSearchStartLmaxOverLread` |seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads) |`double`, example: `1` | -|`--seedSearchLmax` |defines the maximum length of the seeds, if =0 seed length is not limited |`integer`, example: `0` | -|`--seedMultimapNmax` |only pieces that map fewer than this value are utilized in the stitching procedure |`integer`, example: `10000` | -|`--seedPerReadNmax` |max number of seeds per read |`integer`, example: `1000` | -|`--seedPerWindowNmax` |max number of seeds per window |`integer`, example: `50` | -|`--seedNoneLociPerWindow` |max number of one seed loci per window |`integer`, example: `10` | -|`--seedSplitMin` |min length of the seed sequences split by Ns or mate gap |`integer`, example: `12` | -|`--seedMapMin` |min length of seeds to be mapped |`integer`, example: `5` | -|`--alignIntronMin` |minimum intron size, genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered 
Deletion |`integer`, example: `21` | -|`--alignIntronMax` |maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | -|`--alignMatesGapMax` |maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | -|`--alignSJoverhangMin` |minimum overhang (i.e. block size) for spliced alignments |`integer`, example: `5` | -|`--alignSJstitchMismatchNmax` |maximum number of mismatches for stitching of the splice junctions (-1: no limit). (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. |`integer`, example: `0`, example: `-1`, example: `0`, example: `0` | -|`--alignSJDBoverhangMin` |minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments |`integer`, example: `3` | -|`--alignSplicedMateMapLmin` |minimum mapped length for a read mate that is spliced |`integer`, example: `0` | -|`--alignSplicedMateMapLminOverLmate` |alignSplicedMateMapLmin normalized to mate length |`double`, example: `0.66` | -|`--alignWindowsPerReadNmax` |max number of windows per read |`integer`, example: `10000` | -|`--alignTranscriptsPerWindowNmax` |max number of transcripts per window |`integer`, example: `100` | -|`--alignTranscriptsPerReadNmax` |max number of different alignments per read to consider |`integer`, example: `10000` | -|`--alignEndsType` |type of read ends alignment - Local ... standard local alignment with soft-clipping allowed - EndToEnd ... force end-to-end read alignment, do not soft-clip - Extend5pOfRead1 ... fully extend only the 5p of the read1, all other ends: local alignment - Extend5pOfReads12 ... fully extend only the 5p of the both read1 and read2, all other ends: local alignment |`string`, example: `"Local"` | -|`--alignEndsProtrude` |allow protrusion of alignment ends, i.e. 
start (end) of the +strand mate downstream of the start (end) of the -strand mate 1st word: int: maximum number of protrusion bases allowed 2nd word: string: - ConcordantPair ... report alignments with non-zero protrusion as concordant pairs - DiscordantPair ... report alignments with non-zero protrusion as discordant pairs |`string`, example: `"0 ConcordantPair"` | -|`--alignSoftClipAtReferenceEnds` |allow the soft-clipping of the alignments past the end of the chromosomes - Yes ... allow - No ... prohibit, useful for compatibility with Cufflinks |`string`, example: `"Yes"` | -|`--alignInsertionFlush` |how to flush ambiguous insertion positions - None ... insertions are not flushed - Right ... insertions are flushed to the right |`string` | +|Name |Description |Attributes | +|:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------| +|`--seedSearchStartLmax` |defines the search start point through the read - the read is split into pieces no longer than this value |`integer`, example: `50` | +|`--seedSearchStartLmaxOverLread` |seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads) |`double`, example: `1` | +|`--seedSearchLmax` |defines the maximum length of the seeds, if =0 seed length is not limited |`integer`, example: `0` | +|`--seedMultimapNmax` |only pieces that map fewer than this value are utilized in the stitching procedure |`integer`, example: `10000` | +|`--seedPerReadNmax` |max number of seeds per read |`integer`, example: `1000` | +|`--seedPerWindowNmax` |max number of 
seeds per window |`integer`, example: `50` | +|`--seedNoneLociPerWindow` |max number of one seed loci per window |`integer`, example: `10` | +|`--seedSplitMin` |min length of the seed sequences split by Ns or mate gap |`integer`, example: `12` | +|`--seedMapMin` |min length of seeds to be mapped |`integer`, example: `5` | +|`--alignIntronMin` |minimum intron size, genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered Deletion |`integer`, example: `21` | +|`--alignIntronMax` |maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | +|`--alignMatesGapMax` |maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | +|`--alignSJoverhangMin` |minimum overhang (i.e. block size) for spliced alignments |`integer`, example: `5` | +|`--alignSJstitchMismatchNmax` |maximum number of mismatches for stitching of the splice junctions (-1: no limit). (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. |List of `integer`, example: `0, -1, 0, 0`, multiple_sep: `";"` | +|`--alignSJDBoverhangMin` |minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments |`integer`, example: `3` | +|`--alignSplicedMateMapLmin` |minimum mapped length for a read mate that is spliced |`integer`, example: `0` | +|`--alignSplicedMateMapLminOverLmate` |alignSplicedMateMapLmin normalized to mate length |`double`, example: `0.66` | +|`--alignWindowsPerReadNmax` |max number of windows per read |`integer`, example: `10000` | +|`--alignTranscriptsPerWindowNmax` |max number of transcripts per window |`integer`, example: `100` | +|`--alignTranscriptsPerReadNmax` |max number of different alignments per read to consider |`integer`, example: `10000` | +|`--alignEndsType` |type of read ends alignment - Local ... 
standard local alignment with soft-clipping allowed - EndToEnd ... force end-to-end read alignment, do not soft-clip - Extend5pOfRead1 ... fully extend only the 5p of the read1, all other ends: local alignment - Extend5pOfReads12 ... fully extend only the 5p of the both read1 and read2, all other ends: local alignment |`string`, example: `"Local"` | +|`--alignEndsProtrude` |allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of the start (end) of the -strand mate 1st word: int: maximum number of protrusion bases allowed 2nd word: string: - ConcordantPair ... report alignments with non-zero protrusion as concordant pairs - DiscordantPair ... report alignments with non-zero protrusion as discordant pairs |`string`, example: `"0 ConcordantPair"` | +|`--alignSoftClipAtReferenceEnds` |allow the soft-clipping of the alignments past the end of the chromosomes - Yes ... allow - No ... prohibit, useful for compatibility with Cufflinks |`string`, example: `"Yes"` | +|`--alignInsertionFlush` |how to flush ambiguous insertion positions - None ... insertions are not flushed - Right ... insertions are flushed to the right |`string` | ### Paired-End reads @@ -334,29 +334,29 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Chimeric Alignments -|Name |Description |Attributes | -|:----------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------| -|`--chimOutType` |type of chimeric output - Junctions ... Chimeric.out.junction - SeparateSAMold ... 
output old SAM into separate Chimeric.out.sam file - WithinBAM ... output into main aligned BAM files (Aligned.*.bam) - WithinBAM HardClip ... (default) hard-clipping in the CIGAR for supplemental chimeric alignments (default if no 2nd word is present) - WithinBAM SoftClip ... soft-clipping in the CIGAR for supplemental chimeric alignments |`string`, example: `"Junctions"` | -|`--chimSegmentMin` |minimum length of chimeric segment length, if ==0, no chimeric output |`integer`, example: `0` | -|`--chimScoreMin` |minimum total (summed) score of the chimeric segments |`integer`, example: `0` | -|`--chimScoreDropMax` |max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length |`integer`, example: `20` | -|`--chimScoreSeparation` |minimum difference (separation) between the best chimeric score and the next one |`integer`, example: `10` | -|`--chimScoreJunctionNonGTAG` |penalty for a non-GT/AG chimeric junction |`integer`, example: `-1` | -|`--chimJunctionOverhangMin` |minimum overhang for a chimeric junction |`integer`, example: `20` | -|`--chimSegmentReadGapMax` |maximum gap in the read sequence between chimeric segments |`integer`, example: `0` | -|`--chimFilter` |different filters for chimeric alignments - None ... no filtering - banGenomicN ... Ns are not allowed in the genome sequence around the chimeric junction |`string`, example: `"banGenomicN"` | -|`--chimMainSegmentMultNmax` |maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments. |`integer`, example: `10` | -|`--chimMultimapNmax` |maximum number of chimeric multi-alignments - 0 ... use the old scheme for chimeric detection which only considered unique alignments |`integer`, example: `0` | -|`--chimMultimapScoreRange` |the score range for multi-mapping chimeras below the best chimeric score. 
Only works with --chimMultimapNmax > 1 |`integer`, example: `1` | -|`--chimNonchimScoreDropMin` |to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value |`integer`, example: `20` | -|`--chimOutJunctionFormat` |formatting type for the Chimeric.out.junction file - 0 ... no comment lines/headers - 1 ... comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping |`integer`, example: `0` | +|Name |Description |Attributes | +|:----------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------| +|`--chimOutType` |type of chimeric output - Junctions ... Chimeric.out.junction - SeparateSAMold ... output old SAM into separate Chimeric.out.sam file - WithinBAM ... output into main aligned BAM files (Aligned.*.bam) - WithinBAM HardClip ... (default) hard-clipping in the CIGAR for supplemental chimeric alignments (default if no 2nd word is present) - WithinBAM SoftClip ... 
soft-clipping in the CIGAR for supplemental chimeric alignments |List of `string`, example: `"Junctions"`, multiple_sep: `";"` | +|`--chimSegmentMin` |minimum length of chimeric segment length, if ==0, no chimeric output |`integer`, example: `0` | +|`--chimScoreMin` |minimum total (summed) score of the chimeric segments |`integer`, example: `0` | +|`--chimScoreDropMax` |max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length |`integer`, example: `20` | +|`--chimScoreSeparation` |minimum difference (separation) between the best chimeric score and the next one |`integer`, example: `10` | +|`--chimScoreJunctionNonGTAG` |penalty for a non-GT/AG chimeric junction |`integer`, example: `-1` | +|`--chimJunctionOverhangMin` |minimum overhang for a chimeric junction |`integer`, example: `20` | +|`--chimSegmentReadGapMax` |maximum gap in the read sequence between chimeric segments |`integer`, example: `0` | +|`--chimFilter` |different filters for chimeric alignments - None ... no filtering - banGenomicN ... Ns are not allowed in the genome sequence around the chimeric junction |List of `string`, example: `"banGenomicN"`, multiple_sep: `";"` | +|`--chimMainSegmentMultNmax` |maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments. |`integer`, example: `10` | +|`--chimMultimapNmax` |maximum number of chimeric multi-alignments - 0 ... use the old scheme for chimeric detection which only considered unique alignments |`integer`, example: `0` | +|`--chimMultimapScoreRange` |the score range for multi-mapping chimeras below the best chimeric score. 
Only works with --chimMultimapNmax > 1 |`integer`, example: `1` | +|`--chimNonchimScoreDropMin` |to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value |`integer`, example: `20` | +|`--chimOutJunctionFormat` |formatting type for the Chimeric.out.junction file - 0 ... no comment lines/headers - 1 ... comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping |`integer`, example: `0` | ### Quantification of Annotations |Name |Description |Attributes | |:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------| -|`--quantMode` |types of quantification requested - - ... none - TranscriptomeSAM ... output SAM/BAM alignments to transcriptome into a separate file - GeneCounts ... count reads per gene |`string` | +|`--quantMode` |types of quantification requested - - ... none - TranscriptomeSAM ... output SAM/BAM alignments to transcriptome into a separate file - GeneCounts ... count reads per gene |List of `string`, multiple_sep: `";"` | |`--quantTranscriptomeBAMcompression` |-2 to 10 transcriptome BAM compression level - -2 ... no BAM output - -1 ... default compression (6?) - 0 ... no compression - 10 ... maximum compression |`integer`, example: `1` | |`--quantTranscriptomeBan` |prohibit various alignment type - IndelSoftclipSingleend ... prohibit indels, soft clipping and single-end alignments - compatible with RSEM - Singleend ... 
prohibit single-end alignments |`string`, example: `"IndelSoftclipSingleend"` | @@ -378,49 +378,49 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### STARsolo (single cell RNA-seq) parameters -|Name |Description |Attributes | -|:-----------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------| -|`--soloType` |type of single-cell RNA-seq - CB_UMI_Simple ... (a.k.a. Droplet) one UMI and one Cell Barcode of fixed length in read2, e.g. Drop-seq and 10X Chromium. - CB_UMI_Complex ... multiple Cell Barcodes of varying length, one UMI of fixed length and one adapter sequence of fixed length are allowed in read2 only (e.g. inDrop, ddSeq). - CB_samTagOut ... output Cell Barcode as CR and/or CB SAm tag. No UMI counting. 
--readFilesIn cDNA_read1 [cDNA_read2 if paired-end] CellBarcode_read . Requires --outSAMtype BAM Unsorted [and/or SortedByCoordinate] - SmartSeq ... Smart-seq: each cell in a separate FASTQ (paired- or single-end), barcodes are corresponding read-groups, no UMI sequences, alignments deduplicated according to alignment start and end (after extending soft-clipped bases) |`string` | -|`--soloCBwhitelist` |file(s) with whitelist(s) of cell barcodes. Only --soloType CB_UMI_Complex allows more than one whitelist file. - None ... no whitelist: all cell barcodes are allowed |`string` | -|`--soloCBstart` |cell barcode start base |`integer`, example: `1` | -|`--soloCBlen` |cell barcode length |`integer`, example: `16` | -|`--soloUMIstart` |UMI start base |`integer`, example: `17` | -|`--soloUMIlen` |UMI length |`integer`, example: `10` | -|`--soloBarcodeReadLength` |length of the barcode read - 1 ... equal to sum of soloCBlen+soloUMIlen - 0 ... not defined, do not check |`integer`, example: `1` | -|`--soloBarcodeMate` |identifies which read mate contains the barcode (CB+UMI) sequence - 0 ... barcode sequence is on separate read, which should always be the last file in the --readFilesIn listed - 1 ... barcode sequence is a part of mate 1 - 2 ... barcode sequence is a part of mate 2 |`integer`, example: `0` | -|`--soloCBposition` |position of Cell Barcode(s) on the barcode read. Presently only works with --soloType CB_UMI_Complex, and barcodes are assumed to be on Read2. Format for each barcode: startAnchor_startPosition_endAnchor_endPosition start(end)Anchor defines the Anchor Base for the CB: 0: read start; 1: read end; 2: adapter start; 3: adapter end start(end)Position is the 0-based position with of the CB start(end) with respect to the Anchor Base String for different barcodes are separated by space. Example: inDrop (Zilionis et al, Nat. 
Protocols, 2017): --soloCBposition 0_0_2_-1 3_1_3_8 |`string` | -|`--soloUMIposition` |position of the UMI on the barcode read, same as soloCBposition Example: inDrop (Zilionis et al, Nat. Protocols, 2017): --soloCBposition 3_9_3_14 |`string` | -|`--soloAdapterSequence` |adapter sequence to anchor barcodes. Only one adapter sequence is allowed. |`string` | -|`--soloAdapterMismatchesNmax` |maximum number of mismatches allowed in adapter sequence. |`integer`, example: `1` | -|`--soloCBmatchWLtype` |matching the Cell Barcodes to the WhiteList - Exact ... only exact matches allowed - 1MM ... only one match in whitelist with 1 mismatched base allowed. Allowed CBs have to have at least one read with exact match. - 1MM_multi ... multiple matches in whitelist with 1 mismatched base allowed, posterior probability calculation is used choose one of the matches. Allowed CBs have to have at least one read with exact match. This option matches best with CellRanger 2.2.0 - 1MM_multi_pseudocounts ... same as 1MM_Multi, but pseudocounts of 1 are added to all whitelist barcodes. - 1MM_multi_Nbase_pseudocounts ... same as 1MM_multi_pseudocounts, multimatching to WL is allowed for CBs with N-bases. This option matches best with CellRanger >= 3.0.0 - EditDist_2 ... allow up to edit distance of 3 fpr each of the barcodes. May include one deletion + one insertion. Only works with --soloType CB_UMI_Complex. Matches to multiple passlist barcdoes are not allowed. Similar to ParseBio Split-seq pipeline. |`string`, example: `"1MM_multi"` | -|`--soloInputSAMattrBarcodeSeq` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode sequence (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeSeq CR UR . This parameter is required when running STARsolo with input from SAM. 
|`string` | -|`--soloInputSAMattrBarcodeQual` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode qualities (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeQual CY UY . If this parameter is '-' (default), the quality 'H' will be assigned to all bases. |`string` | -|`--soloStrand` |strandedness of the solo libraries: - Unstranded ... no strand information - Forward ... read strand same as the original RNA molecule - Reverse ... read strand opposite to the original RNA molecule |`string`, example: `"Forward"` | -|`--soloFeatures` |genomic features for which the UMI counts per Cell Barcode are collected - Gene ... genes: reads match the gene transcript - SJ ... splice junctions: reported in SJ.out.tab - GeneFull ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns - GeneFull_ExonOverIntron ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns: prioritize 100% overlap with exons - GeneFull_Ex50pAS ... full gene (pre-RNA): count all reads overlapping genes' exons and introns: prioritize >50% overlap with exons. Do not count reads with 100% exonic overlap in the antisense direction. |`string`, example: `"Gene"` | -|`--soloMultiMappers` |counting method for reads mapping to multiple genes - Unique ... count only reads that map to unique genes - Uniform ... uniformly distribute multi-genic UMIs to all genes - Rescue ... distribute UMIs proportionally to unique+uniform counts (~ first iteration of EM) - PropUnique ... distribute UMIs proportionally to unique mappers, if present, and uniformly if not. - EM ... multi-gene UMIs are distributed using Expectation Maximization algorithm |`string`, example: `"Unique"` | -|`--soloUMIdedup` |type of UMI deduplication (collapsing) algorithm - 1MM_All ... all UMIs with 1 mismatch distance to each other are collapsed (i.e. counted once). - 1MM_Directional_UMItools ... 
follows the "directional" method from the UMI-tools by Smith, Heger and Sudbery (Genome Research 2017). - 1MM_Directional ... same as 1MM_Directional_UMItools, but with more stringent criteria for duplicate UMIs - Exact ... only exactly matching UMIs are collapsed. - NoDedup ... no deduplication of UMIs, count all reads. - 1MM_CR ... CellRanger2-4 algorithm for 1MM UMI collapsing. |`string`, example: `"1MM_All"` | -|`--soloUMIfiltering` |type of UMI filtering (for reads uniquely mapping to genes) - - ... basic filtering: remove UMIs with N and homopolymers (similar to CellRanger 2.2.0). - MultiGeneUMI ... basic + remove lower-count UMIs that map to more than one gene. - MultiGeneUMI_All ... basic + remove all UMIs that map to more than one gene. - MultiGeneUMI_CR ... basic + remove lower-count UMIs that map to more than one gene, matching CellRanger > 3.0.0 . Only works with --soloUMIdedup 1MM_CR |`string` | -|`--soloOutFileNames` |file names for STARsolo output: file_name_prefix gene_names barcode_sequences cell_feature_count_matrix |`string`, example: `"Solo.out/"`, example: `"features.tsv"`, example: `"barcodes.tsv"`, example: `"matrix.mtx"` | -|`--soloCellFilter` |cell filtering type and parameters - None ... do not output filtered cells - TopCells ... only report top cells by UMI count, followed by the exact number of cells - CellRanger2.2 ... simple filtering of CellRanger 2.2. Can be followed by numbers: number of expected cells, robust maximum percentile for UMI count, maximum to minimum ratio for UMI count The harcoded values are from CellRanger: nExpectedCells=3000; maxPercentile=0.99; maxMinRatio=10 - EmptyDrops_CR ... EmptyDrops filtering in CellRanger flavor. 
Please cite the original EmptyDrops paper: A.T.L Lun et al, Genome Biology, 20, 63 (2019): https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1662-y Can be followed by 10 numeric parameters: nExpectedCells maxPercentile maxMinRatio indMin indMax umiMin umiMinFracMedian candMaxN FDR simN The harcoded values are from CellRanger: 3000 0.99 10 45000 90000 500 0.01 20000 0.01 10000 |`string`, example: `"CellRanger2.2"`, example: `"3000"`, example: `"0.99"`, example: `"10"` | -|`--soloOutFormatFeaturesGeneField3` |field 3 in the Gene features.tsv file. If "-", then no 3rd field is output. |`string`, example: `"Gene Expression"` | -|`--soloCellReadStats` |Output reads statistics for each CB - Standard ... standard output |`string` | +|Name |Description |Attributes | +|:-----------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------
-------------------------------------------| +|`--soloType` |type of single-cell RNA-seq - CB_UMI_Simple ... (a.k.a. Droplet) one UMI and one Cell Barcode of fixed length in read2, e.g. Drop-seq and 10X Chromium. - CB_UMI_Complex ... multiple Cell Barcodes of varying length, one UMI of fixed length and one adapter sequence of fixed length are allowed in read2 only (e.g. inDrop, ddSeq). - CB_samTagOut ... output Cell Barcode as CR and/or CB SAm tag. No UMI counting. --readFilesIn cDNA_read1 [cDNA_read2 if paired-end] CellBarcode_read . Requires --outSAMtype BAM Unsorted [and/or SortedByCoordinate] - SmartSeq ... Smart-seq: each cell in a separate FASTQ (paired- or single-end), barcodes are corresponding read-groups, no UMI sequences, alignments deduplicated according to alignment start and end (after extending soft-clipped bases) |List of `string`, multiple_sep: `";"` | +|`--soloCBwhitelist` |file(s) with whitelist(s) of cell barcodes. Only --soloType CB_UMI_Complex allows more than one whitelist file. - None ... no whitelist: all cell barcodes are allowed |List of `string`, multiple_sep: `";"` | +|`--soloCBstart` |cell barcode start base |`integer`, example: `1` | +|`--soloCBlen` |cell barcode length |`integer`, example: `16` | +|`--soloUMIstart` |UMI start base |`integer`, example: `17` | +|`--soloUMIlen` |UMI length |`integer`, example: `10` | +|`--soloBarcodeReadLength` |length of the barcode read - 1 ... equal to sum of soloCBlen+soloUMIlen - 0 ... not defined, do not check |`integer`, example: `1` | +|`--soloBarcodeMate` |identifies which read mate contains the barcode (CB+UMI) sequence - 0 ... barcode sequence is on separate read, which should always be the last file in the --readFilesIn listed - 1 ... barcode sequence is a part of mate 1 - 2 ... barcode sequence is a part of mate 2 |`integer`, example: `0` | +|`--soloCBposition` |position of Cell Barcode(s) on the barcode read. 
Presently only works with --soloType CB_UMI_Complex, and barcodes are assumed to be on Read2. Format for each barcode: startAnchor_startPosition_endAnchor_endPosition start(end)Anchor defines the Anchor Base for the CB: 0: read start; 1: read end; 2: adapter start; 3: adapter end start(end)Position is the 0-based position with of the CB start(end) with respect to the Anchor Base String for different barcodes are separated by space. Example: inDrop (Zilionis et al, Nat. Protocols, 2017): --soloCBposition 0_0_2_-1 3_1_3_8 |List of `string`, multiple_sep: `";"` | +|`--soloUMIposition` |position of the UMI on the barcode read, same as soloCBposition Example: inDrop (Zilionis et al, Nat. Protocols, 2017): --soloCBposition 3_9_3_14 |`string` | +|`--soloAdapterSequence` |adapter sequence to anchor barcodes. Only one adapter sequence is allowed. |`string` | +|`--soloAdapterMismatchesNmax` |maximum number of mismatches allowed in adapter sequence. |`integer`, example: `1` | +|`--soloCBmatchWLtype` |matching the Cell Barcodes to the WhiteList - Exact ... only exact matches allowed - 1MM ... only one match in whitelist with 1 mismatched base allowed. Allowed CBs have to have at least one read with exact match. - 1MM_multi ... multiple matches in whitelist with 1 mismatched base allowed, posterior probability calculation is used choose one of the matches. Allowed CBs have to have at least one read with exact match. This option matches best with CellRanger 2.2.0 - 1MM_multi_pseudocounts ... same as 1MM_Multi, but pseudocounts of 1 are added to all whitelist barcodes. - 1MM_multi_Nbase_pseudocounts ... same as 1MM_multi_pseudocounts, multimatching to WL is allowed for CBs with N-bases. This option matches best with CellRanger >= 3.0.0 - EditDist_2 ... allow up to edit distance of 3 fpr each of the barcodes. May include one deletion + one insertion. Only works with --soloType CB_UMI_Complex. Matches to multiple passlist barcdoes are not allowed. 
Similar to ParseBio Split-seq pipeline. |`string`, example: `"1MM_multi"` | +|`--soloInputSAMattrBarcodeSeq` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode sequence (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeSeq CR UR . This parameter is required when running STARsolo with input from SAM. |List of `string`, multiple_sep: `";"` | +|`--soloInputSAMattrBarcodeQual` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode qualities (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeQual CY UY . If this parameter is '-' (default), the quality 'H' will be assigned to all bases. |List of `string`, multiple_sep: `";"` | +|`--soloStrand` |strandedness of the solo libraries: - Unstranded ... no strand information - Forward ... read strand same as the original RNA molecule - Reverse ... read strand opposite to the original RNA molecule |`string`, example: `"Forward"` | +|`--soloFeatures` |genomic features for which the UMI counts per Cell Barcode are collected - Gene ... genes: reads match the gene transcript - SJ ... splice junctions: reported in SJ.out.tab - GeneFull ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns - GeneFull_ExonOverIntron ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns: prioritize 100% overlap with exons - GeneFull_Ex50pAS ... full gene (pre-RNA): count all reads overlapping genes' exons and introns: prioritize >50% overlap with exons. Do not count reads with 100% exonic overlap in the antisense direction. |List of `string`, example: `"Gene"`, multiple_sep: `";"` | +|`--soloMultiMappers` |counting method for reads mapping to multiple genes - Unique ... count only reads that map to unique genes - Uniform ... uniformly distribute multi-genic UMIs to all genes - Rescue ... 
distribute UMIs proportionally to unique+uniform counts (~ first iteration of EM) - PropUnique ... distribute UMIs proportionally to unique mappers, if present, and uniformly if not. - EM ... multi-gene UMIs are distributed using Expectation Maximization algorithm |List of `string`, example: `"Unique"`, multiple_sep: `";"` | +|`--soloUMIdedup` |type of UMI deduplication (collapsing) algorithm - 1MM_All ... all UMIs with 1 mismatch distance to each other are collapsed (i.e. counted once). - 1MM_Directional_UMItools ... follows the "directional" method from the UMI-tools by Smith, Heger and Sudbery (Genome Research 2017). - 1MM_Directional ... same as 1MM_Directional_UMItools, but with more stringent criteria for duplicate UMIs - Exact ... only exactly matching UMIs are collapsed. - NoDedup ... no deduplication of UMIs, count all reads. - 1MM_CR ... CellRanger2-4 algorithm for 1MM UMI collapsing. |List of `string`, example: `"1MM_All"`, multiple_sep: `";"` | +|`--soloUMIfiltering` |type of UMI filtering (for reads uniquely mapping to genes) - - ... basic filtering: remove UMIs with N and homopolymers (similar to CellRanger 2.2.0). - MultiGeneUMI ... basic + remove lower-count UMIs that map to more than one gene. - MultiGeneUMI_All ... basic + remove all UMIs that map to more than one gene. - MultiGeneUMI_CR ... basic + remove lower-count UMIs that map to more than one gene, matching CellRanger > 3.0.0 . Only works with --soloUMIdedup 1MM_CR |List of `string`, multiple_sep: `";"` | +|`--soloOutFileNames` |file names for STARsolo output: file_name_prefix gene_names barcode_sequences cell_feature_count_matrix |List of `string`, example: `"Solo.out/", "features.tsv", "barcodes.tsv", "matrix.mtx"`, multiple_sep: `";"` | +|`--soloCellFilter` |cell filtering type and parameters - None ... do not output filtered cells - TopCells ... only report top cells by UMI count, followed by the exact number of cells - CellRanger2.2 ... simple filtering of CellRanger 2.2. 
Can be followed by numbers: number of expected cells, robust maximum percentile for UMI count, maximum to minimum ratio for UMI count The harcoded values are from CellRanger: nExpectedCells=3000; maxPercentile=0.99; maxMinRatio=10 - EmptyDrops_CR ... EmptyDrops filtering in CellRanger flavor. Please cite the original EmptyDrops paper: A.T.L Lun et al, Genome Biology, 20, 63 (2019): https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1662-y Can be followed by 10 numeric parameters: nExpectedCells maxPercentile maxMinRatio indMin indMax umiMin umiMinFracMedian candMaxN FDR simN The harcoded values are from CellRanger: 3000 0.99 10 45000 90000 500 0.01 20000 0.01 10000 |List of `string`, example: `"CellRanger2.2", "3000", "0.99", "10"`, multiple_sep: `";"` | +|`--soloOutFormatFeaturesGeneField3` |field 3 in the Gene features.tsv file. If "-", then no 3rd field is output. |List of `string`, example: `"Gene Expression"`, multiple_sep: `";"` | +|`--soloCellReadStats` |Output reads statistics for each CB - Standard ... standard output |`string` | ### HTSeq arguments -|Name |Description |Attributes | -|:-----------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------| -|`--stranded` |Whether the data is from a strand-specific assay. 'reverse' means 'yes' with reversed strand interpretation. |`string`, default: `"yes"` | -|`--minimum_alignment_quality` |Skip all reads with MAPQ alignment quality lower than the given minimum value. 
MAPQ is the 5th column of a SAM/BAM file and its usage depends on the software used to map the reads. |`integer`, default: `10` | -|`--type` |Feature type (3rd column in GTF file) to be used, all features of other type are ignored (default, suitable for Ensembl GTF files: exon) |`string`, example: `"exon"` | -|`--id_attribute` |GTF attribute to be used as feature ID (default, suitable for Ensembl GTF files: gene_id). All feature of the right type (see -t option) within the same GTF attribute will be added together. The typical way of using this option is to count all exonic reads from each gene and add the exons but other uses are possible as well. You can call this option multiple times: in that case, the combination of all attributes separated by colons (:) will be used as a unique identifier, e.g. for exons you might use -i gene_id -i exon_number. |`string`, example: `"gene_id"` | -|`--additional_attributes` |Additional feature attributes (suitable for Ensembl GTF files: gene_name). Use multiple times for more than one additional attribute. These attributes are only used as annotations in the output, while the determination of how the counts are added together is done based on option -i. |`string`, example: `"gene_name"` | -|`--add_chromosome_info` |Store information about the chromosome of each feature as an additional attribute (e.g. colunm in the TSV output file). |`boolean_true` | -|`--mode` |Mode to handle reads overlapping more than one feature. |`string`, default: `"union"` | -|`--non_unique` |Whether and how to score reads that are not uniquely aligned or ambiguously assigned to features. |`string`, default: `"none"` | -|`--secondary_alignments` |Whether to score secondary alignments (0x100 flag). |`string` | -|`--supplementary_alignments` |Whether to score supplementary alignments (0x800 flag). |`string` | -|`--counts_output_sparse` |Store the counts as a sparse matrix (mtx, h5ad, loom). 
|`boolean_true` | +|Name |Description |Attributes | +|:-----------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------| +|`--stranded` |Whether the data is from a strand-specific assay. 'reverse' means 'yes' with reversed strand interpretation. |`string`, default: `"yes"` | +|`--minimum_alignment_quality` |Skip all reads with MAPQ alignment quality lower than the given minimum value. MAPQ is the 5th column of a SAM/BAM file and its usage depends on the software used to map the reads. |`integer`, default: `10` | +|`--type` |Feature type (3rd column in GTF file) to be used, all features of other type are ignored (default, suitable for Ensembl GTF files: exon) |`string`, example: `"exon"` | +|`--id_attribute` |GTF attribute to be used as feature ID (default, suitable for Ensembl GTF files: gene_id). All feature of the right type (see -t option) within the same GTF attribute will be added together. The typical way of using this option is to count all exonic reads from each gene and add the exons but other uses are possible as well. You can call this option multiple times: in that case, the combination of all attributes separated by colons (:) will be used as a unique identifier, e.g. for exons you might use -i gene_id -i exon_number. |List of `string`, example: `"gene_id"`, multiple_sep: `":"` | +|`--additional_attributes` |Additional feature attributes (suitable for Ensembl GTF files: gene_name). 
Use multiple times for more than one additional attribute. These attributes are only used as annotations in the output, while the determination of how the counts are added together is done based on option -i. |List of `string`, example: `"gene_name"`, multiple_sep: `":"` | +|`--add_chromosome_info` |Store information about the chromosome of each feature as an additional attribute (e.g. colunm in the TSV output file). |`boolean_true` | +|`--mode` |Mode to handle reads overlapping more than one feature. |`string`, default: `"union"` | +|`--non_unique` |Whether and how to score reads that are not uniquely aligned or ambiguously assigned to features. |`string`, default: `"none"` | +|`--secondary_alignments` |Whether to score secondary alignments (0x100 flag). |`string` | +|`--supplementary_alignments` |Whether to score supplementary alignments (0x800 flag). |`string` | +|`--counts_output_sparse` |Store the counts as a sparse matrix (mtx, h5ad, loom). |`boolean_true` | ## Authors diff --git a/components/modules/mapping/star_align.qmd b/components/modules/mapping/star_align.qmd index d209355a..ebb69f8a 100644 --- a/components/modules/mapping/star_align.qmd +++ b/components/modules/mapping/star_align.qmd @@ -67,11 +67,11 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Input/Output -|Name |Description |Attributes | -|:-------------|:-----------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------| -|`--input` |The FASTQ files to be analyzed. Corresponds to the --readFilesIn argument in the STAR command. |`file`, required, example: `"mysample_S1_L001_R1_001.fastq.gz"`, example: `"mysample_S1_L001_R2_001.fastq.gz"` | -|`--reference` |Path to the reference built by star_build_reference. Corresponds to the --genomeDir argument in the STAR command. 
|`file`, required, example: `"/path/to/reference"` | -|`--output` |Path to output directory. Corresponds to the --outFileNamePrefix argument in the STAR command. |`file`, required, example: `"/path/to/foo"` | +|Name |Description |Attributes | +|:-------------|:-----------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------| +|`--input` |The FASTQ files to be analyzed. Corresponds to the --readFilesIn argument in the STAR command. |List of `file`, required, example: `"mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"`, multiple_sep: `";"` | +|`--reference` |Path to the reference built by star_build_reference. Corresponds to the --genomeDir argument in the STAR command. |`file`, required, example: `"/path/to/reference"` | +|`--output` |Path to output directory. Corresponds to the --outFileNamePrefix argument in the STAR command. |`file`, required, example: `"/path/to/foo"` | ### Run Parameters @@ -83,30 +83,30 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Genome Parameters -|Name |Description |Attributes | -|:-----------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------| -|`--genomeLoad` |mode of shared memory usage for the genome files. Only used with --runMode alignReads. 
- LoadAndKeep ... load genome into shared and keep it in memory after run - LoadAndRemove ... load genome into shared but remove it after run - LoadAndExit ... load genome into shared memory and exit, keeping the genome in memory for future runs - Remove ... do not map anything, just remove loaded genome from memory - NoSharedMemory ... do not use shared memory, each job will have its own private copy of the genome |`string`, example: `"NoSharedMemory"` | -|`--genomeFastaFiles` |path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped. Required for the genome generation (--runMode genomeGenerate). Can also be used in the mapping (--runMode alignReads) to add extra (new) sequences to the genome (e.g. spike-ins). |`file` | -|`--genomeFileSizes` |genome files exact sizes in bytes. Typically, this should not be defined by the user. |`integer`, example: `0` | -|`--genomeTransformOutput` |which output to transform back to original genome - SAM ... SAM/BAM alignments - SJ ... splice junctions (SJ.out.tab) - None ... no transformation of the output |`string` | -|`--genomeChrSetMitochondrial` |names of the mitochondrial chromosomes. 
Presently only used for STARsolo statistics output/ |`string`, example: `"chrM"`, example: `"M"`, example: `"MT"` | +|Name |Description |Attributes | +|:-----------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------| +|`--genomeLoad` |mode of shared memory usage for the genome files. Only used with --runMode alignReads. - LoadAndKeep ... load genome into shared and keep it in memory after run - LoadAndRemove ... load genome into shared but remove it after run - LoadAndExit ... load genome into shared memory and exit, keeping the genome in memory for future runs - Remove ... do not map anything, just remove loaded genome from memory - NoSharedMemory ... do not use shared memory, each job will have its own private copy of the genome |`string`, example: `"NoSharedMemory"` | +|`--genomeFastaFiles` |path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped. Required for the genome generation (--runMode genomeGenerate). Can also be used in the mapping (--runMode alignReads) to add extra (new) sequences to the genome (e.g. spike-ins). |List of `file`, multiple_sep: `";"` | +|`--genomeFileSizes` |genome files exact sizes in bytes. Typically, this should not be defined by the user. |List of `integer`, example: `0`, multiple_sep: `";"` | +|`--genomeTransformOutput` |which output to transform back to original genome - SAM ... 
SAM/BAM alignments - SJ ... splice junctions (SJ.out.tab) - None ... no transformation of the output |List of `string`, multiple_sep: `";"` | +|`--genomeChrSetMitochondrial` |names of the mitochondrial chromosomes. Presently only used for STARsolo statistics output/ |List of `string`, example: `"chrM", "M", "MT"`, multiple_sep: `";"` | ### Splice Junctions Database -|Name |Description |Attributes | -|:----------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------| -|`--sjdbFileChrStartEnd` |path to the files with genomic coordinates (chr start end strand) for the splice junction introns. Multiple files can be supplied and will be concatenated. |`string` | -|`--sjdbGTFfile` |path to the GTF file with annotations |`file` | -|`--sjdbGTFchrPrefix` |prefix for chromosome names in a GTF file (e.g. 
'chr' for using ENSMEBL annotations with UCSC genomes) |`string` | -|`--sjdbGTFfeatureExon` |feature type in GTF file to be used as exons for building transcripts |`string`, example: `"exon"` | -|`--sjdbGTFtagExonParentTranscript` |GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files) |`string`, example: `"transcript_id"` | -|`--sjdbGTFtagExonParentGene` |GTF attribute name for parent gene ID (default "gene_id" works for GTF files) |`string`, example: `"gene_id"` | -|`--sjdbGTFtagExonParentGeneName` |GTF attribute name for parent gene name |`string`, example: `"gene_name"` | -|`--sjdbGTFtagExonParentGeneType` |GTF attribute name for parent gene type |`string`, example: `"gene_type"`, example: `"gene_biotype"` | -|`--sjdbOverhang` |length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1) |`integer`, example: `100` | -|`--sjdbScore` |extra alignment score for alignments that cross database junctions |`integer`, example: `2` | -|`--sjdbInsertSave` |which files to save when sjdb junctions are inserted on the fly at the mapping step - Basic ... only small junction / transcript files - All ... all files including big Genome, SA and SAindex - this will create a complete genome directory |`string`, example: `"Basic"` | +|Name |Description |Attributes | +|:----------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------| +|`--sjdbFileChrStartEnd` |path to the files with genomic coordinates (chr start end strand) for the splice junction introns. Multiple files can be supplied and will be concatenated. 
|List of `string`, multiple_sep: `";"` | +|`--sjdbGTFfile` |path to the GTF file with annotations |`file` | +|`--sjdbGTFchrPrefix` |prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSMEBL annotations with UCSC genomes) |`string` | +|`--sjdbGTFfeatureExon` |feature type in GTF file to be used as exons for building transcripts |`string`, example: `"exon"` | +|`--sjdbGTFtagExonParentTranscript` |GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files) |`string`, example: `"transcript_id"` | +|`--sjdbGTFtagExonParentGene` |GTF attribute name for parent gene ID (default "gene_id" works for GTF files) |`string`, example: `"gene_id"` | +|`--sjdbGTFtagExonParentGeneName` |GTF attribute name for parent gene name |List of `string`, example: `"gene_name"`, multiple_sep: `";"` | +|`--sjdbGTFtagExonParentGeneType` |GTF attribute name for parent gene type |List of `string`, example: `"gene_type", "gene_biotype"`, multiple_sep: `";"` | +|`--sjdbOverhang` |length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1) |`integer`, example: `100` | +|`--sjdbScore` |extra alignment score for alignments that cross database junctions |`integer`, example: `2` | +|`--sjdbInsertSave` |which files to save when sjdb junctions are inserted on the fly at the mapping step - Basic ... only small junction / transcript files - All ... 
all files including big Genome, SA and SAindex - this will create a complete genome directory |`string`, example: `"Basic"` | ### Variation parameters @@ -118,43 +118,43 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Read Parameters -|Name |Description |Attributes | -|:------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------| -|`--readFilesType` |format of input read files - Fastx ... FASTA or FASTQ - SAM SE ... SAM or BAM single-end reads; for BAM use --readFilesCommand samtools view - SAM PE ... SAM or BAM paired-end reads; for BAM use --readFilesCommand samtools view |`string`, example: `"Fastx"` | -|`--readFilesSAMattrKeep` |for --readFilesType SAM SE/PE, which SAM tags to keep in the output BAM, e.g.: --readFilesSAMtagsKeep RG PL - All ... keep all tags - None ... do not keep any tags |`string`, example: `"All"` | -|`--readFilesManifest` |path to the "manifest" file with the names of read files. The manifest file should contain 3 tab-separated columns: paired-end reads: read1_file_name $tab$ read2_file_name $tab$ read_group_line. single-end reads: read1_file_name $tab$ - $tab$ read_group_line. Spaces, but not tabs are allowed in file names. If read_group_line does not start with ID:, it can only contain one ID field, and ID: will be added to it. 
If read_group_line starts with ID:, it can contain several fields separated by $tab$, and all fields will be be copied verbatim into SAM @RG header line. |`file` | -|`--readFilesPrefix` |prefix for the read files names, i.e. it will be added in front of the strings in --readFilesIn |`string` | -|`--readFilesCommand` |command line to execute for each of the input file. This command should generate FASTA or FASTQ text and send it to stdout For example: zcat - to uncompress .gz files, bzcat - to uncompress .bz2 files, etc. |`string` | -|`--readMapNumber` |number of reads to map from the beginning of the file -1: map all reads |`integer`, example: `-1` | -|`--readMatesLengthsIn` |Equal/NotEqual - lengths of names,sequences,qualities for both mates are the same / not the same. NotEqual is safe in all situations. |`string`, example: `"NotEqual"` | -|`--readNameSeparator` |character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed) |`string`, example: `"/"` | -|`--readQualityScoreBase` |number to be subtracted from the ASCII code to get Phred quality score |`integer`, example: `33` | +|Name |Description |Attributes | +|:------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------| +|`--readFilesType` |format of input read files - Fastx ... FASTA or FASTQ - SAM SE ... 
SAM or BAM single-end reads; for BAM use --readFilesCommand samtools view - SAM PE ... SAM or BAM paired-end reads; for BAM use --readFilesCommand samtools view |`string`, example: `"Fastx"` | +|`--readFilesSAMattrKeep` |for --readFilesType SAM SE/PE, which SAM tags to keep in the output BAM, e.g.: --readFilesSAMtagsKeep RG PL - All ... keep all tags - None ... do not keep any tags |List of `string`, example: `"All"`, multiple_sep: `";"` | +|`--readFilesManifest` |path to the "manifest" file with the names of read files. The manifest file should contain 3 tab-separated columns: paired-end reads: read1_file_name $tab$ read2_file_name $tab$ read_group_line. single-end reads: read1_file_name $tab$ - $tab$ read_group_line. Spaces, but not tabs are allowed in file names. If read_group_line does not start with ID:, it can only contain one ID field, and ID: will be added to it. If read_group_line starts with ID:, it can contain several fields separated by $tab$, and all fields will be be copied verbatim into SAM @RG header line. |`file` | +|`--readFilesPrefix` |prefix for the read files names, i.e. it will be added in front of the strings in --readFilesIn |`string` | +|`--readFilesCommand` |command line to execute for each of the input file. This command should generate FASTA or FASTQ text and send it to stdout For example: zcat - to uncompress .gz files, bzcat - to uncompress .bz2 files, etc. |List of `string`, multiple_sep: `";"` | +|`--readMapNumber` |number of reads to map from the beginning of the file -1: map all reads |`integer`, example: `-1` | +|`--readMatesLengthsIn` |Equal/NotEqual - lengths of names,sequences,qualities for both mates are the same / not the same. NotEqual is safe in all situations. 
|`string`, example: `"NotEqual"` | +|`--readNameSeparator` |character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed) |List of `string`, example: `"/"`, multiple_sep: `";"` | +|`--readQualityScoreBase` |number to be subtracted from the ASCII code to get Phred quality score |`integer`, example: `33` | ### Read Clipping -|Name |Description |Attributes | -|:----------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------| -|`--clipAdapterType` |adapter clipping type - Hamming ... adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp - CellRanger4 ... 5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin Sosic: https://github.com/Martinsos/opal - None ... no adapter clipping, all other clip* parameters are disregarded |`string`, example: `"Hamming"` | -|`--clip3pNbases` |number(s) of bases to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. |`integer`, example: `0` | -|`--clip3pAdapterSeq` |adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. - polyA ... polyA sequence with the length equal to read length |`string` | -|`--clip3pAdapterMMp` |max proportion of mismatches for 3p adapter clipping for each mate. If one value is given, it will be assumed the same for both mates. |`double`, example: `0.1` | -|`--clip3pAfterAdapterNbases` |number of bases to clip from 3p of each mate after the adapter clipping. 
If one value is given, it will be assumed the same for both mates. |`integer`, example: `0` | -|`--clip5pNbases` |number(s) of bases to clip from 5p of each mate. If one value is given, it will be assumed the same for both mates. |`integer`, example: `0` | +|Name |Description |Attributes | +|:----------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------| +|`--clipAdapterType` |adapter clipping type - Hamming ... adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp - CellRanger4 ... 5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin Sosic: https://github.com/Martinsos/opal - None ... no adapter clipping, all other clip* parameters are disregarded |`string`, example: `"Hamming"` | +|`--clip3pNbases` |number(s) of bases to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. |List of `integer`, example: `0`, multiple_sep: `";"` | +|`--clip3pAdapterSeq` |adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. - polyA ... polyA sequence with the length equal to read length |List of `string`, multiple_sep: `";"` | +|`--clip3pAdapterMMp` |max proportion of mismatches for 3p adapter clipping for each mate. If one value is given, it will be assumed the same for both mates. |List of `double`, example: `0.1`, multiple_sep: `";"` | +|`--clip3pAfterAdapterNbases` |number of bases to clip from 3p of each mate after the adapter clipping. If one value is given, it will be assumed the same for both mates. 
|List of `integer`, example: `0`, multiple_sep: `";"` | +|`--clip5pNbases` |number(s) of bases to clip from 5p of each mate. If one value is given, it will be assumed the same for both mates. |List of `integer`, example: `0`, multiple_sep: `";"` | ### Limits -|Name |Description |Attributes | -|:---------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------| -|`--limitGenomeGenerateRAM` |maximum available RAM (bytes) for genome generation |`long`, example: `NA` | -|`--limitIObufferSize` |max available buffers size (bytes) for input/output, per thread |`long`, example: `30000000`, example: `50000000` | -|`--limitOutSAMoneReadBytes` |max size of the SAM record (bytes) for one read. Recommended value: >(2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax |`long`, example: `100000` | -|`--limitOutSJoneRead` |max number of junctions for one read (including all multi-mappers) |`integer`, example: `1000` | -|`--limitOutSJcollapsed` |max number of collapsed junctions |`integer`, example: `1000000` | -|`--limitBAMsortRAM` |maximum available RAM (bytes) for sorting BAM. If =0, it will be set to the genome index size. 0 value can only be used with --genomeLoad NoSharedMemory option. 
|`long`, example: `0` | -|`--limitSjdbInsertNsj` |maximum number of junctions to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run |`integer`, example: `1000000` | -|`--limitNreadsSoft` |soft limit on the number of reads |`integer`, example: `-1` | +|Name |Description |Attributes | +|:---------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------| +|`--limitGenomeGenerateRAM` |maximum available RAM (bytes) for genome generation |`long`, example: `NA` | +|`--limitIObufferSize` |max available buffers size (bytes) for input/output, per thread |List of `long`, example: `30000000, 50000000`, multiple_sep: `";"` | +|`--limitOutSAMoneReadBytes` |max size of the SAM record (bytes) for one read. Recommended value: >(2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax |`long`, example: `100000` | +|`--limitOutSJoneRead` |max number of junctions for one read (including all multi-mappers) |`integer`, example: `1000` | +|`--limitOutSJcollapsed` |max number of collapsed junctions |`integer`, example: `1000000` | +|`--limitBAMsortRAM` |maximum available RAM (bytes) for sorting BAM. If =0, it will be set to the genome index size. 0 value can only be used with --genomeLoad NoSharedMemory option. 
|`long`, example: `0` | +|`--limitSjdbInsertNsj` |maximum number of junctions to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run |`integer`, example: `1000000` | +|`--limitNreadsSoft` |soft limit on the number of reads |`integer`, example: `-1` | ### Output: general @@ -170,30 +170,30 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Output: SAM and BAM -|Name |Description |Attributes | -|:---------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------| -|`--outSAMtype` |type of SAM/BAM output 1st word: - BAM ... output BAM without sorting - SAM ... output SAM without sorting - None ... no SAM/BAM output 2nd, 3rd: - Unsorted ... standard unsorted - SortedByCoordinate ... sorted by coordinate. This option will allocate extra memory for sorting which can be specified by --limitBAMsortRAM. |`string`, example: `"SAM"` | -|`--outSAMmode` |mode of SAM output - None ... no SAM output - Full ... full SAM output - NoQS ... full SAM but without quality scores |`string`, example: `"Full"` | -|`--outSAMstrandField` |Cufflinks-like strand field flag - None ... not used - intronMotif ... strand derived from the intron motif. 
This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out. |`string` | -|`--outSAMattributes` |a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. ***Presets: - None ... no attributes - Standard ... NH HI AS nM - All ... NH HI AS nM NM MD jM jI MC ch ***Alignment: - NH ... number of loci the reads maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag. - HI ... multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag. - AS ... local alignment score, +1/-1 for matches/mismateches, score* penalties for indels and gaps. For PE reads, total score for two mates. Stadnard SAM tag. - nM ... number of mismatches. For PE reads, sum over two mates. - NM ... edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag. - MD ... string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag. - jM ... intron motifs for all junctions (i.e. N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value. - jI ... start and end of introns for all junctions (1-based). - XS ... alignment strand according to --outSAMstrandField. - MC ... mate's CIGAR string. Standard SAM tag. - ch ... marks all segment of all chimeric alingments for --chimOutType WithinBAM output. - cN ... number of bases clipped from the read ends: 5' and 3' ***Variation: - vA ... variant allele - vG ... genomic coordinate of the variant overlapped by the read. - vW ... 1 - alignment passes WASP filtering; 2,3,4,5,6,7 - alignment does not pass WASP filtering. Requires --waspOutputMode SAMtag. ***STARsolo: - CR CY UR UY ... sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing. - GX GN ... 
gene ID and gene name for unique-gene reads. - gx gn ... gene IDs and gene names for unique- and multi-gene reads. - CB UB ... error-corrected cell barcodes and UMIs for solo* demultiplexing. Requires --outSAMtype BAM SortedByCoordinate. - sM ... assessment of CB and UMI. - sS ... sequence of the entire barcode (CB,UMI,adapter). - sQ ... quality of the entire barcode. ***Unsupported/undocumented: - ha ... haplotype (1/2) when mapping to the diploid genome. Requires genome generated with --genomeTransformType Diploid . - rB ... alignment block read/genomic coordinates. - vR ... read coordinate of the variant. |`string`, example: `"Standard"` | -|`--outSAMattrIHstart` |start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie. |`integer`, example: `1` | -|`--outSAMunmapped` |output of unmapped reads in the SAM format 1st word: - None ... no output - Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam) 2nd word: - KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate. Only affects multi-mapping reads. |`string` | -|`--outSAMorder` |type of sorting for the SAM output Paired: one mate after the other for all paired alignments PairedKeepInputOrder: one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files |`string`, example: `"Paired"` | -|`--outSAMprimaryFlag` |which alignments are considered primary - all others will be marked with 0x100 bit in the FLAG - OneBestScore ... only one alignment with the best score is primary - AllBestScore ... all alignments with the best score are primary |`string`, example: `"OneBestScore"` | -|`--outSAMreadID` |read ID record type - Standard ... first word (until space) from the FASTx read ID line, removing /1,/2 from the end - Number ... 
read number (index) in the FASTx file |`string`, example: `"Standard"` | -|`--outSAMmapqUnique` |0 to 255: the MAPQ value for unique mappers |`integer`, example: `255` | -|`--outSAMflagOR` |0 to 65535: sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise. |`integer`, example: `0` | -|`--outSAMflagAND` |0 to 65535: sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagOR. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise. |`integer`, example: `65535` | -|`--outSAMattrRGline` |SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma separated RG lines correspons to different (comma separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy |`string` | -|`--outSAMheaderHD` |@HD (header) line of the SAM header |`string` | -|`--outSAMheaderPG` |extra @PG (software) line of the SAM header (in addition to STAR) |`string` | -|`--outSAMheaderCommentFile` |path to the file with @CO (comment) lines of the SAM header |`string` | -|`--outSAMfilter` |filter the output into main SAM/BAM files - KeepOnlyAddedReferences ... only keep the reads for which all alignments are to the extra reference sequences added with --genomeFastaFiles at the mapping stage. - KeepAllAddedReferences ... keep all alignments to the extra reference sequences added with --genomeFastaFiles at the mapping stage. |`string` | -|`--outSAMmultNmax` |max number of multiple alignments for a read that will be output to the SAM/BAM files. 
Note that if this value is not equal to -1, the top scoring alignment will be output first - -1 ... all alignments (up to --outFilterMultimapNmax) will be output |`integer`, example: `-1` | -|`--outSAMtlen` |calculation method for the TLEN field in the SAM/BAM files - 1 ... leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate - 2 ... leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from 1 for overlapping mates with protruding ends |`integer`, example: `1` | -|`--outBAMcompression` |-1 to 10 BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression |`integer`, example: `1` | -|`--outBAMsortingThreadN` |>=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN). |`integer`, example: `0` | -|`--outBAMsortingBinsN` |>0: number of genome bins for coordinate-sorting |`integer`, example: `50` | +|Name |Description |Attributes | 
+|:---------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------| +|`--outSAMtype` |type of SAM/BAM output 1st word: - BAM ... output BAM without sorting - SAM ... output SAM without sorting - None ... no SAM/BAM output 2nd, 3rd: - Unsorted ... standard unsorted - SortedByCoordinate ... sorted by coordinate. This option will allocate extra memory for sorting which can be specified by --limitBAMsortRAM. |List of `string`, example: `"SAM"`, multiple_sep: `";"` | +|`--outSAMmode` |mode of SAM output - None ... no SAM output - Full ... full SAM output - NoQS ... full SAM but without quality scores |`string`, example: `"Full"` | +|`--outSAMstrandField` |Cufflinks-like strand field flag - None ... not used - intronMotif ... strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out. |`string` | +|`--outSAMattributes` |a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. ***Presets: - None ... no attributes - Standard ... NH HI AS nM - All ... NH HI AS nM NM MD jM jI MC ch ***Alignment: - NH ... number of loci the reads maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag. - HI ... 
multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag. - AS ... local alignment score, +1/-1 for matches/mismateches, score* penalties for indels and gaps. For PE reads, total score for two mates. Stadnard SAM tag. - nM ... number of mismatches. For PE reads, sum over two mates. - NM ... edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag. - MD ... string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag. - jM ... intron motifs for all junctions (i.e. N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value. - jI ... start and end of introns for all junctions (1-based). - XS ... alignment strand according to --outSAMstrandField. - MC ... mate's CIGAR string. Standard SAM tag. - ch ... marks all segment of all chimeric alingments for --chimOutType WithinBAM output. - cN ... number of bases clipped from the read ends: 5' and 3' ***Variation: - vA ... variant allele - vG ... genomic coordinate of the variant overlapped by the read. - vW ... 1 - alignment passes WASP filtering; 2,3,4,5,6,7 - alignment does not pass WASP filtering. Requires --waspOutputMode SAMtag. ***STARsolo: - CR CY UR UY ... sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing. - GX GN ... gene ID and gene name for unique-gene reads. - gx gn ... gene IDs and gene names for unique- and multi-gene reads. - CB UB ... error-corrected cell barcodes and UMIs for solo* demultiplexing. Requires --outSAMtype BAM SortedByCoordinate. - sM ... assessment of CB and UMI. - sS ... sequence of the entire barcode (CB,UMI,adapter). - sQ ... quality of the entire barcode. ***Unsupported/undocumented: - ha ... haplotype (1/2) when mapping to the diploid genome. 
Requires genome generated with --genomeTransformType Diploid . - rB ... alignment block read/genomic coordinates. - vR ... read coordinate of the variant. |List of `string`, example: `"Standard"`, multiple_sep: `";"` | +|`--outSAMattrIHstart` |start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie. |`integer`, example: `1` | +|`--outSAMunmapped` |output of unmapped reads in the SAM format 1st word: - None ... no output - Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam) 2nd word: - KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate. Only affects multi-mapping reads. |List of `string`, multiple_sep: `";"` | +|`--outSAMorder` |type of sorting for the SAM output Paired: one mate after the other for all paired alignments PairedKeepInputOrder: one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files |`string`, example: `"Paired"` | +|`--outSAMprimaryFlag` |which alignments are considered primary - all others will be marked with 0x100 bit in the FLAG - OneBestScore ... only one alignment with the best score is primary - AllBestScore ... all alignments with the best score are primary |`string`, example: `"OneBestScore"` | +|`--outSAMreadID` |read ID record type - Standard ... first word (until space) from the FASTx read ID line, removing /1,/2 from the end - Number ... read number (index) in the FASTx file |`string`, example: `"Standard"` | +|`--outSAMmapqUnique` |0 to 255: the MAPQ value for unique mappers |`integer`, example: `255` | +|`--outSAMflagOR` |0 to 65535: sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise. 
|`integer`, example: `0` | +|`--outSAMflagAND` |0 to 65535: sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagOR. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise. |`integer`, example: `65535` | +|`--outSAMattrRGline` |SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma separated RG lines correspons to different (comma separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy |List of `string`, multiple_sep: `";"` | +|`--outSAMheaderHD` |@HD (header) line of the SAM header |List of `string`, multiple_sep: `";"` | +|`--outSAMheaderPG` |extra @PG (software) line of the SAM header (in addition to STAR) |List of `string`, multiple_sep: `";"` | +|`--outSAMheaderCommentFile` |path to the file with @CO (comment) lines of the SAM header |`string` | +|`--outSAMfilter` |filter the output into main SAM/BAM files - KeepOnlyAddedReferences ... only keep the reads for which all alignments are to the extra reference sequences added with --genomeFastaFiles at the mapping stage. - KeepAllAddedReferences ... keep all alignments to the extra reference sequences added with --genomeFastaFiles at the mapping stage. |List of `string`, multiple_sep: `";"` | +|`--outSAMmultNmax` |max number of multiple alignments for a read that will be output to the SAM/BAM files. Note that if this value is not equal to -1, the top scoring alignment will be output first - -1 ... all alignments (up to --outFilterMultimapNmax) will be output |`integer`, example: `-1` | +|`--outSAMtlen` |calculation method for the TLEN field in the SAM/BAM files - 1 ... 
leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate - 2 ... leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from 1 for overlapping mates with protruding ends |`integer`, example: `1` | +|`--outBAMcompression` |-1 to 10 BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression |`integer`, example: `1` | +|`--outBAMsortingThreadN` |>=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN). |`integer`, example: `0` | +|`--outBAMsortingBinsN` |>0: number of genome bins for coordinate-sorting |`integer`, example: `50` | ### BAM processing @@ -206,12 +206,12 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Output Wiggle -|Name |Description |Attributes | -|:--------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------| -|`--outWigType` |type of signal output, e.g. "bedGraph" OR "bedGraph read1_5p". Requires sorted BAM: --outSAMtype BAM SortedByCoordinate . 1st word: - None ... no signal output - bedGraph ... bedGraph format - wiggle ... wiggle format 2nd word: - read1_5p ... signal from only 5' of the 1st read, useful for CAGE/RAMPAGE etc - read2 ... signal from only 2nd read |`string` | -|`--outWigStrand` |strandedness of wiggle/bedGraph output - Stranded ... separate strands, str1 and str2 - Unstranded ... collapsed strands |`string`, example: `"Stranded"` | -|`--outWigReferencesPrefix` |prefix matching reference names to include in the output wiggle file, e.g. 
"chr", default "-" - include all references |`string` | -|`--outWigNorm` |type of normalization for the signal - RPM ... reads per million of mapped reads - None ... no normalization, "raw" counts |`string`, example: `"RPM"` | +|Name |Description |Attributes | +|:--------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------| +|`--outWigType` |type of signal output, e.g. "bedGraph" OR "bedGraph read1_5p". Requires sorted BAM: --outSAMtype BAM SortedByCoordinate . 1st word: - None ... no signal output - bedGraph ... bedGraph format - wiggle ... wiggle format 2nd word: - read1_5p ... signal from only 5' of the 1st read, useful for CAGE/RAMPAGE etc - read2 ... signal from only 2nd read |List of `string`, multiple_sep: `";"` | +|`--outWigStrand` |strandedness of wiggle/bedGraph output - Stranded ... separate strands, str1 and str2 - Unstranded ... collapsed strands |`string`, example: `"Stranded"` | +|`--outWigReferencesPrefix` |prefix matching reference names to include in the output wiggle file, e.g. "chr", default "-" - include all references |`string` | +|`--outWigNorm` |type of normalization for the signal - RPM ... reads per million of mapped reads - None ... 
no normalization, "raw" counts |`string`, example: `"RPM"` | ### Output Filtering @@ -241,14 +241,14 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Output Filtering: Splice Junctions -|Name |Description |Attributes | -|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------| -|`--outSJfilterReads` |which reads to consider for collapsed splice junctions output - All ... all reads, unique- and multi-mappers - Unique ... uniquely mapping reads only |`string`, example: `"All"` | -|`--outSJfilterOverhangMin` |minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif does not apply to annotated junctions |`integer`, example: `30`, example: `12`, example: `12`, example: `12` | -|`--outSJfilterCountUniqueMin` |minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |`integer`, example: `3`, example: `1`, example: `1`, example: `1` | -|`--outSJfilterCountTotalMin` |minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. 
-1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |`integer`, example: `3`, example: `1`, example: `1`, example: `1` | -|`--outSJfilterDistToOtherSJmin` |minimum allowed distance to other junctions' donor/acceptor does not apply to annotated junctions |`integer`, example: `10`, example: `0`, example: `5`, example: `10` | -|`--outSJfilterIntronMaxVsReadN` |maximum gap allowed for junctions supported by 1,2,3,,,N reads i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000. by >=4 reads any gap <=alignIntronMax does not apply to annotated junctions |`integer`, example: `50000`, example: `100000`, example: `200000` | +|Name |Description |Attributes | +|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------| +|`--outSJfilterReads` |which reads to consider for collapsed splice junctions output - All ... all reads, unique- and multi-mappers - Unique ... uniquely mapping reads only |`string`, example: `"All"` | +|`--outSJfilterOverhangMin` |minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. 
-1 means no output for that motif does not apply to annotated junctions |List of `integer`, example: `30, 12, 12, 12`, multiple_sep: `";"` | +|`--outSJfilterCountUniqueMin` |minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |List of `integer`, example: `3, 1, 1, 1`, multiple_sep: `";"` | +|`--outSJfilterCountTotalMin` |minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |List of `integer`, example: `3, 1, 1, 1`, multiple_sep: `";"` | +|`--outSJfilterDistToOtherSJmin` |minimum allowed distance to other junctions' donor/acceptor does not apply to annotated junctions |List of `integer`, example: `10, 0, 5, 10`, multiple_sep: `";"` | +|`--outSJfilterIntronMaxVsReadN` |maximum gap allowed for junctions supported by 1,2,3,,,N reads i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000. 
by >=4 reads any gap <=alignIntronMax does not apply to annotated junctions |List of `integer`, example: `50000, 100000, 200000`, multiple_sep: `";"` | ### Scoring @@ -269,32 +269,32 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Alignments and Seeding -|Name |Description |Attributes | -|:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------| -|`--seedSearchStartLmax` |defines the search start point through the read - the read is split into pieces no longer than this value |`integer`, example: `50` | -|`--seedSearchStartLmaxOverLread` |seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads) |`double`, example: `1` | -|`--seedSearchLmax` |defines the maximum length of the seeds, if =0 seed length is not limited |`integer`, example: `0` | -|`--seedMultimapNmax` |only pieces that map fewer than this value are utilized in the stitching procedure |`integer`, example: `10000` | -|`--seedPerReadNmax` |max number of seeds per read |`integer`, example: `1000` | -|`--seedPerWindowNmax` |max number of seeds per window |`integer`, example: `50` | -|`--seedNoneLociPerWindow` |max number of one seed loci per window |`integer`, example: `10` | -|`--seedSplitMin` |min length of the seed sequences split by Ns or mate gap |`integer`, example: `12` | -|`--seedMapMin` |min length of seeds to be mapped |`integer`, example: `5` | -|`--alignIntronMin` |minimum intron size, genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered 
Deletion |`integer`, example: `21` | -|`--alignIntronMax` |maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | -|`--alignMatesGapMax` |maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | -|`--alignSJoverhangMin` |minimum overhang (i.e. block size) for spliced alignments |`integer`, example: `5` | -|`--alignSJstitchMismatchNmax` |maximum number of mismatches for stitching of the splice junctions (-1: no limit). (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. |`integer`, example: `0`, example: `-1`, example: `0`, example: `0` | -|`--alignSJDBoverhangMin` |minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments |`integer`, example: `3` | -|`--alignSplicedMateMapLmin` |minimum mapped length for a read mate that is spliced |`integer`, example: `0` | -|`--alignSplicedMateMapLminOverLmate` |alignSplicedMateMapLmin normalized to mate length |`double`, example: `0.66` | -|`--alignWindowsPerReadNmax` |max number of windows per read |`integer`, example: `10000` | -|`--alignTranscriptsPerWindowNmax` |max number of transcripts per window |`integer`, example: `100` | -|`--alignTranscriptsPerReadNmax` |max number of different alignments per read to consider |`integer`, example: `10000` | -|`--alignEndsType` |type of read ends alignment - Local ... standard local alignment with soft-clipping allowed - EndToEnd ... force end-to-end read alignment, do not soft-clip - Extend5pOfRead1 ... fully extend only the 5p of the read1, all other ends: local alignment - Extend5pOfReads12 ... fully extend only the 5p of the both read1 and read2, all other ends: local alignment |`string`, example: `"Local"` | -|`--alignEndsProtrude` |allow protrusion of alignment ends, i.e. 
start (end) of the +strand mate downstream of the start (end) of the -strand mate 1st word: int: maximum number of protrusion bases allowed 2nd word: string: - ConcordantPair ... report alignments with non-zero protrusion as concordant pairs - DiscordantPair ... report alignments with non-zero protrusion as discordant pairs |`string`, example: `"0 ConcordantPair"` | -|`--alignSoftClipAtReferenceEnds` |allow the soft-clipping of the alignments past the end of the chromosomes - Yes ... allow - No ... prohibit, useful for compatibility with Cufflinks |`string`, example: `"Yes"` | -|`--alignInsertionFlush` |how to flush ambiguous insertion positions - None ... insertions are not flushed - Right ... insertions are flushed to the right |`string` | +|Name |Description |Attributes | +|:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------| +|`--seedSearchStartLmax` |defines the search start point through the read - the read is split into pieces no longer than this value |`integer`, example: `50` | +|`--seedSearchStartLmaxOverLread` |seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads) |`double`, example: `1` | +|`--seedSearchLmax` |defines the maximum length of the seeds, if =0 seed length is not limited |`integer`, example: `0` | +|`--seedMultimapNmax` |only pieces that map fewer than this value are utilized in the stitching procedure |`integer`, example: `10000` | +|`--seedPerReadNmax` |max number of seeds per read |`integer`, example: `1000` | +|`--seedPerWindowNmax` |max number of 
seeds per window |`integer`, example: `50` | +|`--seedNoneLociPerWindow` |max number of one seed loci per window |`integer`, example: `10` | +|`--seedSplitMin` |min length of the seed sequences split by Ns or mate gap |`integer`, example: `12` | +|`--seedMapMin` |min length of seeds to be mapped |`integer`, example: `5` | +|`--alignIntronMin` |minimum intron size, genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered Deletion |`integer`, example: `21` | +|`--alignIntronMax` |maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | +|`--alignMatesGapMax` |maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | +|`--alignSJoverhangMin` |minimum overhang (i.e. block size) for spliced alignments |`integer`, example: `5` | +|`--alignSJstitchMismatchNmax` |maximum number of mismatches for stitching of the splice junctions (-1: no limit). (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. |List of `integer`, example: `0, -1, 0, 0`, multiple_sep: `";"` | +|`--alignSJDBoverhangMin` |minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments |`integer`, example: `3` | +|`--alignSplicedMateMapLmin` |minimum mapped length for a read mate that is spliced |`integer`, example: `0` | +|`--alignSplicedMateMapLminOverLmate` |alignSplicedMateMapLmin normalized to mate length |`double`, example: `0.66` | +|`--alignWindowsPerReadNmax` |max number of windows per read |`integer`, example: `10000` | +|`--alignTranscriptsPerWindowNmax` |max number of transcripts per window |`integer`, example: `100` | +|`--alignTranscriptsPerReadNmax` |max number of different alignments per read to consider |`integer`, example: `10000` | +|`--alignEndsType` |type of read ends alignment - Local ... 
standard local alignment with soft-clipping allowed - EndToEnd ... force end-to-end read alignment, do not soft-clip - Extend5pOfRead1 ... fully extend only the 5p of the read1, all other ends: local alignment - Extend5pOfReads12 ... fully extend only the 5p of the both read1 and read2, all other ends: local alignment |`string`, example: `"Local"` | +|`--alignEndsProtrude` |allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of the start (end) of the -strand mate 1st word: int: maximum number of protrusion bases allowed 2nd word: string: - ConcordantPair ... report alignments with non-zero protrusion as concordant pairs - DiscordantPair ... report alignments with non-zero protrusion as discordant pairs |`string`, example: `"0 ConcordantPair"` | +|`--alignSoftClipAtReferenceEnds` |allow the soft-clipping of the alignments past the end of the chromosomes - Yes ... allow - No ... prohibit, useful for compatibility with Cufflinks |`string`, example: `"Yes"` | +|`--alignInsertionFlush` |how to flush ambiguous insertion positions - None ... insertions are not flushed - Right ... insertions are flushed to the right |`string` | ### Paired-End reads @@ -319,29 +319,29 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Chimeric Alignments -|Name |Description |Attributes | -|:----------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------| -|`--chimOutType` |type of chimeric output - Junctions ... Chimeric.out.junction - SeparateSAMold ... 
output old SAM into separate Chimeric.out.sam file - WithinBAM ... output into main aligned BAM files (Aligned.*.bam) - WithinBAM HardClip ... (default) hard-clipping in the CIGAR for supplemental chimeric alignments (default if no 2nd word is present) - WithinBAM SoftClip ... soft-clipping in the CIGAR for supplemental chimeric alignments |`string`, example: `"Junctions"` | -|`--chimSegmentMin` |minimum length of chimeric segment length, if ==0, no chimeric output |`integer`, example: `0` | -|`--chimScoreMin` |minimum total (summed) score of the chimeric segments |`integer`, example: `0` | -|`--chimScoreDropMax` |max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length |`integer`, example: `20` | -|`--chimScoreSeparation` |minimum difference (separation) between the best chimeric score and the next one |`integer`, example: `10` | -|`--chimScoreJunctionNonGTAG` |penalty for a non-GT/AG chimeric junction |`integer`, example: `-1` | -|`--chimJunctionOverhangMin` |minimum overhang for a chimeric junction |`integer`, example: `20` | -|`--chimSegmentReadGapMax` |maximum gap in the read sequence between chimeric segments |`integer`, example: `0` | -|`--chimFilter` |different filters for chimeric alignments - None ... no filtering - banGenomicN ... Ns are not allowed in the genome sequence around the chimeric junction |`string`, example: `"banGenomicN"` | -|`--chimMainSegmentMultNmax` |maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments. |`integer`, example: `10` | -|`--chimMultimapNmax` |maximum number of chimeric multi-alignments - 0 ... use the old scheme for chimeric detection which only considered unique alignments |`integer`, example: `0` | -|`--chimMultimapScoreRange` |the score range for multi-mapping chimeras below the best chimeric score. 
Only works with --chimMultimapNmax > 1 |`integer`, example: `1` | -|`--chimNonchimScoreDropMin` |to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value |`integer`, example: `20` | -|`--chimOutJunctionFormat` |formatting type for the Chimeric.out.junction file - 0 ... no comment lines/headers - 1 ... comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping |`integer`, example: `0` | +|Name |Description |Attributes | +|:----------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------| +|`--chimOutType` |type of chimeric output - Junctions ... Chimeric.out.junction - SeparateSAMold ... output old SAM into separate Chimeric.out.sam file - WithinBAM ... output into main aligned BAM files (Aligned.*.bam) - WithinBAM HardClip ... (default) hard-clipping in the CIGAR for supplemental chimeric alignments (default if no 2nd word is present) - WithinBAM SoftClip ... 
soft-clipping in the CIGAR for supplemental chimeric alignments |List of `string`, example: `"Junctions"`, multiple_sep: `";"` | +|`--chimSegmentMin` |minimum length of chimeric segment length, if ==0, no chimeric output |`integer`, example: `0` | +|`--chimScoreMin` |minimum total (summed) score of the chimeric segments |`integer`, example: `0` | +|`--chimScoreDropMax` |max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length |`integer`, example: `20` | +|`--chimScoreSeparation` |minimum difference (separation) between the best chimeric score and the next one |`integer`, example: `10` | +|`--chimScoreJunctionNonGTAG` |penalty for a non-GT/AG chimeric junction |`integer`, example: `-1` | +|`--chimJunctionOverhangMin` |minimum overhang for a chimeric junction |`integer`, example: `20` | +|`--chimSegmentReadGapMax` |maximum gap in the read sequence between chimeric segments |`integer`, example: `0` | +|`--chimFilter` |different filters for chimeric alignments - None ... no filtering - banGenomicN ... Ns are not allowed in the genome sequence around the chimeric junction |List of `string`, example: `"banGenomicN"`, multiple_sep: `";"` | +|`--chimMainSegmentMultNmax` |maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments. |`integer`, example: `10` | +|`--chimMultimapNmax` |maximum number of chimeric multi-alignments - 0 ... use the old scheme for chimeric detection which only considered unique alignments |`integer`, example: `0` | +|`--chimMultimapScoreRange` |the score range for multi-mapping chimeras below the best chimeric score. 
Only works with --chimMultimapNmax > 1 |`integer`, example: `1` | +|`--chimNonchimScoreDropMin` |to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value |`integer`, example: `20` | +|`--chimOutJunctionFormat` |formatting type for the Chimeric.out.junction file - 0 ... no comment lines/headers - 1 ... comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping |`integer`, example: `0` | ### Quantification of Annotations |Name |Description |Attributes | |:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------| -|`--quantMode` |types of quantification requested - - ... none - TranscriptomeSAM ... output SAM/BAM alignments to transcriptome into a separate file - GeneCounts ... count reads per gene |`string` | +|`--quantMode` |types of quantification requested - - ... none - TranscriptomeSAM ... output SAM/BAM alignments to transcriptome into a separate file - GeneCounts ... count reads per gene |List of `string`, multiple_sep: `";"` | |`--quantTranscriptomeBAMcompression` |-2 to 10 transcriptome BAM compression level - -2 ... no BAM output - -1 ... default compression (6?) - 0 ... no compression - 10 ... maximum compression |`integer`, example: `1` | |`--quantTranscriptomeBan` |prohibit various alignment type - IndelSoftclipSingleend ... prohibit indels, soft clipping and single-end alignments - compatible with RSEM - Singleend ... 
prohibit single-end alignments |`string`, example: `"IndelSoftclipSingleend"` | @@ -363,32 +363,32 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### STARsolo (single cell RNA-seq) parameters -|Name |Description |Attributes | -|:-----------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------| -|`--soloType` |type of single-cell RNA-seq - CB_UMI_Simple ... (a.k.a. Droplet) one UMI and one Cell Barcode of fixed length in read2, e.g. Drop-seq and 10X Chromium. - CB_UMI_Complex ... multiple Cell Barcodes of varying length, one UMI of fixed length and one adapter sequence of fixed length are allowed in read2 only (e.g. inDrop, ddSeq). - CB_samTagOut ... output Cell Barcode as CR and/or CB SAm tag. No UMI counting. 
--readFilesIn cDNA_read1 [cDNA_read2 if paired-end] CellBarcode_read . Requires --outSAMtype BAM Unsorted [and/or SortedByCoordinate] - SmartSeq ... Smart-seq: each cell in a separate FASTQ (paired- or single-end), barcodes are corresponding read-groups, no UMI sequences, alignments deduplicated according to alignment start and end (after extending soft-clipped bases) |`string` | -|`--soloCBwhitelist` |file(s) with whitelist(s) of cell barcodes. Only --soloType CB_UMI_Complex allows more than one whitelist file. - None ... no whitelist: all cell barcodes are allowed |`string` | -|`--soloCBstart` |cell barcode start base |`integer`, example: `1` | -|`--soloCBlen` |cell barcode length |`integer`, example: `16` | -|`--soloUMIstart` |UMI start base |`integer`, example: `17` | -|`--soloUMIlen` |UMI length |`integer`, example: `10` | -|`--soloBarcodeReadLength` |length of the barcode read - 1 ... equal to sum of soloCBlen+soloUMIlen - 0 ... not defined, do not check |`integer`, example: `1` | -|`--soloBarcodeMate` |identifies which read mate contains the barcode (CB+UMI) sequence - 0 ... barcode sequence is on separate read, which should always be the last file in the --readFilesIn listed - 1 ... barcode sequence is a part of mate 1 - 2 ... barcode sequence is a part of mate 2 |`integer`, example: `0` | -|`--soloCBposition` |position of Cell Barcode(s) on the barcode read. Presently only works with --soloType CB_UMI_Complex, and barcodes are assumed to be on Read2. Format for each barcode: startAnchor_startPosition_endAnchor_endPosition start(end)Anchor defines the Anchor Base for the CB: 0: read start; 1: read end; 2: adapter start; 3: adapter end start(end)Position is the 0-based position with of the CB start(end) with respect to the Anchor Base String for different barcodes are separated by space. Example: inDrop (Zilionis et al, Nat. 
Protocols, 2017): --soloCBposition 0_0_2_-1 3_1_3_8 |`string` | -|`--soloUMIposition` |position of the UMI on the barcode read, same as soloCBposition Example: inDrop (Zilionis et al, Nat. Protocols, 2017): --soloCBposition 3_9_3_14 |`string` | -|`--soloAdapterSequence` |adapter sequence to anchor barcodes. Only one adapter sequence is allowed. |`string` | -|`--soloAdapterMismatchesNmax` |maximum number of mismatches allowed in adapter sequence. |`integer`, example: `1` | -|`--soloCBmatchWLtype` |matching the Cell Barcodes to the WhiteList - Exact ... only exact matches allowed - 1MM ... only one match in whitelist with 1 mismatched base allowed. Allowed CBs have to have at least one read with exact match. - 1MM_multi ... multiple matches in whitelist with 1 mismatched base allowed, posterior probability calculation is used choose one of the matches. Allowed CBs have to have at least one read with exact match. This option matches best with CellRanger 2.2.0 - 1MM_multi_pseudocounts ... same as 1MM_Multi, but pseudocounts of 1 are added to all whitelist barcodes. - 1MM_multi_Nbase_pseudocounts ... same as 1MM_multi_pseudocounts, multimatching to WL is allowed for CBs with N-bases. This option matches best with CellRanger >= 3.0.0 - EditDist_2 ... allow up to edit distance of 3 fpr each of the barcodes. May include one deletion + one insertion. Only works with --soloType CB_UMI_Complex. Matches to multiple passlist barcdoes are not allowed. Similar to ParseBio Split-seq pipeline. |`string`, example: `"1MM_multi"` | -|`--soloInputSAMattrBarcodeSeq` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode sequence (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeSeq CR UR . This parameter is required when running STARsolo with input from SAM. 
|`string` | -|`--soloInputSAMattrBarcodeQual` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode qualities (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeQual CY UY . If this parameter is '-' (default), the quality 'H' will be assigned to all bases. |`string` | -|`--soloStrand` |strandedness of the solo libraries: - Unstranded ... no strand information - Forward ... read strand same as the original RNA molecule - Reverse ... read strand opposite to the original RNA molecule |`string`, example: `"Forward"` | -|`--soloFeatures` |genomic features for which the UMI counts per Cell Barcode are collected - Gene ... genes: reads match the gene transcript - SJ ... splice junctions: reported in SJ.out.tab - GeneFull ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns - GeneFull_ExonOverIntron ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns: prioritize 100% overlap with exons - GeneFull_Ex50pAS ... full gene (pre-RNA): count all reads overlapping genes' exons and introns: prioritize >50% overlap with exons. Do not count reads with 100% exonic overlap in the antisense direction. |`string`, example: `"Gene"` | -|`--soloMultiMappers` |counting method for reads mapping to multiple genes - Unique ... count only reads that map to unique genes - Uniform ... uniformly distribute multi-genic UMIs to all genes - Rescue ... distribute UMIs proportionally to unique+uniform counts (~ first iteration of EM) - PropUnique ... distribute UMIs proportionally to unique mappers, if present, and uniformly if not. - EM ... multi-gene UMIs are distributed using Expectation Maximization algorithm |`string`, example: `"Unique"` | -|`--soloUMIdedup` |type of UMI deduplication (collapsing) algorithm - 1MM_All ... all UMIs with 1 mismatch distance to each other are collapsed (i.e. counted once). - 1MM_Directional_UMItools ... 
follows the "directional" method from the UMI-tools by Smith, Heger and Sudbery (Genome Research 2017). - 1MM_Directional ... same as 1MM_Directional_UMItools, but with more stringent criteria for duplicate UMIs - Exact ... only exactly matching UMIs are collapsed. - NoDedup ... no deduplication of UMIs, count all reads. - 1MM_CR ... CellRanger2-4 algorithm for 1MM UMI collapsing. |`string`, example: `"1MM_All"` | -|`--soloUMIfiltering` |type of UMI filtering (for reads uniquely mapping to genes) - - ... basic filtering: remove UMIs with N and homopolymers (similar to CellRanger 2.2.0). - MultiGeneUMI ... basic + remove lower-count UMIs that map to more than one gene. - MultiGeneUMI_All ... basic + remove all UMIs that map to more than one gene. - MultiGeneUMI_CR ... basic + remove lower-count UMIs that map to more than one gene, matching CellRanger > 3.0.0 . Only works with --soloUMIdedup 1MM_CR |`string` | -|`--soloOutFileNames` |file names for STARsolo output: file_name_prefix gene_names barcode_sequences cell_feature_count_matrix |`string`, example: `"Solo.out/"`, example: `"features.tsv"`, example: `"barcodes.tsv"`, example: `"matrix.mtx"` | -|`--soloCellFilter` |cell filtering type and parameters - None ... do not output filtered cells - TopCells ... only report top cells by UMI count, followed by the exact number of cells - CellRanger2.2 ... simple filtering of CellRanger 2.2. Can be followed by numbers: number of expected cells, robust maximum percentile for UMI count, maximum to minimum ratio for UMI count The harcoded values are from CellRanger: nExpectedCells=3000; maxPercentile=0.99; maxMinRatio=10 - EmptyDrops_CR ... EmptyDrops filtering in CellRanger flavor. 
Please cite the original EmptyDrops paper: A.T.L Lun et al, Genome Biology, 20, 63 (2019): https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1662-y Can be followed by 10 numeric parameters: nExpectedCells maxPercentile maxMinRatio indMin indMax umiMin umiMinFracMedian candMaxN FDR simN The harcoded values are from CellRanger: 3000 0.99 10 45000 90000 500 0.01 20000 0.01 10000 |`string`, example: `"CellRanger2.2"`, example: `"3000"`, example: `"0.99"`, example: `"10"` | -|`--soloOutFormatFeaturesGeneField3` |field 3 in the Gene features.tsv file. If "-", then no 3rd field is output. |`string`, example: `"Gene Expression"` | -|`--soloCellReadStats` |Output reads statistics for each CB - Standard ... standard output |`string` | +|Name |Description |Attributes | +|:-----------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------
-------------------------------------------| +|`--soloType` |type of single-cell RNA-seq - CB_UMI_Simple ... (a.k.a. Droplet) one UMI and one Cell Barcode of fixed length in read2, e.g. Drop-seq and 10X Chromium. - CB_UMI_Complex ... multiple Cell Barcodes of varying length, one UMI of fixed length and one adapter sequence of fixed length are allowed in read2 only (e.g. inDrop, ddSeq). - CB_samTagOut ... output Cell Barcode as CR and/or CB SAm tag. No UMI counting. --readFilesIn cDNA_read1 [cDNA_read2 if paired-end] CellBarcode_read . Requires --outSAMtype BAM Unsorted [and/or SortedByCoordinate] - SmartSeq ... Smart-seq: each cell in a separate FASTQ (paired- or single-end), barcodes are corresponding read-groups, no UMI sequences, alignments deduplicated according to alignment start and end (after extending soft-clipped bases) |List of `string`, multiple_sep: `";"` | +|`--soloCBwhitelist` |file(s) with whitelist(s) of cell barcodes. Only --soloType CB_UMI_Complex allows more than one whitelist file. - None ... no whitelist: all cell barcodes are allowed |List of `string`, multiple_sep: `";"` | +|`--soloCBstart` |cell barcode start base |`integer`, example: `1` | +|`--soloCBlen` |cell barcode length |`integer`, example: `16` | +|`--soloUMIstart` |UMI start base |`integer`, example: `17` | +|`--soloUMIlen` |UMI length |`integer`, example: `10` | +|`--soloBarcodeReadLength` |length of the barcode read - 1 ... equal to sum of soloCBlen+soloUMIlen - 0 ... not defined, do not check |`integer`, example: `1` | +|`--soloBarcodeMate` |identifies which read mate contains the barcode (CB+UMI) sequence - 0 ... barcode sequence is on separate read, which should always be the last file in the --readFilesIn listed - 1 ... barcode sequence is a part of mate 1 - 2 ... barcode sequence is a part of mate 2 |`integer`, example: `0` | +|`--soloCBposition` |position of Cell Barcode(s) on the barcode read. 
Presently only works with --soloType CB_UMI_Complex, and barcodes are assumed to be on Read2. Format for each barcode: startAnchor_startPosition_endAnchor_endPosition start(end)Anchor defines the Anchor Base for the CB: 0: read start; 1: read end; 2: adapter start; 3: adapter end start(end)Position is the 0-based position with of the CB start(end) with respect to the Anchor Base String for different barcodes are separated by space. Example: inDrop (Zilionis et al, Nat. Protocols, 2017): --soloCBposition 0_0_2_-1 3_1_3_8 |List of `string`, multiple_sep: `";"` | +|`--soloUMIposition` |position of the UMI on the barcode read, same as soloCBposition Example: inDrop (Zilionis et al, Nat. Protocols, 2017): --soloCBposition 3_9_3_14 |`string` | +|`--soloAdapterSequence` |adapter sequence to anchor barcodes. Only one adapter sequence is allowed. |`string` | +|`--soloAdapterMismatchesNmax` |maximum number of mismatches allowed in adapter sequence. |`integer`, example: `1` | +|`--soloCBmatchWLtype` |matching the Cell Barcodes to the WhiteList - Exact ... only exact matches allowed - 1MM ... only one match in whitelist with 1 mismatched base allowed. Allowed CBs have to have at least one read with exact match. - 1MM_multi ... multiple matches in whitelist with 1 mismatched base allowed, posterior probability calculation is used choose one of the matches. Allowed CBs have to have at least one read with exact match. This option matches best with CellRanger 2.2.0 - 1MM_multi_pseudocounts ... same as 1MM_Multi, but pseudocounts of 1 are added to all whitelist barcodes. - 1MM_multi_Nbase_pseudocounts ... same as 1MM_multi_pseudocounts, multimatching to WL is allowed for CBs with N-bases. This option matches best with CellRanger >= 3.0.0 - EditDist_2 ... allow up to edit distance of 3 fpr each of the barcodes. May include one deletion + one insertion. Only works with --soloType CB_UMI_Complex. Matches to multiple passlist barcdoes are not allowed. 
Similar to ParseBio Split-seq pipeline. |`string`, example: `"1MM_multi"` | +|`--soloInputSAMattrBarcodeSeq` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode sequence (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeSeq CR UR . This parameter is required when running STARsolo with input from SAM. |List of `string`, multiple_sep: `";"` | +|`--soloInputSAMattrBarcodeQual` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode qualities (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeQual CY UY . If this parameter is '-' (default), the quality 'H' will be assigned to all bases. |List of `string`, multiple_sep: `";"` | +|`--soloStrand` |strandedness of the solo libraries: - Unstranded ... no strand information - Forward ... read strand same as the original RNA molecule - Reverse ... read strand opposite to the original RNA molecule |`string`, example: `"Forward"` | +|`--soloFeatures` |genomic features for which the UMI counts per Cell Barcode are collected - Gene ... genes: reads match the gene transcript - SJ ... splice junctions: reported in SJ.out.tab - GeneFull ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns - GeneFull_ExonOverIntron ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns: prioritize 100% overlap with exons - GeneFull_Ex50pAS ... full gene (pre-RNA): count all reads overlapping genes' exons and introns: prioritize >50% overlap with exons. Do not count reads with 100% exonic overlap in the antisense direction. |List of `string`, example: `"Gene"`, multiple_sep: `";"` | +|`--soloMultiMappers` |counting method for reads mapping to multiple genes - Unique ... count only reads that map to unique genes - Uniform ... uniformly distribute multi-genic UMIs to all genes - Rescue ... 
distribute UMIs proportionally to unique+uniform counts (~ first iteration of EM) - PropUnique ... distribute UMIs proportionally to unique mappers, if present, and uniformly if not. - EM ... multi-gene UMIs are distributed using Expectation Maximization algorithm |List of `string`, example: `"Unique"`, multiple_sep: `";"` | +|`--soloUMIdedup` |type of UMI deduplication (collapsing) algorithm - 1MM_All ... all UMIs with 1 mismatch distance to each other are collapsed (i.e. counted once). - 1MM_Directional_UMItools ... follows the "directional" method from the UMI-tools by Smith, Heger and Sudbery (Genome Research 2017). - 1MM_Directional ... same as 1MM_Directional_UMItools, but with more stringent criteria for duplicate UMIs - Exact ... only exactly matching UMIs are collapsed. - NoDedup ... no deduplication of UMIs, count all reads. - 1MM_CR ... CellRanger2-4 algorithm for 1MM UMI collapsing. |List of `string`, example: `"1MM_All"`, multiple_sep: `";"` | +|`--soloUMIfiltering` |type of UMI filtering (for reads uniquely mapping to genes) - - ... basic filtering: remove UMIs with N and homopolymers (similar to CellRanger 2.2.0). - MultiGeneUMI ... basic + remove lower-count UMIs that map to more than one gene. - MultiGeneUMI_All ... basic + remove all UMIs that map to more than one gene. - MultiGeneUMI_CR ... basic + remove lower-count UMIs that map to more than one gene, matching CellRanger > 3.0.0 . Only works with --soloUMIdedup 1MM_CR |List of `string`, multiple_sep: `";"` | +|`--soloOutFileNames` |file names for STARsolo output: file_name_prefix gene_names barcode_sequences cell_feature_count_matrix |List of `string`, example: `"Solo.out/", "features.tsv", "barcodes.tsv", "matrix.mtx"`, multiple_sep: `";"` | +|`--soloCellFilter` |cell filtering type and parameters - None ... do not output filtered cells - TopCells ... only report top cells by UMI count, followed by the exact number of cells - CellRanger2.2 ... simple filtering of CellRanger 2.2. 
Can be followed by numbers: number of expected cells, robust maximum percentile for UMI count, maximum to minimum ratio for UMI count The harcoded values are from CellRanger: nExpectedCells=3000; maxPercentile=0.99; maxMinRatio=10 - EmptyDrops_CR ... EmptyDrops filtering in CellRanger flavor. Please cite the original EmptyDrops paper: A.T.L Lun et al, Genome Biology, 20, 63 (2019): https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1662-y Can be followed by 10 numeric parameters: nExpectedCells maxPercentile maxMinRatio indMin indMax umiMin umiMinFracMedian candMaxN FDR simN The harcoded values are from CellRanger: 3000 0.99 10 45000 90000 500 0.01 20000 0.01 10000 |List of `string`, example: `"CellRanger2.2", "3000", "0.99", "10"`, multiple_sep: `";"` | +|`--soloOutFormatFeaturesGeneField3` |field 3 in the Gene features.tsv file. If "-", then no 3rd field is output. |List of `string`, example: `"Gene Expression"`, multiple_sep: `";"` | +|`--soloCellReadStats` |Output reads statistics for each CB - Standard ... standard output |`string` | ## Authors diff --git a/components/modules/mapping/star_align_v273a.qmd b/components/modules/mapping/star_align_v273a.qmd index 93f28212..841908fb 100644 --- a/components/modules/mapping/star_align_v273a.qmd +++ b/components/modules/mapping/star_align_v273a.qmd @@ -67,11 +67,11 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Input/Output -|Name |Description |Attributes | -|:-------------|:--------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------| -|`--input` |The FASTQ files to be analyzed. Corresponds to the --readFilesIn in the STAR command. 
|`file`, required, example: `"mysample_S1_L001_R1_001.fastq.gz"`, example: `"mysample_S1_L001_R2_001.fastq.gz"` | -|`--reference` |Path to the reference built by star_build_reference. Corresponds to the --genomeDir in the STAR command. |`file`, required, example: `"/path/to/reference"` | -|`--output` |Path to output directory. Corresponds to the --outFileNamePrefix in the STAR command. |`file`, required, example: `"/path/to/foo"` | +|Name |Description |Attributes | +|:-------------|:--------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------| +|`--input` |The FASTQ files to be analyzed. Corresponds to the --readFilesIn in the STAR command. |List of `file`, required, example: `"mysample_S1_L001_R1_001.fastq.gz", "mysample_S1_L001_R2_001.fastq.gz"`, multiple_sep: `";"` | +|`--reference` |Path to the reference built by star_build_reference. Corresponds to the --genomeDir in the STAR command. |`file`, required, example: `"/path/to/reference"` | +|`--output` |Path to output directory. Corresponds to the --outFileNamePrefix in the STAR command. 
|`file`, required, example: `"/path/to/foo"` | ### Run Parameters @@ -83,30 +83,30 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Genome Parameters -|Name |Description |Attributes | -|:-----------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------| -|`--genomeLoad` |mode of shared memory usage for the genome files. Only used with --runMode alignReads. - LoadAndKeep ... load genome into shared and keep it in memory after run - LoadAndRemove ... load genome into shared but remove it after run - LoadAndExit ... load genome into shared memory and exit, keeping the genome in memory for future runs - Remove ... do not map anything, just remove loaded genome from memory - NoSharedMemory ... do not use shared memory, each job will have its own private copy of the genome |`string`, example: `"NoSharedMemory"` | -|`--genomeFastaFiles` |path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped. Required for the genome generation (--runMode genomeGenerate). Can also be used in the mapping (--runMode alignReads) to add extra (new) sequences to the genome (e.g. spike-ins). |`file` | -|`--genomeFileSizes` |genome files exact sizes in bytes. Typically, this should not be defined by the user. |`integer`, example: `0` | -|`--genomeTransformOutput` |which output to transform back to original genome - SAM ... 
SAM/BAM alignments - SJ ... splice junctions (SJ.out.tab) - None ... no transformation of the output |`string` | -|`--genomeChrSetMitochondrial` |names of the mitochondrial chromosomes. Presently only used for STARsolo statistics output/ |`string`, example: `"chrM"`, example: `"M"`, example: `"MT"` | +|Name |Description |Attributes | +|:-----------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------| +|`--genomeLoad` |mode of shared memory usage for the genome files. Only used with --runMode alignReads. - LoadAndKeep ... load genome into shared and keep it in memory after run - LoadAndRemove ... load genome into shared but remove it after run - LoadAndExit ... load genome into shared memory and exit, keeping the genome in memory for future runs - Remove ... do not map anything, just remove loaded genome from memory - NoSharedMemory ... do not use shared memory, each job will have its own private copy of the genome |`string`, example: `"NoSharedMemory"` | +|`--genomeFastaFiles` |path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped. Required for the genome generation (--runMode genomeGenerate). Can also be used in the mapping (--runMode alignReads) to add extra (new) sequences to the genome (e.g. spike-ins). |List of `file`, multiple_sep: `";"` | +|`--genomeFileSizes` |genome files exact sizes in bytes. 
Typically, this should not be defined by the user. |List of `integer`, example: `0`, multiple_sep: `";"` | +|`--genomeTransformOutput` |which output to transform back to original genome - SAM ... SAM/BAM alignments - SJ ... splice junctions (SJ.out.tab) - None ... no transformation of the output |List of `string`, multiple_sep: `";"` | +|`--genomeChrSetMitochondrial` |names of the mitochondrial chromosomes. Presently only used for STARsolo statistics output/ |List of `string`, example: `"chrM", "M", "MT"`, multiple_sep: `";"` | ### Splice Junctions Database -|Name |Description |Attributes | -|:----------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------| -|`--sjdbFileChrStartEnd` |path to the files with genomic coordinates (chr start end strand) for the splice junction introns. Multiple files can be supplied and will be concatenated. |`string` | -|`--sjdbGTFfile` |path to the GTF file with annotations |`file` | -|`--sjdbGTFchrPrefix` |prefix for chromosome names in a GTF file (e.g. 
'chr' for using ENSMEBL annotations with UCSC genomes) |`string` | -|`--sjdbGTFfeatureExon` |feature type in GTF file to be used as exons for building transcripts |`string`, example: `"exon"` | -|`--sjdbGTFtagExonParentTranscript` |GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files) |`string`, example: `"transcript_id"` | -|`--sjdbGTFtagExonParentGene` |GTF attribute name for parent gene ID (default "gene_id" works for GTF files) |`string`, example: `"gene_id"` | -|`--sjdbGTFtagExonParentGeneName` |GTF attribute name for parent gene name |`string`, example: `"gene_name"` | -|`--sjdbGTFtagExonParentGeneType` |GTF attribute name for parent gene type |`string`, example: `"gene_type"`, example: `"gene_biotype"` | -|`--sjdbOverhang` |length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1) |`integer`, example: `100` | -|`--sjdbScore` |extra alignment score for alignments that cross database junctions |`integer`, example: `2` | -|`--sjdbInsertSave` |which files to save when sjdb junctions are inserted on the fly at the mapping step - Basic ... only small junction / transcript files - All ... all files including big Genome, SA and SAindex - this will create a complete genome directory |`string`, example: `"Basic"` | +|Name |Description |Attributes | +|:----------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------| +|`--sjdbFileChrStartEnd` |path to the files with genomic coordinates (chr start end strand) for the splice junction introns. Multiple files can be supplied and will be concatenated. 
|List of `string`, multiple_sep: `";"` | +|`--sjdbGTFfile` |path to the GTF file with annotations |`file` | +|`--sjdbGTFchrPrefix` |prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSMEBL annotations with UCSC genomes) |`string` | +|`--sjdbGTFfeatureExon` |feature type in GTF file to be used as exons for building transcripts |`string`, example: `"exon"` | +|`--sjdbGTFtagExonParentTranscript` |GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files) |`string`, example: `"transcript_id"` | +|`--sjdbGTFtagExonParentGene` |GTF attribute name for parent gene ID (default "gene_id" works for GTF files) |`string`, example: `"gene_id"` | +|`--sjdbGTFtagExonParentGeneName` |GTF attribute name for parent gene name |List of `string`, example: `"gene_name"`, multiple_sep: `";"` | +|`--sjdbGTFtagExonParentGeneType` |GTF attribute name for parent gene type |List of `string`, example: `"gene_type", "gene_biotype"`, multiple_sep: `";"` | +|`--sjdbOverhang` |length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1) |`integer`, example: `100` | +|`--sjdbScore` |extra alignment score for alignments that cross database junctions |`integer`, example: `2` | +|`--sjdbInsertSave` |which files to save when sjdb junctions are inserted on the fly at the mapping step - Basic ... only small junction / transcript files - All ... 
all files including big Genome, SA and SAindex - this will create a complete genome directory |`string`, example: `"Basic"` | ### Variation parameters @@ -118,43 +118,43 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Read Parameters -|Name |Description |Attributes | -|:------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------| -|`--readFilesType` |format of input read files - Fastx ... FASTA or FASTQ - SAM SE ... SAM or BAM single-end reads; for BAM use --readFilesCommand samtools view - SAM PE ... SAM or BAM paired-end reads; for BAM use --readFilesCommand samtools view |`string`, example: `"Fastx"` | -|`--readFilesSAMattrKeep` |for --readFilesType SAM SE/PE, which SAM tags to keep in the output BAM, e.g.: --readFilesSAMtagsKeep RG PL - All ... keep all tags - None ... do not keep any tags |`string`, example: `"All"` | -|`--readFilesManifest` |path to the "manifest" file with the names of read files. The manifest file should contain 3 tab-separated columns: paired-end reads: read1_file_name $tab$ read2_file_name $tab$ read_group_line. single-end reads: read1_file_name $tab$ - $tab$ read_group_line. Spaces, but not tabs are allowed in file names. If read_group_line does not start with ID:, it can only contain one ID field, and ID: will be added to it. 
If read_group_line starts with ID:, it can contain several fields separated by $tab$, and all fields will be be copied verbatim into SAM @RG header line. |`file` | -|`--readFilesPrefix` |prefix for the read files names, i.e. it will be added in front of the strings in --readFilesIn |`string` | -|`--readFilesCommand` |command line to execute for each of the input file. This command should generate FASTA or FASTQ text and send it to stdout For example: zcat - to uncompress .gz files, bzcat - to uncompress .bz2 files, etc. |`string` | -|`--readMapNumber` |number of reads to map from the beginning of the file -1: map all reads |`integer`, example: `-1` | -|`--readMatesLengthsIn` |Equal/NotEqual - lengths of names,sequences,qualities for both mates are the same / not the same. NotEqual is safe in all situations. |`string`, example: `"NotEqual"` | -|`--readNameSeparator` |character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed) |`string`, example: `"/"` | -|`--readQualityScoreBase` |number to be subtracted from the ASCII code to get Phred quality score |`integer`, example: `33` | +|Name |Description |Attributes | +|:------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------| +|`--readFilesType` |format of input read files - Fastx ... FASTA or FASTQ - SAM SE ... 
SAM or BAM single-end reads; for BAM use --readFilesCommand samtools view - SAM PE ... SAM or BAM paired-end reads; for BAM use --readFilesCommand samtools view |`string`, example: `"Fastx"` | +|`--readFilesSAMattrKeep` |for --readFilesType SAM SE/PE, which SAM tags to keep in the output BAM, e.g.: --readFilesSAMtagsKeep RG PL - All ... keep all tags - None ... do not keep any tags |List of `string`, example: `"All"`, multiple_sep: `";"` | +|`--readFilesManifest` |path to the "manifest" file with the names of read files. The manifest file should contain 3 tab-separated columns: paired-end reads: read1_file_name $tab$ read2_file_name $tab$ read_group_line. single-end reads: read1_file_name $tab$ - $tab$ read_group_line. Spaces, but not tabs are allowed in file names. If read_group_line does not start with ID:, it can only contain one ID field, and ID: will be added to it. If read_group_line starts with ID:, it can contain several fields separated by $tab$, and all fields will be be copied verbatim into SAM @RG header line. |`file` | +|`--readFilesPrefix` |prefix for the read files names, i.e. it will be added in front of the strings in --readFilesIn |`string` | +|`--readFilesCommand` |command line to execute for each of the input file. This command should generate FASTA or FASTQ text and send it to stdout For example: zcat - to uncompress .gz files, bzcat - to uncompress .bz2 files, etc. |List of `string`, multiple_sep: `";"` | +|`--readMapNumber` |number of reads to map from the beginning of the file -1: map all reads |`integer`, example: `-1` | +|`--readMatesLengthsIn` |Equal/NotEqual - lengths of names,sequences,qualities for both mates are the same / not the same. NotEqual is safe in all situations. 
|`string`, example: `"NotEqual"` | +|`--readNameSeparator` |character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed) |List of `string`, example: `"/"`, multiple_sep: `";"` | +|`--readQualityScoreBase` |number to be subtracted from the ASCII code to get Phred quality score |`integer`, example: `33` | ### Read Clipping -|Name |Description |Attributes | -|:----------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------| -|`--clipAdapterType` |adapter clipping type - Hamming ... adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp - CellRanger4 ... 5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin Sosic: https://github.com/Martinsos/opal - None ... no adapter clipping, all other clip* parameters are disregarded |`string`, example: `"Hamming"` | -|`--clip3pNbases` |number(s) of bases to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. |`integer`, example: `0` | -|`--clip3pAdapterSeq` |adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. - polyA ... polyA sequence with the length equal to read length |`string` | -|`--clip3pAdapterMMp` |max proportion of mismatches for 3p adapter clipping for each mate. If one value is given, it will be assumed the same for both mates. |`double`, example: `0.1` | -|`--clip3pAfterAdapterNbases` |number of bases to clip from 3p of each mate after the adapter clipping. 
If one value is given, it will be assumed the same for both mates. |`integer`, example: `0` | -|`--clip5pNbases` |number(s) of bases to clip from 5p of each mate. If one value is given, it will be assumed the same for both mates. |`integer`, example: `0` | +|Name |Description |Attributes | +|:----------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------| +|`--clipAdapterType` |adapter clipping type - Hamming ... adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp - CellRanger4 ... 5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin Sosic: https://github.com/Martinsos/opal - None ... no adapter clipping, all other clip* parameters are disregarded |`string`, example: `"Hamming"` | +|`--clip3pNbases` |number(s) of bases to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. |List of `integer`, example: `0`, multiple_sep: `";"` | +|`--clip3pAdapterSeq` |adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates. - polyA ... polyA sequence with the length equal to read length |List of `string`, multiple_sep: `";"` | +|`--clip3pAdapterMMp` |max proportion of mismatches for 3p adapter clipping for each mate. If one value is given, it will be assumed the same for both mates. |List of `double`, example: `0.1`, multiple_sep: `";"` | +|`--clip3pAfterAdapterNbases` |number of bases to clip from 3p of each mate after the adapter clipping. If one value is given, it will be assumed the same for both mates. 
|List of `integer`, example: `0`, multiple_sep: `";"` | +|`--clip5pNbases` |number(s) of bases to clip from 5p of each mate. If one value is given, it will be assumed the same for both mates. |List of `integer`, example: `0`, multiple_sep: `";"` | ### Limits -|Name |Description |Attributes | -|:---------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------| -|`--limitGenomeGenerateRAM` |maximum available RAM (bytes) for genome generation |`long`, example: `NA` | -|`--limitIObufferSize` |max available buffers size (bytes) for input/output, per thread |`long`, example: `30000000`, example: `50000000` | -|`--limitOutSAMoneReadBytes` |max size of the SAM record (bytes) for one read. Recommended value: >(2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax |`long`, example: `100000` | -|`--limitOutSJoneRead` |max number of junctions for one read (including all multi-mappers) |`integer`, example: `1000` | -|`--limitOutSJcollapsed` |max number of collapsed junctions |`integer`, example: `1000000` | -|`--limitBAMsortRAM` |maximum available RAM (bytes) for sorting BAM. If =0, it will be set to the genome index size. 0 value can only be used with --genomeLoad NoSharedMemory option. 
|`long`, example: `0` | -|`--limitSjdbInsertNsj` |maximum number of junctions to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run |`integer`, example: `1000000` | -|`--limitNreadsSoft` |soft limit on the number of reads |`integer`, example: `-1` | +|Name |Description |Attributes | +|:---------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------| +|`--limitGenomeGenerateRAM` |maximum available RAM (bytes) for genome generation |`long`, example: `NA` | +|`--limitIObufferSize` |max available buffers size (bytes) for input/output, per thread |List of `long`, example: `30000000, 50000000`, multiple_sep: `";"` | +|`--limitOutSAMoneReadBytes` |max size of the SAM record (bytes) for one read. Recommended value: >(2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax |`long`, example: `100000` | +|`--limitOutSJoneRead` |max number of junctions for one read (including all multi-mappers) |`integer`, example: `1000` | +|`--limitOutSJcollapsed` |max number of collapsed junctions |`integer`, example: `1000000` | +|`--limitBAMsortRAM` |maximum available RAM (bytes) for sorting BAM. If =0, it will be set to the genome index size. 0 value can only be used with --genomeLoad NoSharedMemory option. 
|`long`, example: `0` | +|`--limitSjdbInsertNsj` |maximum number of junctions to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run |`integer`, example: `1000000` | +|`--limitNreadsSoft` |soft limit on the number of reads |`integer`, example: `-1` | ### Output: general @@ -170,30 +170,30 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Output: SAM and BAM -|Name |Description |Attributes | -|:---------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------| -|`--outSAMtype` |type of SAM/BAM output 1st word: - BAM ... output BAM without sorting - SAM ... output SAM without sorting - None ... no SAM/BAM output 2nd, 3rd: - Unsorted ... standard unsorted - SortedByCoordinate ... sorted by coordinate. This option will allocate extra memory for sorting which can be specified by --limitBAMsortRAM. |`string`, example: `"SAM"` | -|`--outSAMmode` |mode of SAM output - None ... no SAM output - Full ... full SAM output - NoQS ... full SAM but without quality scores |`string`, example: `"Full"` | -|`--outSAMstrandField` |Cufflinks-like strand field flag - None ... not used - intronMotif ... strand derived from the intron motif. 
This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out. |`string` | -|`--outSAMattributes` |a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. ***Presets: - None ... no attributes - Standard ... NH HI AS nM - All ... NH HI AS nM NM MD jM jI MC ch ***Alignment: - NH ... number of loci the reads maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag. - HI ... multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag. - AS ... local alignment score, +1/-1 for matches/mismateches, score* penalties for indels and gaps. For PE reads, total score for two mates. Stadnard SAM tag. - nM ... number of mismatches. For PE reads, sum over two mates. - NM ... edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag. - MD ... string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag. - jM ... intron motifs for all junctions (i.e. N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value. - jI ... start and end of introns for all junctions (1-based). - XS ... alignment strand according to --outSAMstrandField. - MC ... mate's CIGAR string. Standard SAM tag. - ch ... marks all segment of all chimeric alingments for --chimOutType WithinBAM output. - cN ... number of bases clipped from the read ends: 5' and 3' ***Variation: - vA ... variant allele - vG ... genomic coordinate of the variant overlapped by the read. - vW ... 1 - alignment passes WASP filtering; 2,3,4,5,6,7 - alignment does not pass WASP filtering. Requires --waspOutputMode SAMtag. ***STARsolo: - CR CY UR UY ... sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing. - GX GN ... 
gene ID and gene name for unique-gene reads. - gx gn ... gene IDs and gene names for unique- and multi-gene reads. - CB UB ... error-corrected cell barcodes and UMIs for solo* demultiplexing. Requires --outSAMtype BAM SortedByCoordinate. - sM ... assessment of CB and UMI. - sS ... sequence of the entire barcode (CB,UMI,adapter). - sQ ... quality of the entire barcode. ***Unsupported/undocumented: - ha ... haplotype (1/2) when mapping to the diploid genome. Requires genome generated with --genomeTransformType Diploid . - rB ... alignment block read/genomic coordinates. - vR ... read coordinate of the variant. |`string`, example: `"Standard"` | -|`--outSAMattrIHstart` |start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie. |`integer`, example: `1` | -|`--outSAMunmapped` |output of unmapped reads in the SAM format 1st word: - None ... no output - Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam) 2nd word: - KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate. Only affects multi-mapping reads. |`string` | -|`--outSAMorder` |type of sorting for the SAM output Paired: one mate after the other for all paired alignments PairedKeepInputOrder: one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files |`string`, example: `"Paired"` | -|`--outSAMprimaryFlag` |which alignments are considered primary - all others will be marked with 0x100 bit in the FLAG - OneBestScore ... only one alignment with the best score is primary - AllBestScore ... all alignments with the best score are primary |`string`, example: `"OneBestScore"` | -|`--outSAMreadID` |read ID record type - Standard ... first word (until space) from the FASTx read ID line, removing /1,/2 from the end - Number ... 
read number (index) in the FASTx file |`string`, example: `"Standard"` | -|`--outSAMmapqUnique` |0 to 255: the MAPQ value for unique mappers |`integer`, example: `255` | -|`--outSAMflagOR` |0 to 65535: sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise. |`integer`, example: `0` | -|`--outSAMflagAND` |0 to 65535: sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagOR. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise. |`integer`, example: `65535` | -|`--outSAMattrRGline` |SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma separated RG lines correspons to different (comma separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy |`string` | -|`--outSAMheaderHD` |@HD (header) line of the SAM header |`string` | -|`--outSAMheaderPG` |extra @PG (software) line of the SAM header (in addition to STAR) |`string` | -|`--outSAMheaderCommentFile` |path to the file with @CO (comment) lines of the SAM header |`string` | -|`--outSAMfilter` |filter the output into main SAM/BAM files - KeepOnlyAddedReferences ... only keep the reads for which all alignments are to the extra reference sequences added with --genomeFastaFiles at the mapping stage. - KeepAllAddedReferences ... keep all alignments to the extra reference sequences added with --genomeFastaFiles at the mapping stage. |`string` | -|`--outSAMmultNmax` |max number of multiple alignments for a read that will be output to the SAM/BAM files. 
Note that if this value is not equal to -1, the top scoring alignment will be output first - -1 ... all alignments (up to --outFilterMultimapNmax) will be output |`integer`, example: `-1` | -|`--outSAMtlen` |calculation method for the TLEN field in the SAM/BAM files - 1 ... leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate - 2 ... leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from 1 for overlapping mates with protruding ends |`integer`, example: `1` | -|`--outBAMcompression` |-1 to 10 BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression |`integer`, example: `1` | -|`--outBAMsortingThreadN` |>=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN). |`integer`, example: `0` | -|`--outBAMsortingBinsN` |>0: number of genome bins for coordinate-sorting |`integer`, example: `50` | +|Name |Description |Attributes | 
+|:---------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------| +|`--outSAMtype` |type of SAM/BAM output 1st word: - BAM ... output BAM without sorting - SAM ... output SAM without sorting - None ... no SAM/BAM output 2nd, 3rd: - Unsorted ... standard unsorted - SortedByCoordinate ... sorted by coordinate. This option will allocate extra memory for sorting which can be specified by --limitBAMsortRAM. |List of `string`, example: `"SAM"`, multiple_sep: `";"` | +|`--outSAMmode` |mode of SAM output - None ... no SAM output - Full ... full SAM output - NoQS ... full SAM but without quality scores |`string`, example: `"Full"` | +|`--outSAMstrandField` |Cufflinks-like strand field flag - None ... not used - intronMotif ... strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out. |`string` | +|`--outSAMattributes` |a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. ***Presets: - None ... no attributes - Standard ... NH HI AS nM - All ... NH HI AS nM NM MD jM jI MC ch ***Alignment: - NH ... number of loci the reads maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag. - HI ... 
multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag. - AS ... local alignment score, +1/-1 for matches/mismateches, score* penalties for indels and gaps. For PE reads, total score for two mates. Stadnard SAM tag. - nM ... number of mismatches. For PE reads, sum over two mates. - NM ... edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag. - MD ... string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag. - jM ... intron motifs for all junctions (i.e. N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value. - jI ... start and end of introns for all junctions (1-based). - XS ... alignment strand according to --outSAMstrandField. - MC ... mate's CIGAR string. Standard SAM tag. - ch ... marks all segment of all chimeric alingments for --chimOutType WithinBAM output. - cN ... number of bases clipped from the read ends: 5' and 3' ***Variation: - vA ... variant allele - vG ... genomic coordinate of the variant overlapped by the read. - vW ... 1 - alignment passes WASP filtering; 2,3,4,5,6,7 - alignment does not pass WASP filtering. Requires --waspOutputMode SAMtag. ***STARsolo: - CR CY UR UY ... sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing. - GX GN ... gene ID and gene name for unique-gene reads. - gx gn ... gene IDs and gene names for unique- and multi-gene reads. - CB UB ... error-corrected cell barcodes and UMIs for solo* demultiplexing. Requires --outSAMtype BAM SortedByCoordinate. - sM ... assessment of CB and UMI. - sS ... sequence of the entire barcode (CB,UMI,adapter). - sQ ... quality of the entire barcode. ***Unsupported/undocumented: - ha ... haplotype (1/2) when mapping to the diploid genome. 
Requires genome generated with --genomeTransformType Diploid . - rB ... alignment block read/genomic coordinates. - vR ... read coordinate of the variant. |List of `string`, example: `"Standard"`, multiple_sep: `";"` | +|`--outSAMattrIHstart` |start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie. |`integer`, example: `1` | +|`--outSAMunmapped` |output of unmapped reads in the SAM format 1st word: - None ... no output - Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam) 2nd word: - KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate. Only affects multi-mapping reads. |List of `string`, multiple_sep: `";"` | +|`--outSAMorder` |type of sorting for the SAM output Paired: one mate after the other for all paired alignments PairedKeepInputOrder: one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files |`string`, example: `"Paired"` | +|`--outSAMprimaryFlag` |which alignments are considered primary - all others will be marked with 0x100 bit in the FLAG - OneBestScore ... only one alignment with the best score is primary - AllBestScore ... all alignments with the best score are primary |`string`, example: `"OneBestScore"` | +|`--outSAMreadID` |read ID record type - Standard ... first word (until space) from the FASTx read ID line, removing /1,/2 from the end - Number ... read number (index) in the FASTx file |`string`, example: `"Standard"` | +|`--outSAMmapqUnique` |0 to 255: the MAPQ value for unique mappers |`integer`, example: `255` | +|`--outSAMflagOR` |0 to 65535: sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise. 
|`integer`, example: `0` | +|`--outSAMflagAND` |0 to 65535: sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagAND. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise. |`integer`, example: `65535` | +|`--outSAMattrRGline` |SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z". xxx will be added as RG tag to each output alignment. Any spaces in the tag values have to be double quoted. Comma separated RG lines correspond to different (comma separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g. --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy |List of `string`, multiple_sep: `";"` | +|`--outSAMheaderHD` |@HD (header) line of the SAM header |List of `string`, multiple_sep: `";"` | +|`--outSAMheaderPG` |extra @PG (software) line of the SAM header (in addition to STAR) |List of `string`, multiple_sep: `";"` | +|`--outSAMheaderCommentFile` |path to the file with @CO (comment) lines of the SAM header |`string` | +|`--outSAMfilter` |filter the output into main SAM/BAM files - KeepOnlyAddedReferences ... only keep the reads for which all alignments are to the extra reference sequences added with --genomeFastaFiles at the mapping stage. - KeepAllAddedReferences ... keep all alignments to the extra reference sequences added with --genomeFastaFiles at the mapping stage. |List of `string`, multiple_sep: `";"` | +|`--outSAMmultNmax` |max number of multiple alignments for a read that will be output to the SAM/BAM files. Note that if this value is not equal to -1, the top scoring alignment will be output first - -1 ... all alignments (up to --outFilterMultimapNmax) will be output |`integer`, example: `-1` | +|`--outSAMtlen` |calculation method for the TLEN field in the SAM/BAM files - 1 ... 
leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate - 2 ... leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from 1 for overlapping mates with protruding ends |`integer`, example: `1` | +|`--outBAMcompression` |-1 to 10 BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression |`integer`, example: `1` | +|`--outBAMsortingThreadN` |>=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN). |`integer`, example: `0` | +|`--outBAMsortingBinsN` |>0: number of genome bins for coordinate-sorting |`integer`, example: `50` | ### BAM processing @@ -206,12 +206,12 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Output Wiggle -|Name |Description |Attributes | -|:--------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------| -|`--outWigType` |type of signal output, e.g. "bedGraph" OR "bedGraph read1_5p". Requires sorted BAM: --outSAMtype BAM SortedByCoordinate . 1st word: - None ... no signal output - bedGraph ... bedGraph format - wiggle ... wiggle format 2nd word: - read1_5p ... signal from only 5' of the 1st read, useful for CAGE/RAMPAGE etc - read2 ... signal from only 2nd read |`string` | -|`--outWigStrand` |strandedness of wiggle/bedGraph output - Stranded ... separate strands, str1 and str2 - Unstranded ... collapsed strands |`string`, example: `"Stranded"` | -|`--outWigReferencesPrefix` |prefix matching reference names to include in the output wiggle file, e.g. 
"chr", default "-" - include all references |`string` | -|`--outWigNorm` |type of normalization for the signal - RPM ... reads per million of mapped reads - None ... no normalization, "raw" counts |`string`, example: `"RPM"` | +|Name |Description |Attributes | +|:--------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------| +|`--outWigType` |type of signal output, e.g. "bedGraph" OR "bedGraph read1_5p". Requires sorted BAM: --outSAMtype BAM SortedByCoordinate . 1st word: - None ... no signal output - bedGraph ... bedGraph format - wiggle ... wiggle format 2nd word: - read1_5p ... signal from only 5' of the 1st read, useful for CAGE/RAMPAGE etc - read2 ... signal from only 2nd read |List of `string`, multiple_sep: `";"` | +|`--outWigStrand` |strandedness of wiggle/bedGraph output - Stranded ... separate strands, str1 and str2 - Unstranded ... collapsed strands |`string`, example: `"Stranded"` | +|`--outWigReferencesPrefix` |prefix matching reference names to include in the output wiggle file, e.g. "chr", default "-" - include all references |`string` | +|`--outWigNorm` |type of normalization for the signal - RPM ... reads per million of mapped reads - None ... 
no normalization, "raw" counts |`string`, example: `"RPM"` | ### Output Filtering @@ -241,14 +241,14 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Output Filtering: Splice Junctions -|Name |Description |Attributes | -|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------| -|`--outSJfilterReads` |which reads to consider for collapsed splice junctions output - All ... all reads, unique- and multi-mappers - Unique ... uniquely mapping reads only |`string`, example: `"All"` | -|`--outSJfilterOverhangMin` |minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif does not apply to annotated junctions |`integer`, example: `30`, example: `12`, example: `12`, example: `12` | -|`--outSJfilterCountUniqueMin` |minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |`integer`, example: `3`, example: `1`, example: `1`, example: `1` | -|`--outSJfilterCountTotalMin` |minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. 
-1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |`integer`, example: `3`, example: `1`, example: `1`, example: `1` | -|`--outSJfilterDistToOtherSJmin` |minimum allowed distance to other junctions' donor/acceptor does not apply to annotated junctions |`integer`, example: `10`, example: `0`, example: `5`, example: `10` | -|`--outSJfilterIntronMaxVsReadN` |maximum gap allowed for junctions supported by 1,2,3,,,N reads i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000. by >=4 reads any gap <=alignIntronMax does not apply to annotated junctions |`integer`, example: `50000`, example: `100000`, example: `200000` | +|Name |Description |Attributes | +|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------| +|`--outSJfilterReads` |which reads to consider for collapsed splice junctions output - All ... all reads, unique- and multi-mappers - Unique ... uniquely mapping reads only |`string`, example: `"All"` | +|`--outSJfilterOverhangMin` |minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. 
-1 means no output for that motif does not apply to annotated junctions |List of `integer`, example: `30, 12, 12, 12`, multiple_sep: `";"` | +|`--outSJfilterCountUniqueMin` |minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |List of `integer`, example: `3, 1, 1, 1`, multiple_sep: `";"` | +|`--outSJfilterCountTotalMin` |minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied does not apply to annotated junctions |List of `integer`, example: `3, 1, 1, 1`, multiple_sep: `";"` | +|`--outSJfilterDistToOtherSJmin` |minimum allowed distance to other junctions' donor/acceptor does not apply to annotated junctions |List of `integer`, example: `10, 0, 5, 10`, multiple_sep: `";"` | +|`--outSJfilterIntronMaxVsReadN` |maximum gap allowed for junctions supported by 1,2,3,,,N reads i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000. 
by >=4 reads any gap <=alignIntronMax does not apply to annotated junctions |List of `integer`, example: `50000, 100000, 200000`, multiple_sep: `";"` | ### Scoring @@ -269,32 +269,32 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Alignments and Seeding -|Name |Description |Attributes | -|:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------| -|`--seedSearchStartLmax` |defines the search start point through the read - the read is split into pieces no longer than this value |`integer`, example: `50` | -|`--seedSearchStartLmaxOverLread` |seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads) |`double`, example: `1` | -|`--seedSearchLmax` |defines the maximum length of the seeds, if =0 seed length is not limited |`integer`, example: `0` | -|`--seedMultimapNmax` |only pieces that map fewer than this value are utilized in the stitching procedure |`integer`, example: `10000` | -|`--seedPerReadNmax` |max number of seeds per read |`integer`, example: `1000` | -|`--seedPerWindowNmax` |max number of seeds per window |`integer`, example: `50` | -|`--seedNoneLociPerWindow` |max number of one seed loci per window |`integer`, example: `10` | -|`--seedSplitMin` |min length of the seed sequences split by Ns or mate gap |`integer`, example: `12` | -|`--seedMapMin` |min length of seeds to be mapped |`integer`, example: `5` | -|`--alignIntronMin` |minimum intron size, genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered 
Deletion |`integer`, example: `21` | -|`--alignIntronMax` |maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | -|`--alignMatesGapMax` |maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | -|`--alignSJoverhangMin` |minimum overhang (i.e. block size) for spliced alignments |`integer`, example: `5` | -|`--alignSJstitchMismatchNmax` |maximum number of mismatches for stitching of the splice junctions (-1: no limit). (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. |`integer`, example: `0`, example: `-1`, example: `0`, example: `0` | -|`--alignSJDBoverhangMin` |minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments |`integer`, example: `3` | -|`--alignSplicedMateMapLmin` |minimum mapped length for a read mate that is spliced |`integer`, example: `0` | -|`--alignSplicedMateMapLminOverLmate` |alignSplicedMateMapLmin normalized to mate length |`double`, example: `0.66` | -|`--alignWindowsPerReadNmax` |max number of windows per read |`integer`, example: `10000` | -|`--alignTranscriptsPerWindowNmax` |max number of transcripts per window |`integer`, example: `100` | -|`--alignTranscriptsPerReadNmax` |max number of different alignments per read to consider |`integer`, example: `10000` | -|`--alignEndsType` |type of read ends alignment - Local ... standard local alignment with soft-clipping allowed - EndToEnd ... force end-to-end read alignment, do not soft-clip - Extend5pOfRead1 ... fully extend only the 5p of the read1, all other ends: local alignment - Extend5pOfReads12 ... fully extend only the 5p of the both read1 and read2, all other ends: local alignment |`string`, example: `"Local"` | -|`--alignEndsProtrude` |allow protrusion of alignment ends, i.e. 
start (end) of the +strand mate downstream of the start (end) of the -strand mate 1st word: int: maximum number of protrusion bases allowed 2nd word: string: - ConcordantPair ... report alignments with non-zero protrusion as concordant pairs - DiscordantPair ... report alignments with non-zero protrusion as discordant pairs |`string`, example: `"0 ConcordantPair"` | -|`--alignSoftClipAtReferenceEnds` |allow the soft-clipping of the alignments past the end of the chromosomes - Yes ... allow - No ... prohibit, useful for compatibility with Cufflinks |`string`, example: `"Yes"` | -|`--alignInsertionFlush` |how to flush ambiguous insertion positions - None ... insertions are not flushed - Right ... insertions are flushed to the right |`string` | +|Name |Description |Attributes | +|:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------| +|`--seedSearchStartLmax` |defines the search start point through the read - the read is split into pieces no longer than this value |`integer`, example: `50` | +|`--seedSearchStartLmaxOverLread` |seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads) |`double`, example: `1` | +|`--seedSearchLmax` |defines the maximum length of the seeds, if =0 seed length is not limited |`integer`, example: `0` | +|`--seedMultimapNmax` |only pieces that map fewer than this value are utilized in the stitching procedure |`integer`, example: `10000` | +|`--seedPerReadNmax` |max number of seeds per read |`integer`, example: `1000` | +|`--seedPerWindowNmax` |max number of 
seeds per window |`integer`, example: `50` | +|`--seedNoneLociPerWindow` |max number of one seed loci per window |`integer`, example: `10` | +|`--seedSplitMin` |min length of the seed sequences split by Ns or mate gap |`integer`, example: `12` | +|`--seedMapMin` |min length of seeds to be mapped |`integer`, example: `5` | +|`--alignIntronMin` |minimum intron size, genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered Deletion |`integer`, example: `21` | +|`--alignIntronMax` |maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | +|`--alignMatesGapMax` |maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins |`integer`, example: `0` | +|`--alignSJoverhangMin` |minimum overhang (i.e. block size) for spliced alignments |`integer`, example: `5` | +|`--alignSJstitchMismatchNmax` |maximum number of mismatches for stitching of the splice junctions (-1: no limit). (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. |List of `integer`, example: `0, -1, 0, 0`, multiple_sep: `";"` | +|`--alignSJDBoverhangMin` |minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments |`integer`, example: `3` | +|`--alignSplicedMateMapLmin` |minimum mapped length for a read mate that is spliced |`integer`, example: `0` | +|`--alignSplicedMateMapLminOverLmate` |alignSplicedMateMapLmin normalized to mate length |`double`, example: `0.66` | +|`--alignWindowsPerReadNmax` |max number of windows per read |`integer`, example: `10000` | +|`--alignTranscriptsPerWindowNmax` |max number of transcripts per window |`integer`, example: `100` | +|`--alignTranscriptsPerReadNmax` |max number of different alignments per read to consider |`integer`, example: `10000` | +|`--alignEndsType` |type of read ends alignment - Local ... 
standard local alignment with soft-clipping allowed - EndToEnd ... force end-to-end read alignment, do not soft-clip - Extend5pOfRead1 ... fully extend only the 5p of the read1, all other ends: local alignment - Extend5pOfReads12 ... fully extend only the 5p of the both read1 and read2, all other ends: local alignment |`string`, example: `"Local"` | +|`--alignEndsProtrude` |allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of the start (end) of the -strand mate 1st word: int: maximum number of protrusion bases allowed 2nd word: string: - ConcordantPair ... report alignments with non-zero protrusion as concordant pairs - DiscordantPair ... report alignments with non-zero protrusion as discordant pairs |`string`, example: `"0 ConcordantPair"` | +|`--alignSoftClipAtReferenceEnds` |allow the soft-clipping of the alignments past the end of the chromosomes - Yes ... allow - No ... prohibit, useful for compatibility with Cufflinks |`string`, example: `"Yes"` | +|`--alignInsertionFlush` |how to flush ambiguous insertion positions - None ... insertions are not flushed - Right ... insertions are flushed to the right |`string` | ### Paired-End reads @@ -319,29 +319,29 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Chimeric Alignments -|Name |Description |Attributes | -|:----------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------| -|`--chimOutType` |type of chimeric output - Junctions ... Chimeric.out.junction - SeparateSAMold ... 
output old SAM into separate Chimeric.out.sam file - WithinBAM ... output into main aligned BAM files (Aligned.*.bam) - WithinBAM HardClip ... (default) hard-clipping in the CIGAR for supplemental chimeric alignments (default if no 2nd word is present) - WithinBAM SoftClip ... soft-clipping in the CIGAR for supplemental chimeric alignments |`string`, example: `"Junctions"` | -|`--chimSegmentMin` |minimum length of chimeric segment length, if ==0, no chimeric output |`integer`, example: `0` | -|`--chimScoreMin` |minimum total (summed) score of the chimeric segments |`integer`, example: `0` | -|`--chimScoreDropMax` |max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length |`integer`, example: `20` | -|`--chimScoreSeparation` |minimum difference (separation) between the best chimeric score and the next one |`integer`, example: `10` | -|`--chimScoreJunctionNonGTAG` |penalty for a non-GT/AG chimeric junction |`integer`, example: `-1` | -|`--chimJunctionOverhangMin` |minimum overhang for a chimeric junction |`integer`, example: `20` | -|`--chimSegmentReadGapMax` |maximum gap in the read sequence between chimeric segments |`integer`, example: `0` | -|`--chimFilter` |different filters for chimeric alignments - None ... no filtering - banGenomicN ... Ns are not allowed in the genome sequence around the chimeric junction |`string`, example: `"banGenomicN"` | -|`--chimMainSegmentMultNmax` |maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments. |`integer`, example: `10` | -|`--chimMultimapNmax` |maximum number of chimeric multi-alignments - 0 ... use the old scheme for chimeric detection which only considered unique alignments |`integer`, example: `0` | -|`--chimMultimapScoreRange` |the score range for multi-mapping chimeras below the best chimeric score. 
Only works with --chimMultimapNmax > 1 |`integer`, example: `1` | -|`--chimNonchimScoreDropMin` |to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value |`integer`, example: `20` | -|`--chimOutJunctionFormat` |formatting type for the Chimeric.out.junction file - 0 ... no comment lines/headers - 1 ... comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping |`integer`, example: `0` | +|Name |Description |Attributes | +|:----------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------| +|`--chimOutType` |type of chimeric output - Junctions ... Chimeric.out.junction - SeparateSAMold ... output old SAM into separate Chimeric.out.sam file - WithinBAM ... output into main aligned BAM files (Aligned.*.bam) - WithinBAM HardClip ... (default) hard-clipping in the CIGAR for supplemental chimeric alignments (default if no 2nd word is present) - WithinBAM SoftClip ... 
soft-clipping in the CIGAR for supplemental chimeric alignments |List of `string`, example: `"Junctions"`, multiple_sep: `";"` | +|`--chimSegmentMin` |minimum length of chimeric segment length, if ==0, no chimeric output |`integer`, example: `0` | +|`--chimScoreMin` |minimum total (summed) score of the chimeric segments |`integer`, example: `0` | +|`--chimScoreDropMax` |max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length |`integer`, example: `20` | +|`--chimScoreSeparation` |minimum difference (separation) between the best chimeric score and the next one |`integer`, example: `10` | +|`--chimScoreJunctionNonGTAG` |penalty for a non-GT/AG chimeric junction |`integer`, example: `-1` | +|`--chimJunctionOverhangMin` |minimum overhang for a chimeric junction |`integer`, example: `20` | +|`--chimSegmentReadGapMax` |maximum gap in the read sequence between chimeric segments |`integer`, example: `0` | +|`--chimFilter` |different filters for chimeric alignments - None ... no filtering - banGenomicN ... Ns are not allowed in the genome sequence around the chimeric junction |List of `string`, example: `"banGenomicN"`, multiple_sep: `";"` | +|`--chimMainSegmentMultNmax` |maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments. |`integer`, example: `10` | +|`--chimMultimapNmax` |maximum number of chimeric multi-alignments - 0 ... use the old scheme for chimeric detection which only considered unique alignments |`integer`, example: `0` | +|`--chimMultimapScoreRange` |the score range for multi-mapping chimeras below the best chimeric score. 
Only works with --chimMultimapNmax > 1 |`integer`, example: `1` | +|`--chimNonchimScoreDropMin` |to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value |`integer`, example: `20` | +|`--chimOutJunctionFormat` |formatting type for the Chimeric.out.junction file - 0 ... no comment lines/headers - 1 ... comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping |`integer`, example: `0` | ### Quantification of Annotations |Name |Description |Attributes | |:------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------| -|`--quantMode` |types of quantification requested - - ... none - TranscriptomeSAM ... output SAM/BAM alignments to transcriptome into a separate file - GeneCounts ... count reads per gene |`string` | +|`--quantMode` |types of quantification requested - - ... none - TranscriptomeSAM ... output SAM/BAM alignments to transcriptome into a separate file - GeneCounts ... count reads per gene |List of `string`, multiple_sep: `";"` | |`--quantTranscriptomeBAMcompression` |-2 to 10 transcriptome BAM compression level - -2 ... no BAM output - -1 ... default compression (6?) - 0 ... no compression - 10 ... maximum compression |`integer`, example: `1` | |`--quantTranscriptomeBan` |prohibit various alignment type - IndelSoftclipSingleend ... prohibit indels, soft clipping and single-end alignments - compatible with RSEM - Singleend ... 
prohibit single-end alignments |`string`, example: `"IndelSoftclipSingleend"` | @@ -363,32 +363,32 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### STARsolo (single cell RNA-seq) parameters -|Name |Description |Attributes | -|:-----------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------| -|`--soloType` |type of single-cell RNA-seq - CB_UMI_Simple ... (a.k.a. Droplet) one UMI and one Cell Barcode of fixed length in read2, e.g. Drop-seq and 10X Chromium. - CB_UMI_Complex ... multiple Cell Barcodes of varying length, one UMI of fixed length and one adapter sequence of fixed length are allowed in read2 only (e.g. inDrop, ddSeq). - CB_samTagOut ... output Cell Barcode as CR and/or CB SAm tag. No UMI counting. 
--readFilesIn cDNA_read1 [cDNA_read2 if paired-end] CellBarcode_read . Requires --outSAMtype BAM Unsorted [and/or SortedByCoordinate] - SmartSeq ... Smart-seq: each cell in a separate FASTQ (paired- or single-end), barcodes are corresponding read-groups, no UMI sequences, alignments deduplicated according to alignment start and end (after extending soft-clipped bases) |`string` | -|`--soloCBwhitelist` |file(s) with whitelist(s) of cell barcodes. Only --soloType CB_UMI_Complex allows more than one whitelist file. - None ... no whitelist: all cell barcodes are allowed |`string` | -|`--soloCBstart` |cell barcode start base |`integer`, example: `1` | -|`--soloCBlen` |cell barcode length |`integer`, example: `16` | -|`--soloUMIstart` |UMI start base |`integer`, example: `17` | -|`--soloUMIlen` |UMI length |`integer`, example: `10` | -|`--soloBarcodeReadLength` |length of the barcode read - 1 ... equal to sum of soloCBlen+soloUMIlen - 0 ... not defined, do not check |`integer`, example: `1` | -|`--soloBarcodeMate` |identifies which read mate contains the barcode (CB+UMI) sequence - 0 ... barcode sequence is on separate read, which should always be the last file in the --readFilesIn listed - 1 ... barcode sequence is a part of mate 1 - 2 ... barcode sequence is a part of mate 2 |`integer`, example: `0` | -|`--soloCBposition` |position of Cell Barcode(s) on the barcode read. Presently only works with --soloType CB_UMI_Complex, and barcodes are assumed to be on Read2. Format for each barcode: startAnchor_startPosition_endAnchor_endPosition start(end)Anchor defines the Anchor Base for the CB: 0: read start; 1: read end; 2: adapter start; 3: adapter end start(end)Position is the 0-based position with of the CB start(end) with respect to the Anchor Base String for different barcodes are separated by space. Example: inDrop (Zilionis et al, Nat. 
Protocols, 2017): --soloCBposition 0_0_2_-1 3_1_3_8 |`string` | -|`--soloUMIposition` |position of the UMI on the barcode read, same as soloCBposition Example: inDrop (Zilionis et al, Nat. Protocols, 2017): --soloCBposition 3_9_3_14 |`string` | -|`--soloAdapterSequence` |adapter sequence to anchor barcodes. Only one adapter sequence is allowed. |`string` | -|`--soloAdapterMismatchesNmax` |maximum number of mismatches allowed in adapter sequence. |`integer`, example: `1` | -|`--soloCBmatchWLtype` |matching the Cell Barcodes to the WhiteList - Exact ... only exact matches allowed - 1MM ... only one match in whitelist with 1 mismatched base allowed. Allowed CBs have to have at least one read with exact match. - 1MM_multi ... multiple matches in whitelist with 1 mismatched base allowed, posterior probability calculation is used choose one of the matches. Allowed CBs have to have at least one read with exact match. This option matches best with CellRanger 2.2.0 - 1MM_multi_pseudocounts ... same as 1MM_Multi, but pseudocounts of 1 are added to all whitelist barcodes. - 1MM_multi_Nbase_pseudocounts ... same as 1MM_multi_pseudocounts, multimatching to WL is allowed for CBs with N-bases. This option matches best with CellRanger >= 3.0.0 - EditDist_2 ... allow up to edit distance of 3 fpr each of the barcodes. May include one deletion + one insertion. Only works with --soloType CB_UMI_Complex. Matches to multiple passlist barcdoes are not allowed. Similar to ParseBio Split-seq pipeline. |`string`, example: `"1MM_multi"` | -|`--soloInputSAMattrBarcodeSeq` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode sequence (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeSeq CR UR . This parameter is required when running STARsolo with input from SAM. 
|`string` | -|`--soloInputSAMattrBarcodeQual` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode qualities (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeQual CY UY . If this parameter is '-' (default), the quality 'H' will be assigned to all bases. |`string` | -|`--soloStrand` |strandedness of the solo libraries: - Unstranded ... no strand information - Forward ... read strand same as the original RNA molecule - Reverse ... read strand opposite to the original RNA molecule |`string`, example: `"Forward"` | -|`--soloFeatures` |genomic features for which the UMI counts per Cell Barcode are collected - Gene ... genes: reads match the gene transcript - SJ ... splice junctions: reported in SJ.out.tab - GeneFull ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns - GeneFull_ExonOverIntron ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns: prioritize 100% overlap with exons - GeneFull_Ex50pAS ... full gene (pre-RNA): count all reads overlapping genes' exons and introns: prioritize >50% overlap with exons. Do not count reads with 100% exonic overlap in the antisense direction. |`string`, example: `"Gene"` | -|`--soloMultiMappers` |counting method for reads mapping to multiple genes - Unique ... count only reads that map to unique genes - Uniform ... uniformly distribute multi-genic UMIs to all genes - Rescue ... distribute UMIs proportionally to unique+uniform counts (~ first iteration of EM) - PropUnique ... distribute UMIs proportionally to unique mappers, if present, and uniformly if not. - EM ... multi-gene UMIs are distributed using Expectation Maximization algorithm |`string`, example: `"Unique"` | -|`--soloUMIdedup` |type of UMI deduplication (collapsing) algorithm - 1MM_All ... all UMIs with 1 mismatch distance to each other are collapsed (i.e. counted once). - 1MM_Directional_UMItools ... 
follows the "directional" method from the UMI-tools by Smith, Heger and Sudbery (Genome Research 2017). - 1MM_Directional ... same as 1MM_Directional_UMItools, but with more stringent criteria for duplicate UMIs - Exact ... only exactly matching UMIs are collapsed. - NoDedup ... no deduplication of UMIs, count all reads. - 1MM_CR ... CellRanger2-4 algorithm for 1MM UMI collapsing. |`string`, example: `"1MM_All"` | -|`--soloUMIfiltering` |type of UMI filtering (for reads uniquely mapping to genes) - - ... basic filtering: remove UMIs with N and homopolymers (similar to CellRanger 2.2.0). - MultiGeneUMI ... basic + remove lower-count UMIs that map to more than one gene. - MultiGeneUMI_All ... basic + remove all UMIs that map to more than one gene. - MultiGeneUMI_CR ... basic + remove lower-count UMIs that map to more than one gene, matching CellRanger > 3.0.0 . Only works with --soloUMIdedup 1MM_CR |`string` | -|`--soloOutFileNames` |file names for STARsolo output: file_name_prefix gene_names barcode_sequences cell_feature_count_matrix |`string`, example: `"Solo.out/"`, example: `"features.tsv"`, example: `"barcodes.tsv"`, example: `"matrix.mtx"` | -|`--soloCellFilter` |cell filtering type and parameters - None ... do not output filtered cells - TopCells ... only report top cells by UMI count, followed by the exact number of cells - CellRanger2.2 ... simple filtering of CellRanger 2.2. Can be followed by numbers: number of expected cells, robust maximum percentile for UMI count, maximum to minimum ratio for UMI count The harcoded values are from CellRanger: nExpectedCells=3000; maxPercentile=0.99; maxMinRatio=10 - EmptyDrops_CR ... EmptyDrops filtering in CellRanger flavor. 
Please cite the original EmptyDrops paper: A.T.L Lun et al, Genome Biology, 20, 63 (2019): https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1662-y Can be followed by 10 numeric parameters: nExpectedCells maxPercentile maxMinRatio indMin indMax umiMin umiMinFracMedian candMaxN FDR simN The harcoded values are from CellRanger: 3000 0.99 10 45000 90000 500 0.01 20000 0.01 10000 |`string`, example: `"CellRanger2.2"`, example: `"3000"`, example: `"0.99"`, example: `"10"` | -|`--soloOutFormatFeaturesGeneField3` |field 3 in the Gene features.tsv file. If "-", then no 3rd field is output. |`string`, example: `"Gene Expression"` | -|`--soloCellReadStats` |Output reads statistics for each CB - Standard ... standard output |`string` | +|Name |Description |Attributes | +|:-----------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------
-------------------------------------------| +|`--soloType` |type of single-cell RNA-seq - CB_UMI_Simple ... (a.k.a. Droplet) one UMI and one Cell Barcode of fixed length in read2, e.g. Drop-seq and 10X Chromium. - CB_UMI_Complex ... multiple Cell Barcodes of varying length, one UMI of fixed length and one adapter sequence of fixed length are allowed in read2 only (e.g. inDrop, ddSeq). - CB_samTagOut ... output Cell Barcode as CR and/or CB SAm tag. No UMI counting. --readFilesIn cDNA_read1 [cDNA_read2 if paired-end] CellBarcode_read . Requires --outSAMtype BAM Unsorted [and/or SortedByCoordinate] - SmartSeq ... Smart-seq: each cell in a separate FASTQ (paired- or single-end), barcodes are corresponding read-groups, no UMI sequences, alignments deduplicated according to alignment start and end (after extending soft-clipped bases) |List of `string`, multiple_sep: `";"` | +|`--soloCBwhitelist` |file(s) with whitelist(s) of cell barcodes. Only --soloType CB_UMI_Complex allows more than one whitelist file. - None ... no whitelist: all cell barcodes are allowed |List of `string`, multiple_sep: `";"` | +|`--soloCBstart` |cell barcode start base |`integer`, example: `1` | +|`--soloCBlen` |cell barcode length |`integer`, example: `16` | +|`--soloUMIstart` |UMI start base |`integer`, example: `17` | +|`--soloUMIlen` |UMI length |`integer`, example: `10` | +|`--soloBarcodeReadLength` |length of the barcode read - 1 ... equal to sum of soloCBlen+soloUMIlen - 0 ... not defined, do not check |`integer`, example: `1` | +|`--soloBarcodeMate` |identifies which read mate contains the barcode (CB+UMI) sequence - 0 ... barcode sequence is on separate read, which should always be the last file in the --readFilesIn listed - 1 ... barcode sequence is a part of mate 1 - 2 ... barcode sequence is a part of mate 2 |`integer`, example: `0` | +|`--soloCBposition` |position of Cell Barcode(s) on the barcode read. 
Presently only works with --soloType CB_UMI_Complex, and barcodes are assumed to be on Read2. Format for each barcode: startAnchor_startPosition_endAnchor_endPosition start(end)Anchor defines the Anchor Base for the CB: 0: read start; 1: read end; 2: adapter start; 3: adapter end start(end)Position is the 0-based position with of the CB start(end) with respect to the Anchor Base String for different barcodes are separated by space. Example: inDrop (Zilionis et al, Nat. Protocols, 2017): --soloCBposition 0_0_2_-1 3_1_3_8 |List of `string`, multiple_sep: `";"` | +|`--soloUMIposition` |position of the UMI on the barcode read, same as soloCBposition Example: inDrop (Zilionis et al, Nat. Protocols, 2017): --soloCBposition 3_9_3_14 |`string` | +|`--soloAdapterSequence` |adapter sequence to anchor barcodes. Only one adapter sequence is allowed. |`string` | +|`--soloAdapterMismatchesNmax` |maximum number of mismatches allowed in adapter sequence. |`integer`, example: `1` | +|`--soloCBmatchWLtype` |matching the Cell Barcodes to the WhiteList - Exact ... only exact matches allowed - 1MM ... only one match in whitelist with 1 mismatched base allowed. Allowed CBs have to have at least one read with exact match. - 1MM_multi ... multiple matches in whitelist with 1 mismatched base allowed, posterior probability calculation is used choose one of the matches. Allowed CBs have to have at least one read with exact match. This option matches best with CellRanger 2.2.0 - 1MM_multi_pseudocounts ... same as 1MM_Multi, but pseudocounts of 1 are added to all whitelist barcodes. - 1MM_multi_Nbase_pseudocounts ... same as 1MM_multi_pseudocounts, multimatching to WL is allowed for CBs with N-bases. This option matches best with CellRanger >= 3.0.0 - EditDist_2 ... allow up to edit distance of 3 fpr each of the barcodes. May include one deletion + one insertion. Only works with --soloType CB_UMI_Complex. Matches to multiple passlist barcdoes are not allowed. 
Similar to ParseBio Split-seq pipeline. |`string`, example: `"1MM_multi"` | +|`--soloInputSAMattrBarcodeSeq` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode sequence (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeSeq CR UR . This parameter is required when running STARsolo with input from SAM. |List of `string`, multiple_sep: `";"` | +|`--soloInputSAMattrBarcodeQual` |when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributes mark the barcode qualities (in proper order). For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeQual CY UY . If this parameter is '-' (default), the quality 'H' will be assigned to all bases. |List of `string`, multiple_sep: `";"` | +|`--soloStrand` |strandedness of the solo libraries: - Unstranded ... no strand information - Forward ... read strand same as the original RNA molecule - Reverse ... read strand opposite to the original RNA molecule |`string`, example: `"Forward"` | +|`--soloFeatures` |genomic features for which the UMI counts per Cell Barcode are collected - Gene ... genes: reads match the gene transcript - SJ ... splice junctions: reported in SJ.out.tab - GeneFull ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns - GeneFull_ExonOverIntron ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns: prioritize 100% overlap with exons - GeneFull_Ex50pAS ... full gene (pre-RNA): count all reads overlapping genes' exons and introns: prioritize >50% overlap with exons. Do not count reads with 100% exonic overlap in the antisense direction. |List of `string`, example: `"Gene"`, multiple_sep: `";"` | +|`--soloMultiMappers` |counting method for reads mapping to multiple genes - Unique ... count only reads that map to unique genes - Uniform ... uniformly distribute multi-genic UMIs to all genes - Rescue ... 
distribute UMIs proportionally to unique+uniform counts (~ first iteration of EM) - PropUnique ... distribute UMIs proportionally to unique mappers, if present, and uniformly if not. - EM ... multi-gene UMIs are distributed using Expectation Maximization algorithm |List of `string`, example: `"Unique"`, multiple_sep: `";"` | +|`--soloUMIdedup` |type of UMI deduplication (collapsing) algorithm - 1MM_All ... all UMIs with 1 mismatch distance to each other are collapsed (i.e. counted once). - 1MM_Directional_UMItools ... follows the "directional" method from the UMI-tools by Smith, Heger and Sudbery (Genome Research 2017). - 1MM_Directional ... same as 1MM_Directional_UMItools, but with more stringent criteria for duplicate UMIs - Exact ... only exactly matching UMIs are collapsed. - NoDedup ... no deduplication of UMIs, count all reads. - 1MM_CR ... CellRanger2-4 algorithm for 1MM UMI collapsing. |List of `string`, example: `"1MM_All"`, multiple_sep: `";"` | +|`--soloUMIfiltering` |type of UMI filtering (for reads uniquely mapping to genes) - - ... basic filtering: remove UMIs with N and homopolymers (similar to CellRanger 2.2.0). - MultiGeneUMI ... basic + remove lower-count UMIs that map to more than one gene. - MultiGeneUMI_All ... basic + remove all UMIs that map to more than one gene. - MultiGeneUMI_CR ... basic + remove lower-count UMIs that map to more than one gene, matching CellRanger > 3.0.0 . Only works with --soloUMIdedup 1MM_CR |List of `string`, multiple_sep: `";"` | +|`--soloOutFileNames` |file names for STARsolo output: file_name_prefix gene_names barcode_sequences cell_feature_count_matrix |List of `string`, example: `"Solo.out/", "features.tsv", "barcodes.tsv", "matrix.mtx"`, multiple_sep: `";"` | +|`--soloCellFilter` |cell filtering type and parameters - None ... do not output filtered cells - TopCells ... only report top cells by UMI count, followed by the exact number of cells - CellRanger2.2 ... simple filtering of CellRanger 2.2. 
Can be followed by numbers: number of expected cells, robust maximum percentile for UMI count, maximum to minimum ratio for UMI count The harcoded values are from CellRanger: nExpectedCells=3000; maxPercentile=0.99; maxMinRatio=10 - EmptyDrops_CR ... EmptyDrops filtering in CellRanger flavor. Please cite the original EmptyDrops paper: A.T.L Lun et al, Genome Biology, 20, 63 (2019): https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1662-y Can be followed by 10 numeric parameters: nExpectedCells maxPercentile maxMinRatio indMin indMax umiMin umiMinFracMedian candMaxN FDR simN The harcoded values are from CellRanger: 3000 0.99 10 45000 90000 500 0.01 20000 0.01 10000 |List of `string`, example: `"CellRanger2.2", "3000", "0.99", "10"`, multiple_sep: `";"` | +|`--soloOutFormatFeaturesGeneField3` |field 3 in the Gene features.tsv file. If "-", then no 3rd field is output. |List of `string`, example: `"Gene Expression"`, multiple_sep: `";"` | +|`--soloCellReadStats` |Output reads statistics for each CB - Standard ... standard output |`string` | ## Authors diff --git a/components/modules/mapping/star_build_reference.qmd b/components/modules/mapping/star_build_reference.qmd index 0f69ee2a..4ebdc30d 100644 --- a/components/modules/mapping/star_build_reference.qmd +++ b/components/modules/mapping/star_build_reference.qmd @@ -70,11 +70,11 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Input/Output -|Name |Description |Attributes | -|:---------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------| -|`--genome_fasta` |The fasta files to be included in the reference. Corresponds to the --genomeFastaFiles argument in the STAR command. 
|`file`, required, example: `"chr1.fasta"`, example: `"chr2.fasta"` | -|`--transcriptome_gtf` |Specifies the path to the file with annotated transcripts in the standard GTF format. STAR will extract splice junctions from this file and use them to greatly improve accuracy of the mapping. Corresponds to the --sjdbGTFfile argument in the STAR command. |`file` | -|`--output` |Path to output directory. Corresponds to the --genomeDir argument in the STAR command. |`file`, required, example: `"/path/to/foo"` | +|Name |Description |Attributes | +|:---------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| +|`--genome_fasta` |The fasta files to be included in the reference. Corresponds to the --genomeFastaFiles argument in the STAR command. |List of `file`, required, example: `"chr1.fasta", "chr2.fasta"`, multiple_sep: `" "` | +|`--transcriptome_gtf` |Specifies the path to the file with annotated transcripts in the standard GTF format. STAR will extract splice junctions from this file and use them to greatly improve accuracy of the mapping. Corresponds to the --sjdbGTFfile argument in the STAR command. |`file` | +|`--output` |Path to output directory. Corresponds to the --genomeDir argument in the STAR command. |`file`, required, example: `"/path/to/foo"` | ### Genome indexing arguments diff --git a/components/modules/process_10xh5/filter_10xh5.qmd b/components/modules/process_10xh5/filter_10xh5.qmd index 4a4fef78..9bd8a116 100644 --- a/components/modules/process_10xh5/filter_10xh5.qmd +++ b/components/modules/process_10xh5/filter_10xh5.qmd @@ -76,7 +76,7 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen |`--output` |Output h5 file. 
|`file`, required, example: `"pbmc_1k_protein_v3_raw_feature_bc_matrix_filtered.h5"` | |`--min_library_size` |Minimum library size. |`integer`, default: `0` | |`--min_cells_per_gene` |Minimum number of cells per gene. |`integer`, default: `0` | -|`--keep_feature_types` |Specify which feature types will never be filtered out |`string`, example: `"Antibody Capture"` | +|`--keep_feature_types` |Specify which feature types will never be filtered out |List of `string`, example: `"Antibody Capture"`, multiple_sep: `":"` | |`--verbose` |Increase verbosity |`boolean_true` | ## Authors diff --git a/components/modules/qc/calculate_qc_metrics.qmd b/components/modules/qc/calculate_qc_metrics.qmd index 1637ffcd..7fb7ccaf 100644 --- a/components/modules/qc/calculate_qc_metrics.qmd +++ b/components/modules/qc/calculate_qc_metrics.qmd @@ -90,14 +90,14 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Inputs -|Name |Description |Attributes | -|:--------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------| -|`--input` |Input h5mu file |`file`, required, example: `"input.h5mu"` | -|`--modality` | |`string`, default: `"rna"` | -|`--layer` | |`string`, example: `"raw_counts"` | -|`--var_qc_metrics` |Keys to select a boolean (containing only True or False) column from .var. For each cell, calculate the proportion of total values for genes which are labeled 'True', compared to the total sum of the values for all genes. |`string`, example: `"ercc,highly_variable,mitochondrial"` | -|`--var_qc_metrics_fill_na_value` |Fill any 'NA' values found in the columns specified with --var_qc_metrics to 'True' or 'False'. as False. 
|`boolean` | -|`--top_n_vars` |Number of top vars to be used to calculate cumulative proportions. If not specified, proportions are not calculated. `--top_n_vars 20,50` finds cumulative proportion to the 20th and 50th most expressed vars. |`integer` | +|Name |Description |Attributes | +|:--------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------| +|`--input` |Input h5mu file |`file`, required, example: `"input.h5mu"` | +|`--modality` | |`string`, default: `"rna"` | +|`--layer` | |`string`, example: `"raw_counts"` | +|`--var_qc_metrics` |Keys to select a boolean (containing only True or False) column from .var. For each cell, calculate the proportion of total values for genes which are labeled 'True', compared to the total sum of the values for all genes. |List of `string`, example: `"ercc,highly_variable,mitochondrial"`, multiple_sep: `","` | +|`--var_qc_metrics_fill_na_value` |Fill any 'NA' values found in the columns specified with --var_qc_metrics to 'True' or 'False'. as False. |`boolean` | +|`--top_n_vars` |Number of top vars to be used to calculate cumulative proportions. If not specified, proportions are not calculated. `--top_n_vars 20,50` finds cumulative proportion to the 20th and 50th most expressed vars. 
|List of `integer`, multiple_sep: `","` | ### Outputs diff --git a/components/modules/qc/multiqc.qmd b/components/modules/qc/multiqc.qmd index 2e99104d..a11971e3 100644 --- a/components/modules/qc/multiqc.qmd +++ b/components/modules/qc/multiqc.qmd @@ -66,7 +66,7 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Arguments -|Name |Description |Attributes | -|:----------|:------------------------------------------------|:----------------------------------------| -|`--input` |Inputs for MultiQC. |`file`, required, example: `"input.txt"` | -|`--output` |Create report in the specified output directory. |`file`, required, example: `"report"` | +|Name |Description |Attributes | +|:----------|:------------------------------------------------|:---------------------------------------------------------------------| +|`--input` |Inputs for MultiQC. |List of `file`, required, example: `"input.txt"`, multiple_sep: `":"` | +|`--output` |Create report in the specified output directory. |`file`, required, example: `"report"` | diff --git a/components/modules/query/cellxgene_census.qmd b/components/modules/query/cellxgene_census.qmd index 9155f719..1e6c350d 100644 --- a/components/modules/query/cellxgene_census.qmd +++ b/components/modules/query/cellxgene_census.qmd @@ -96,7 +96,7 @@ Arguments related to the query. |:----------------------------|:----------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------| |`--species` |Specie(s) of interest. If not specified, Homo Sapiens will be queried. |`string`, default: `"homo_sapiens"`, example: `"homo_sapiens"` | |`--cell_query` |The query for selecting the cells as defined by the cellxgene census schema. 
|`string`, example: `"is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'"` | -|`--cells_filter_columns` |The query for selecting the cells as defined by the cellxgene census schema. |`string`, example: `"dataset_id"`, example: `"tissue"`, example: `"assay"`, example: `"disease"`, example: `"cell_type"` | +|`--cells_filter_columns` |The query for selecting the cells as defined by the cellxgene census schema. |List of `string`, example: `"dataset_id", "tissue", "assay", "disease", "cell_type"`, multiple_sep: `":"` | |`--min_cells_filter_columns` |Minimum of amount of summed cells_filter_columns cells |`double`, example: `100` | diff --git a/components/modules/transform/delete_layer.qmd b/components/modules/transform/delete_layer.qmd index 78ff199f..302c567e 100644 --- a/components/modules/transform/delete_layer.qmd +++ b/components/modules/transform/delete_layer.qmd @@ -70,14 +70,14 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen ### Arguments -|Name |Description |Attributes | -|:----------------------|:---------------------------------------------------------------------|:------------------------------------------| -|`--input` |Input h5mu file |`file`, required, example: `"input.h5mu"` | -|`--modality` | |`string`, default: `"rna"` | -|`--layer` |Input layer to remove |`string`, required | -|`--output` |Output h5mu file. |`file`, required, default: `"output.h5mu"` | -|`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | -|`--missing_ok` |Do not raise an error if the layer does not exist for all modalities. 
|`boolean_true` | +|Name |Description |Attributes | +|:----------------------|:---------------------------------------------------------------------|:-----------------------------------------------| +|`--input` |Input h5mu file |`file`, required, example: `"input.h5mu"` | +|`--modality` | |`string`, default: `"rna"` | +|`--layer` |Input layer to remove |List of `string`, required, multiple_sep: `":"` | +|`--output` |Output h5mu file. |`file`, required, default: `"output.h5mu"` | +|`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | +|`--missing_ok` |Do not raise an error if the layer does not exist for all modalities. |`boolean_true` | ## Authors diff --git a/components/modules/transform/regress_out.qmd b/components/modules/transform/regress_out.qmd index 55e89de8..33c5b68e 100644 --- a/components/modules/transform/regress_out.qmd +++ b/components/modules/transform/regress_out.qmd @@ -76,7 +76,7 @@ Replace `-profile docker` with `-profile podman` or `-profile singularity` depen |`--output` |Output h5mu file. |`file`, required, default: `"output.h5mu"` | |`--output_compression` |The compression format to be used on the output h5mu object. |`string`, example: `"gzip"` | |`--modality` |Which modality (one or more) to run this component on. |`string`, default: `"rna"` | -|`--obs_keys` |Which .obs keys to regress on. |`string` | +|`--obs_keys` |Which .obs keys to regress on. |List of `string`, multiple_sep: `":"` | ## Authors