cellranger multi => MTX_TO_H5AD: input file name collision #383

Open
nick-youngblut opened this issue Oct 17, 2024 · 2 comments
Labels
bug Something isn't working cellranger-multi

Comments

@nick-youngblut
Contributor

Description of the bug

Running CellRanger with all GEX samples, where each sample has multiple barcodes that all map to the same sample (see the samples and barcodes tables below).
This results in a file name collision at the MTX_TO_H5AD step.
I haven't been able to determine why, based on the pipeline code.

Command used and terminal output

The command:

nextflow run main.nf \
  -ansi-log false \
  -profile singularity \
  -process.executor slurm \
  -process.queue cpu_batch \
  -work-dir /scratch/$(id -gn)/$(whoami)/nextflow-work/scrnaseq \
  --aligner cellrangermulti \
  --skip_cellrangermulti_vdjref \
  --skip_emptydrops \
  --gex_frna_probe_set ${PROBE_REF_DIR}/Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv \
  --cellranger_index ${GENOME_REF_DIR}/refdata-gex-GRCh38-2020-A/ \
  --cellranger_multi_barcodes tmp/sample_barcodes.csv \
  --input tmp/samples.csv \
  --outdir tmp/scrnaseq_output

The terminal output:

ERROR ~ Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:MTX_CONVERSION:MTX_TO_H5AD (2)'

Caused by:
  Process `NFCORE_SCRNASEQ:SCRNASEQ:MTX_CONVERSION:MTX_TO_H5AD` input file name collision -- There are multiple input files for each of the following file names: barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz

Relevant files

The samples table (full paths removed for clarity):

sample,fastq_1,fastq_2,feature_type
20240905_ADI_batch3_flex_1,20240905_ADI_batch3_flex_1_S1_L001_R1_001.fastq.gz,20240905_ADI_batch3_flex_1_S1_L001_R2_001.fastq.gz,gex
20240905_ADI_batch3_flex_2,20240905_ADI_batch3_flex_2_S1_L001_R1_001.fastq.gz,20240905_ADI_batch3_flex_2_S1_L001_R2_001.fastq.gz,gex
20240925_ADI_batch5_flex_1,/20240925_ADI_batch5_flex_1_R1_001.fastq.gz,/20240925_ADI_batch5_flex_1_R2_001.fastq.gz,gex
20240925_ADI_batch5_flex_2,/20240925_ADI_batch5_flex_2_R1_001.fastq.gz,/20240925_ADI_batch5_flex_2_R2_001.fastq.gz,gex
20240925_ADI_batch5_flex_3,/20240925_ADI_batch5_flex_3_R1_001.fastq.gz,/20240925_ADI_batch5_flex_3_R2_001.fastq.gz,gex
20240925_ADI_batch5_flex_4,/20240925_ADI_batch5_flex_4_R1_001.fastq.gz,/20240925_ADI_batch5_flex_4_R2_001.fastq.gz,gex

The sample barcodes table:

sample,multiplexed_sample_id,probe_barcode_ids,cmo_ids,description
20240905_ADI_batch3_flex_1,20240905_ADI_batch3_flex_1,BC001|BC002|BC003|BC004,,
20240905_ADI_batch3_flex_2,20240905_ADI_batch3_flex_2,BC001|BC002|BC003|BC004,,
20240925_ADI_batch5_flex_1,20240925_ADI_batch5_flex_1,BC001|BC002|BC003|BC004,,
20240925_ADI_batch5_flex_2,20240925_ADI_batch5_flex_2,BC001|BC002|BC003|BC004,,
20240925_ADI_batch5_flex_3,20240925_ADI_batch5_flex_3,BC001|BC002|BC003|BC004,,
20240925_ADI_batch5_flex_4,20240925_ADI_batch5_flex_4,BC001|BC002|BC003|BC004,,

System information

Nextflow: 24.04.4.5917
Hardware: HPC
Executor: SLURM
Engine: Apptainer
OS: Ubuntu
Pipeline: 2.7.1

@nick-youngblut nick-youngblut added the bug Something isn't working label Oct 17, 2024
@nick-youngblut
Contributor Author

Adding mtx_matrices.view() to MTX_CONVERSION shows that all of the samples have the same file names, which is causing the name collision:

[
  [id:20240925_ADI_batch5_flex_3, ...],
  [
    /path/to/sample1/barcodes.tsv.gz,
    /path/to/sample1/features.tsv.gz,
    /path/to/sample1/matrix.mtx.gz,
    ...
  ]
]
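
For anyone trying to reproduce the check outside the pipeline, here is a minimal Python sketch (not pipeline code) of what the error seems to describe: Nextflow stages a task's input files by file name, so two inputs with the same base name in one task cannot coexist. The helper name `find_basename_collisions` and the example paths are hypothetical and only for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_basename_collisions(paths):
    """Group paths by file name and return the names that occur more than once."""
    by_name = defaultdict(list)
    for p in paths:
        by_name[PurePosixPath(p).name].append(p)
    return {name: ps for name, ps in by_name.items() if len(ps) > 1}


# Hypothetical input list for a single task (paths are made up for illustration).
staged = [
    "/path/to/sample1/barcodes.tsv.gz",
    "/path/to/sample2/barcodes.tsv.gz",
    "/path/to/sample1/matrix.mtx.gz",
]

collisions = find_basename_collisions(staged)
for name, sources in sorted(collisions.items()):
    print(name, "<-", sources)
```

Pasting the file list from the `.view()` output for one tuple into `staged` should show which directories contribute the duplicated names.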

@nick-youngblut
Contributor Author

It might help to include info on how to handle multiple barcodes per sample in the sample barcode table: https://nf-co.re/scrnaseq/2.7.1/docs/usage ("Additional samplesheet for multiplexed samples").

For example:

sample,multiplexed_sample_id,probe_barcode_ids,cmo_ids,description
20240905_ADI_batch3_flex_1,20240905_ADI_batch3_flex_1,BC001|BC002|BC003|BC004,,

From the 10X docs:

If multiple Probe Barcodes were used for a sample, separate IDs with a pipe (e.g., BC001|BC002).
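
To make the pipe convention concrete, here is a small Python sketch of how such a row could be read. This is only an illustration of the format, not how the pipeline or Cell Ranger parses it; the function name `probe_barcodes_per_sample` is hypothetical, and the table is the example row above.

```python
import csv
import io

# Example row copied from the sample barcodes table in this issue.
TABLE = """sample,multiplexed_sample_id,probe_barcode_ids,cmo_ids,description
20240905_ADI_batch3_flex_1,20240905_ADI_batch3_flex_1,BC001|BC002|BC003|BC004,,
"""


def probe_barcodes_per_sample(text):
    """Map (sample, multiplexed_sample_id) to its list of probe barcode IDs."""
    rows = csv.DictReader(io.StringIO(text))
    return {
        (row["sample"], row["multiplexed_sample_id"]): [
            bc for bc in row["probe_barcode_ids"].split("|") if bc
        ]
        for row in rows
    }


parsed = probe_barcodes_per_sample(TABLE)
for (sample, mux_id), barcodes in parsed.items():
    print(sample, mux_id, barcodes)
```

Here the four pipe-separated IDs belong to a single multiplexed sample, which is the case described in this issue.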

@grst grst added this to scrnaseq Nov 7, 2024
@github-project-automation github-project-automation bot moved this to Todo high priority in scrnaseq Nov 7, 2024
@grst grst moved this from Todo high priority to Todo - medium priority in scrnaseq Nov 7, 2024
Projects
Status: Todo - medium priority
2 participants