Filenames of output file? #529

Closed
Thomieh73 opened this issue Sep 12, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@Thomieh73

Description of the bug

Hi,
I ran Taxprofiler and noticed that my sample names are changed in a way that I find a bit odd.

These are the original file names as I gave them in the samplesheet:

DNA_H1H_17_A5_1.fq.gz
DNA_H1H_17_A5_2.fq.gz

and it becomes:

DNA_H1H_17_A5_DNA_H1H_17.unmapped_1.fastq.gz
DNA_H1H_17_A5_DNA_H1H_17.unmapped_2.fastq.gz

Where does the repeated "DNA_H1H_17" come from?

This was my command to run the job:

nextflow run nf-core/taxprofiler -r 1.1.8 -profile apptainer \
-c saga_taxprofiler.config -work-dir $USERWORK/taxprofiler -resume \
--input taxprofiler_samplesheet.csv --databases ./databases.csv \
--outdir ../../results/20240911_TP_results \
--perform_shortread_qc \
--shortread_qc_minlength 50 \
--perform_shortread_complexityfilter \
--perform_shortread_hostremoval \
--hostremoval_reference ../../viral_mask_results/masked_host_db/combined_hosts_phix.fna.gz \
--shortread_hostremoval_index ../../viral_mask_results/masked_host_db \
--run_kraken2 \
--run_bracken \
--run_motus \
--motus_save_mgc_read_counts \
--run_profile_standardisation \
--save_analysis_ready_fastqs \
--max_cpus 32

and my sample sheet looks like this:

sample,run_accession,instrument_platform,fastq_1,fastq_2,fasta
DNA_H1H_10_A1,DNA_H1H_10,ILLUMINA,/cluster/projects/nn10070k/projects/phagedrive/pd_data_control/data/DNA_H1H_10_A1_1.fq.gz,/cluster/projects/nn10070k/projects/phagedrive/pd_data_control/data/DNA_H1H_10_A1_2.fq.gz,
DNA_H1H_10_B1,DNA_H1H_10,ILLUMINA,/cluster/projects/nn10070k/projects/phagedrive/pd_data_control/data/DNA_H1H_10_B1_1.fq.gz,/cluster/projects/nn10070k/projects/phagedrive/pd_data_control/data/DNA_H1H_10_B1_2.fq.gz,

Thomieh73 added the bug label Sep 12, 2024
@sofstam
Collaborator

sofstam commented Sep 12, 2024

Hello! The name comes from the module configuration: we combine the sample and run_accession values.
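
As a rough illustration of the explanation above, the sketch below reproduces the reported file names by joining the two samplesheet columns. The underscore separator and the `.unmapped_N.fastq.gz` suffix are inferred from the file names quoted in this issue, not taken from the pipeline source, and `output_prefix` and `hostremoved_names` are hypothetical helper names. It also assumes the DNA_H1H_17_A5 row follows the same pattern as the samplesheet rows shown above.

```python
def output_prefix(sample: str, run_accession: str) -> str:
    """Join sample and run_accession the way the reported names suggest."""
    # Assumption: the separator is an underscore, inferred from this issue.
    return f"{sample}_{run_accession}"


def hostremoved_names(sample: str, run_accession: str) -> list[str]:
    """Predict the paired host-removed FASTQ names for one samplesheet row."""
    prefix = output_prefix(sample, run_accession)
    # Assumption: suffix pattern copied from the file names reported above.
    return [f"{prefix}.unmapped_{mate}.fastq.gz" for mate in (1, 2)]


print(hostremoved_names("DNA_H1H_17_A5", "DNA_H1H_17"))
```

Under these assumptions, the repeated "DNA_H1H_17" is simply the run_accession value appended to a sample name that already contains it.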

@Thomieh73
Author

Ah, now I understand. So it should not happen if I leave the run_accession column blank. I will check that. :-)

@sofstam
Collaborator

sofstam commented Sep 12, 2024

The sample and run_accession columns are identifiers (sample name and run ID), and I do not think you can leave either of them empty.
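
For anyone who wants to check a samplesheet before launching a run, here is a minimal sketch that flags rows where either identifier column is empty. It is not the pipeline's own validation; `rows_missing_ids` is a hypothetical helper, and the file paths in the example are shortened placeholders.

```python
import csv
import io


def rows_missing_ids(samplesheet_text: str) -> list[int]:
    """Return 1-based data-row numbers where sample or run_accession is empty."""
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    bad = []
    for number, row in enumerate(reader, start=1):
        sample = (row.get("sample") or "").strip()
        run_accession = (row.get("run_accession") or "").strip()
        # Treat missing columns and whitespace-only values as empty.
        if not sample or not run_accession:
            bad.append(number)
    return bad


# Placeholder samplesheet: the second data row has a blank run_accession.
example = """sample,run_accession,instrument_platform,fastq_1,fastq_2,fasta
DNA_H1H_10_A1,DNA_H1H_10,ILLUMINA,a_1.fq.gz,a_2.fq.gz,
DNA_H1H_10_B1,,ILLUMINA,b_1.fq.gz,b_2.fq.gz,
"""
print(rows_missing_ids(example))  # → [2]
```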

@sofstam
Collaborator

sofstam commented Sep 13, 2024

Will close this for now but feel free to reopen it if you have any questions :)

sofstam closed this as completed Sep 13, 2024