Dynamic output using file input #2544
-
I saw in the documentation that we can use inputs to name outputs, such as:

    process align {
        input:
        val x from species
        file seq from sequences

        output:
        file "${x}.aln" into genomes

        ...
    }

However, when I tried this with a file input, it resulted in an error. In this case, I was trying to make the alignment file (BAM) have the same name as the input file (FASTQ), just replacing the file extension:

    fastq = Channel.fromPath(fastqFiles).buffer(size:2)

    process bwa_align {
        input:
        file test_fastq from fastq

        output:
        file "${test_fastq[0].simpleName}.bam" into aligned_bam

        script:
        ...

Despite seeming very similar to the example in the documentation, this resulted in an error.

Does the dynamic output feature work for file inputs? If not, it might be good to update the documentation to explain that more clearly. If it is supposed to work, I'm curious what is wrong with the code above.

NB: I understand that this isn't considered the best way to write Nextflow pipelines. However, our system requires a specific naming scheme for our output files, so I need to ensure certain files are named correctly. I can see how this can perhaps be achieved with workarounds (e.g. passing metadata through with tuples), but it would be great to know if a simpler approach can work. Thanks!
Replies: 7 comments
-
Hi, dynamic output file names do work with input files. Here's an example similar to yours (I just added a process that generates dummy fastqs):
You can try it yourself and see that the output bams are generated and printed as expected.
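The example from this reply is not shown in the thread as captured here. As a rough sketch of what such a test could look like (not the original code), the DSL1 script below generates a few dummy FASTQ files, pairs them up with `buffer(size: 2)` as in the question, and names each BAM after the first file of the pair. The process name `make_fastq`, the channel name `fastq_files`, and the `sample_*.fastq` file names are assumptions made for illustration.

```groovy
// DSL1 syntax, as used in this thread. On releases where DSL2 is the default,
// DSL1 has to be enabled explicitly (nextflow.enable.dsl = 1); recent
// releases no longer support DSL1 at all.

// Hypothetical helper process: writes four dummy FASTQ files.
process make_fastq {
    output:
    file "sample_*.fastq" into fastq_files

    script:
    """
    for i in 1 2 3 4; do echo "@read\$i" > sample_\$i.fastq; done
    """
}

process bwa_align {
    input:
    // flatten() emits the files one by one, buffer(size: 2) regroups them in pairs
    file test_fastq from fastq_files.flatten().buffer(size: 2)

    output:
    // The GString is evaluated per task, once test_fastq is bound
    file "${test_fastq[0].simpleName}.bam" into aligned_bam

    script:
    """
    touch ${test_fastq[0].simpleName}.bam
    """
}

// Print the path of every BAM that was produced
aligned_bam.view()
```

The `touch` command only stands in for a real alignment step; the point is that the output declaration can reference the file input directly.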
-
thanks! I will try this out.
-
Hi Simon! To extend Manuele's comment, I think what you are observing is that when the ...
-
Happy to help :)
-
Hey @pditommaso @manuelesimi - I have to apologise: after investigating more, I found that my example above wasn't quite accurate as to the failure scenario. What was actually confusing me is that the behaviour differs depending on whether the output path name is constructed within the path declaration or outside it. In other words, these two, which I thought would be the same, behave differently.

Works:

    output:
    file "${test_fastq[0].simpleName}.bam" into aligned_bam

Fails:

    output:
    def outputPath = "${test_fastq[0].simpleName}.bam"
    file outputPath into aligned_bam

I can only assume there is some interesting Groovy AST rearrangement going on that allows the first closure to resolve. These variants also fail:

    output:
    file "${test_fastq[0].simpleName}.bam".toString() into aligned_bam // fails

and

    GString outputPath
    output:
    file (outputPath = "${map_file_name(test_fastq)}") into aligned_bam // fails

The reason I'm exploring this is that I really want to avoid repeating the logic for how the input path maps to the output path in multiple places (I want to keep it "DRY", so to speak). It works OK if I'm happy to repeat the expression for the output file name in multiple places, but the best I have found to make it "DRY" is to put a closure inside the GString:

    Closure map_file_name = { inp ->
        return inp[0].simpleName + '.bam'
    }

    input:
    file test_fastq from fastq

    output:
    file "${map_file_name(test_fastq)}" into aligned_bam

    script:
    """
    ... some command ... > ${map_file_name(test_fastq)}
    """

It would be nice if there was a more straightforward way to do it, but this works well enough I think! Thanks for humoring me as I try to learn Nextflow!!!
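One other way to keep the name in a single place, sketched here as a suggestion rather than something proposed in the thread, is to compute it once in the `script:` block and reference it from the output declaration. This relies on two behaviours I believe hold but that are worth checking against your Nextflow version: a GString written directly in an output declaration is evaluated lazily in the task context, and a script-block variable assigned without `def` is stored in that same context. The variable name `bam_name` is hypothetical.

```groovy
process bwa_align {
    input:
    file test_fastq from fastq

    output:
    // Evaluated after the script block has run, so bam_name is already set
    file "${bam_name}" into aligned_bam

    script:
    // No `def` here: a local (def) variable would not be visible to the
    // output declaration above
    bam_name = "${test_fastq[0].simpleName}.bam"
    """
    touch ${bam_name}
    """
}
```

As before, `touch` is only a placeholder for the real command. Compared with the closure approach, this avoids calling the mapping twice, at the cost of relying on the task-context scoping rule.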
-
I think your closure is a fair enough solution. However, unless you have some external requirements not shown here, you can simply assign the same name to the output file as follows:
This way you don't have to do the mapping at all. Nextflow would take care of passing it to the next process without any conflict. This also works if you publish all these bams in the same folder, because you can always use the saveAs option of publishDir to rename them. It's really up to you. But I wish all the issues I have with my pipelines would be like this :)
-
Thanks @manuelesimi, yep, I understand that's the more idiomatic solution with Nextflow, and indeed, my situation is that I have some external requirements about the naming of the files that I'm trying to satisfy. I'll close this as I have a good understanding of it now - thanks again for the help!
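For reference, the fixed-name approach with `saveAs` suggested in the reply above might look roughly like the sketch below. It is not the code from that reply: the `results` directory, the `aligned.bam` file name, and the renaming rule are all assumptions chosen for illustration, and it assumes the `saveAs` closure can see the task inputs, which is how it is commonly used.

```groovy
process bwa_align {
    // saveAs receives the original file name and returns the name to publish under
    publishDir 'results', mode: 'copy',
        saveAs: { filename -> "${test_fastq[0].simpleName}.bam" }

    input:
    file test_fastq from fastq

    output:
    // Every task writes the same file name; each task has its own work
    // directory, so the files do not collide
    file 'aligned.bam' into aligned_bam

    script:
    """
    touch aligned.bam
    """
}
```

With this layout, the naming scheme lives only in the `publishDir` directive, while downstream processes simply receive `aligned.bam` through the channel.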