From 4dfba1f66e7e6b8d483d7bb85d10d6a1b79d8239 Mon Sep 17 00:00:00 2001
From: Chris Fields
Date: Mon, 4 Dec 2023 13:00:25 -0600
Subject: [PATCH] update and push

---
 _episodes/05-processes-part1.md | 78 +++++++++++++++++----------------
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/_episodes/05-processes-part1.md b/_episodes/05-processes-part1.md
index 9e10c830..75bc8ee9 100644
--- a/_episodes/05-processes-part1.md
+++ b/_episodes/05-processes-part1.md
@@ -78,7 +78,7 @@ workflow {
 We can now run the process:

 ~~~
-$ nextflow run process_index.nf
+$ nextflow run process/process_index.nf
 ~~~
 {: .language-bash }

@@ -121,11 +121,11 @@ What happened? Well, in our case we can't see the `salmon` executable; the error
 How do we address this? There are a number of options, but the general 'best practice' way would be to use a __directive__ which loads the software into the environment when the script is run. For example, this could be done by using the `module` directive and the appropriate [environment module](https://modules.sourceforge.net). Below is a modified version of the script in `./process/process_index.nf` which includes such a directive:

 ~~~
-
+// modified process/process_index.nf
 nextflow.enable.dsl=2

 process INDEX {
-  module "Salmon/1.10.0-IGB-gcc-8.2.0"
+  module "Salmon/1.10.0-IGB-gcc-8.2.0" // this is important!

   script:
   "salmon index -t ${projectDir}/data/yeast/transcriptome/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa.gz -i data/yeast/salmon_index --kmerLen 31"
@@ -138,6 +138,8 @@ workflow {
 ~~~
 {: .language-groovy }

+Other scripts in this lesson may need similar modifications.
+
 > ## A Simple Process
 >
 > Create a Nextflow script `simple_process.nf` that has one process `SALMON_VERSION` that runs the command.
@@ -240,7 +242,7 @@ Or, for commands that span multiple lines you can encase the command in triple
 For example:

 ~~~
-//process_multi_line.nf
+//process/process_multi_line.nf
 nextflow.enable.dsl=2

 process PROCESSBAM {
@@ -261,7 +263,7 @@ workflow {
 By default the process command is interpreted as a **Bash** script. However any other scripting language can be used just simply starting the script with the corresponding [Shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) declaration. For example:

 ~~~
-//process_python.nf
+//process/process_python.nf
 nextflow.enable.dsl=2

 process PYSTUFF {
@@ -293,7 +295,7 @@ workflow {
 {: .language-groovy }

 ~~~
-//process_rscript.nf
+//process/process_rscript.nf
 nextflow.enable.dsl=2

 process RSTUFF {
@@ -351,7 +353,7 @@ The variable is referenced using the `$kmer` syntax within the multi-line string
 A Nextflow variable can be used multiple times in the script block.

 ~~~
-//process_script.nf
+//process/process_script.nf
 nextflow.enable.dsl=2

 kmer = 31
@@ -378,7 +380,7 @@ In most cases we do not want to hard code parameter values. We saw in the parame
 In the example below we define the variable `params.kmer` with a default value of 31 in the Nextflow script.

 ~~~
-//process_script_params.nf
+//process/process_script_params.nf
 nextflow.enable.dsl=2

 params.kmer = 31
@@ -404,7 +406,7 @@ workflow {
 Remember, we can change the default value of `kmer` to 11 by running the Nextflow script using the command below. **Note:** parameters to the workflow have two hyphens `--`.

 ~~~
-nextflow run process_script_params.nf --kmer 11
+nextflow run process/process_script_params.nf --kmer 11
 ~~~
 {: .language-bash }

@@ -413,7 +415,7 @@ nextflow run process_script_params.nf --kmer 11
 >
 > For the Nextflow script below.
 > ~~~
-> //process_script_params.nf
+> //process/process_script_params.nf
 > nextflow.enable.dsl=2
 > params.kmer = 31
 >
@@ -435,14 +437,14 @@ nextflow run process_script_params.nf --kmer 11
 > Run the pipeline using a kmer value of `27` using the `--kmer` command line option.
 >
 > ~~~
-> $ nextflow run process_script_params.nf --kmer -process.echo
+> $ nextflow run process/process_script_params.nf --kmer -process.echo
 > ~~~
 > {: .language-bash}
 > **Note:** The Nextflow option `-process.echo` will print the process' stdout to the terminal.
 >
 > > ## Solution
 > > ~~~
-> > nextflow run process_script_params.nf --kmer 27 -process.echo
+> > nextflow run process/process_script_params.nf --kmer 27 -process.echo
 > > ~~~
 > > {: .language-bash }
 > ~~~
@@ -469,7 +471,7 @@ In the example below we will set the bash variable `KMERSIZE` to the value of `$


 ~~~
-//process_escape_bash.nf
+//process/process_escape_bash.nf
 nextflow.enable.dsl=2

 process INDEX {
@@ -501,7 +503,7 @@ For example in the script below that uses the `shell` statement we reference the
 Nextflow variables as `!{projectDir}` and `!{params.kmer}`, and the Bash variable as `${KMERSIZE}`.

 ```
-//process_shell.nf
+//process/process_shell.nf
 nextflow.enable.dsl=2

 params.kmer = 31
@@ -546,10 +548,10 @@ else {
 {: .language-groovy }


-For example, the Nextflow script below will use the `if` statement to change which index is created depending on the Nextflow variable `params.aligner`.
+For example, the Nextflow script below will use the `if` statement to change which index is created depending on the Nextflow variable `params.aligner`. Note that if you are needing to load modules for these, you will likely need to load all of them in, or do so conditionally.

 ~~~
-//process_conditional.nf
+//process/process_conditional.nf
 nextflow.enable.dsl=2

 params.aligner = 'kallisto'
@@ -584,7 +586,7 @@ workflow {
 {: .language-groovy }

 ~~~
-nextflow run process_conditional.nf -process.echo --aligner kallisto
+nextflow run process/process_conditional.nf -process.echo --aligner kallisto
 ~~~
 {: .language-bash }

@@ -634,7 +636,7 @@ The input qualifier declares the type of data to be received.
 The `val` qualifier allows you to receive value data as input. It can be accessed in the process script by using the specified input name, as shown in the following example:

 ~~~
-//process_input_value.nf
+//process/process_input_value.nf
 nextflow.enable.dsl=2

 process PRINTCHR {
@@ -658,7 +660,7 @@ workflow {
 {: .language-groovy }

 ~~~
-$ nextflow run process_input_value.nf -process.echo
+$ nextflow run process/process_input_value.nf -process.echo
 ~~~
 {: .language-bash }

@@ -689,7 +691,7 @@ The input file name can be defined dynamically by defining the input name as a N
 For example in the script below we assign the variable name `read` to the input files using the `path` qualifier. The file is referenced using the variable substitution syntax `${read}` in the script block:

 ~~~
-//process_input_file.nf
+//process/process_input_file.nf
 nextflow.enable.dsl=2

 process NUMLINES {
@@ -713,7 +715,7 @@ workflow {
 {: .language-groovy }

 ~~~
-$ nextflow run process_input_file.nf -process.echo
+$ nextflow run process/process_input_file.nf -process.echo
 ~~~
 {: .language-bash }

@@ -737,7 +739,7 @@ The input name can also be defined as user specified filename inside quotes.
 For example in the script below the name of the file is specified as `'sample.fq.gz'` in the input definition and can be referenced by that name in the script block.

 ~~~
-//process_input_file_02.nf
+//process/process_input_file_02.nf
 nextflow.enable.dsl=2

 process NUMLINES {
@@ -761,7 +763,7 @@ workflow {
 {: .language-groovy }

 ~~~
-$ nextflow run process_input_file_02.nf -process.echo
+$ nextflow run process/process_input_file_02.nf -process.echo
 ~~~
 {: .language-bash }

@@ -791,7 +793,7 @@ sample.fq.gz 58708
 > Add an input channel to the script below that takes the reads channel as input.
 > [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) is a quality control tool for high throughput sequence data.
 > ~~~
-> //process_exercise_input.nf
+> //process/process_exercise_input.nf
 > nextflow.enable.dsl=2
 >
 > process FASTQC {
@@ -813,12 +815,12 @@ sample.fq.gz 58708
 > {: .language-groovy }
 > Then run your script using
 > ~~~
-> nextflow run process_exercise_input.nf -process.echo
+> nextflow run process/process_exercise_input.nf -process.echo
 > ~~~
 > {: .language-bash }
 > > ## Solution
 > > ~~~
-> > //process_exercise_input_answer.nf
+> > //process/process_exercise_input_answer.nf
 > > nextflow.enable.dsl=2
 > > process FASTQC {
 > > input:
@@ -863,7 +865,7 @@ However it’s important to understand how the number of items within the multip
 Consider the following example:

 ~~~
-//process_combine.nf
+//process/process_combine.nf
 nextflow.enable.dsl=2

 process COMBINE {
@@ -887,7 +889,7 @@ workflow {
 {: .language-groovy }

 ~~~
-$ nextflow run process_combine.nf -process.echo
+$ nextflow run process/process_combine.nf -process.echo
 ~~~
 {: .language-bash }

@@ -911,7 +913,7 @@ What happens when not all channels have the same number of elements?
 For example:

 ~~~
-//process_combine_02.nf
+//process/process_combine_02.nf
 nextflow.enable.dsl=2

 process COMBINE {
@@ -935,7 +937,7 @@ workflow {
 {: .language-groovy }

 ~~~
-$ nextflow run process_combine_02.nf -process.echo
+$ nextflow run process/process_combine_02.nf -process.echo
 ~~~
 {: .language-bash }

@@ -955,7 +957,7 @@ In the above example the process is executed only two times, because when a queu
 To better understand this behaviour compare the previous example with the following one:

 ~~~
-//process_combine_03.nf
+//process/process_combine_03.nf
 nextflow.enable.dsl=2

 process COMBINE {
@@ -978,7 +980,7 @@ workflow {
 {: .language-groovy }

 ~~~
-$ nextflow run process_combine_03.nf -process.echo
+$ nextflow run process/process_combine_03.nf -process.echo
 ~~~
 {: .language-bash }

@@ -1010,7 +1012,7 @@ And include the command below in the script directive
 > {: .language-groovy }
 > > ## Solution
 > > ~~~
-> > // process_exercise_combine_answer.nf
+> > // process/process_exercise_combine_answer.nf
 > > nextflow.enable.dsl=2
 > > process COMBINE {
 > > input:
@@ -1041,7 +1043,7 @@ We saw previously that by default the number of times a process runs is defined
 For example if we can fix the previous example by using the input qualifer `each` for the letters queue channel:

 ~~~
-//process_repeat.nf
+//process/process_repeat.nf
 nextflow.enable.dsl=2

 process COMBINE {
@@ -1065,7 +1067,7 @@ workflow {
 {: .language-groovy }

 ~~~
-$ nextflow run process_repeat.nf -process.echo
+$ nextflow run process/process_repeat.nf -process.echo
 ~~~
 {: .language-bash }

@@ -1086,7 +1088,7 @@ The process will run eight times.
 > ## Input repeaters
 > Extend the script `process_exercise_repeat.nf` by adding more values to the `kmer` queue channel e.g. (21, 27, 31) and running the process for each value.
 > ~~~
-> //process_exercise_repeat.nf
+> //process/process_exercise_repeat.nf
 > nextflow.enable.dsl=2
 > process COMBINE {
 > input:
@@ -1112,7 +1114,7 @@ The process will run eight times.
 >
 > > ## Solution
 > > ~~~
-> > //process_exercise_repeat_answer.nf
+> > //process/process_exercise_repeat_answer.nf
 > > nextflow.enable.dsl=2
 > >
 > > process COMBINE {
@@ -1135,7 +1137,7 @@ The process will run eight times.
 > > ~~~
 > > {: .language-groovy }
 > > ~~~
-> > nextflow run process_exercise_repeat.nf -process.echo
+> > nextflow run process/process_exercise_repeat.nf -process.echo
 > > ~~~
 > > {: .language-bash }
 > > This process runs three times.