This Nextflow workflow provides a simple way to analyse Oxford Nanopore reads generated from amplicons.
It requires the raw reads and a reference FASTA file containing one sequence per amplicon. After filtering (based on read length and quality) and trimming, reads are aligned to the reference using minimap2. Variants are then called with Medaka. Results include an interactive HTML report and VCF files containing the called variants.
For now, the reference FASTA file must contain one sequence per amplicon. An option to provide a whole-genome reference file and pairs of primers may be added in the future if users request it.
The workflow relies on the following dependencies:
- Nextflow for managing compute and software resources.
- Either Docker or Singularity to provide isolation of the required software.
It is not necessary to clone or download the git repository in order to run the workflow. For more information on running EPI2ME Labs workflows, visit our website.
Workflow options
If you have Nextflow installed, you can run the following to obtain the workflow:
nextflow run epi2me-labs/wf-amplicon --help
This will show the workflow's command line options.
Basic usage is as follows:
nextflow run epi2me-labs/wf-amplicon \
--fastq $input \
--reference references.fa \
--threads 4
$input can be a single FASTQ file, a directory containing FASTQ files, or a directory containing barcoded sub-directories which in turn contain FASTQ files. A sample sheet can be included with --sample_sheet and a sample name for an individual sample with --sample.
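The three accepted input layouts can be told apart with a small shell helper. This is only an illustrative sketch, not the detection logic used by the workflow itself: the function names classify_input and has_fastq are hypothetical, and the list of file extensions is an assumption.

```shell
#!/usr/bin/env bash
# Illustrative helpers (not part of wf-amplicon).

# has_fastq DIR DEPTH: succeed if DIR holds FASTQ files exactly DEPTH levels down.
# The extensions listed here are an assumption.
has_fastq() {
    find "$1" -mindepth "$2" -maxdepth "$2" -type f \
        \( -name '*.fastq' -o -name '*.fastq.gz' -o -name '*.fq' -o -name '*.fq.gz' \) \
        | grep -q .
}

# classify_input PATH: print which of the three input layouts PATH matches.
classify_input() {
    local path="$1"
    if [ -f "$path" ]; then
        echo "single FASTQ file"
    elif [ -d "$path" ] && has_fastq "$path" 1; then
        echo "directory of FASTQ files"
    elif [ -d "$path" ] && has_fastq "$path" 2; then
        echo "directory of barcoded sub-directories"
    else
        echo "no FASTQ input found"
    fi
}
```

Calling classify_input "$input" before launching the workflow is a quick way to confirm that the path looks the way you expect.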
Relevant options for filtering of raw reads are:
- --min_read_length
- --max_read_length
- --min_read_qual
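As a sketch of how the filtering options combine with the basic invocation, the snippet below assembles the command in a Bash array and prints it for inspection. The threshold values and the reads/ directory are placeholders chosen for illustration, not recommended settings or workflow defaults.

```shell
#!/usr/bin/env bash
# Placeholder thresholds for illustration only; tune them to your amplicons.
filter_args=(
    --min_read_length 300
    --max_read_length 1500
    --min_read_qual 10
)

# reads/ is a hypothetical input directory.
cmd=(
    nextflow run epi2me-labs/wf-amplicon
    --fastq reads/
    --reference references.fa
    "${filter_args[@]}"
)

# Print the assembled command; to launch it, run "${cmd[@]}" instead.
echo "${cmd[*]}"
```

Keeping the filter settings in their own array makes it easy to reuse the same thresholds across several runs.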
After filtering and trimming with Porechop, reads can optionally be downsampled. You can control the number of reads to keep per sample with --reads_downsampling_size.
Haploid variants are then called with Medaka. You can set the minimum coverage a variant needs to exceed in order to be included in the results with --min_coverage. The workflow selects the appropriate Medaka model based on the basecaller configuration that was used to process the signal data. You can use the parameter --basecaller_cfg to provide this information (e.g. dna_r10.4.1_e8.2_400bps_hac) to the workflow. Alternatively, you can choose the Medaka model directly with --medaka_model.
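The two ways of selecting a Medaka model could be wrapped in a small helper like the one below. This is a sketch under stated assumptions: model_args is a hypothetical function name, and the either-or logic is a convenience of this wrapper, not a statement about how the workflow behaves if both parameters are supplied.

```shell
#!/usr/bin/env bash
# model_args BASECALLER_CFG [MEDAKA_MODEL]
# Hypothetical helper: print --medaka_model when an explicit model is given,
# otherwise print --basecaller_cfg so the workflow can pick the model itself.
model_args() {
    local basecaller_cfg="$1" medaka_model="${2:-}"
    if [ -n "$medaka_model" ]; then
        echo "--medaka_model $medaka_model"
    else
        echo "--basecaller_cfg $basecaller_cfg"
    fi
}

# Example (not executed here; the coverage value is a placeholder):
#   nextflow run epi2me-labs/wf-amplicon --fastq reads/ --reference references.fa \
#       --min_coverage 20 $(model_args dna_r10.4.1_e8.2_400bps_hac)
```

Passing a second argument to model_args switches the output to the explicit-model form; the model name itself must be one that your Medaka installation provides.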
If you want to use Singularity instead of Docker, add -profile singularity.
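If the same launch script is used on machines with different container runtimes, the profile choice can be automated. The helper below is a minimal sketch: profile_args is a hypothetical name, and it assumes that Docker is what the workflow uses when no profile is given, as the wording above suggests.

```shell
#!/usr/bin/env bash
# profile_args: hypothetical helper that prints the extra Nextflow arguments
# needed for the container runtime found on PATH.
profile_args() {
    if command -v docker >/dev/null 2>&1; then
        # Assumption: Docker is used when no profile is specified.
        echo ""
    elif command -v singularity >/dev/null 2>&1; then
        echo "-profile singularity"
    else
        echo "neither docker nor singularity found on PATH" >&2
        return 1
    fi
}

# Example (not executed here):
#   nextflow run epi2me-labs/wf-amplicon --fastq reads/ --reference references.fa $(profile_args)
```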
The primary outputs of the workflow are:
- Interactive HTML report detailing the results.
- VCF files with variants called by Medaka.