wf-amplicon

Introduction

This Nextflow workflow provides a simple way to analyse Oxford Nanopore reads generated from amplicons.

It requires the raw reads and a reference FASTA file containing one sequence per amplicon. After filtering (based on read length and quality) and trimming, reads are aligned to the reference using minimap2. Variants are then called with Medaka. Results include an interactive HTML report and VCF files containing the called variants.

The reference FASTA file must currently contain exactly one sequence per amplicon. Support for a whole-genome reference combined with pairs of primers may be added in the future if users request it.

Quickstart

The workflow relies on the following dependencies:

  • Nextflow for managing compute and software resources.
  • Either Docker or Singularity to provide isolation of the required software.

It is not necessary to clone or download the git repository in order to run the workflow. For more information on running EPI2ME Labs workflows, visit our website.

Workflow options

If you have Nextflow installed, you can run the following to obtain the workflow:

nextflow run epi2me-labs/wf-amplicon --help

This will show the workflow's command-line options.

Usage

Basic usage is as follows:

nextflow run epi2me-labs/wf-amplicon \
    --fastq $input \
    --reference references.fa \
    --threads 4

$input can be a single FASTQ file, a directory containing FASTQ files, or a directory containing barcoded sub-directories which in turn contain FASTQ files. A sample sheet can be supplied with --sample_sheet; when the input is an individual sample, its name can be set with --sample.
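The three accepted input layouts can be sketched with a throwaway directory tree. All paths and sample names below are hypothetical, and the sample sheet columns (barcode, alias) are an assumption about the expected CSV format that should be checked against the --help output. The script only prints the commands instead of running them.

```shell
#!/bin/sh
# Illustrative only: every path and sample name here is hypothetical.
set -eu

demo=$(mktemp -d)

# Layout 1: a single FASTQ file.
touch "$demo/reads.fastq"

# Layout 2: a directory of FASTQ files belonging to one sample.
mkdir -p "$demo/single_sample"
touch "$demo/single_sample/part1.fastq" "$demo/single_sample/part2.fastq"

# Layout 3: a directory of barcoded sub-directories, one per sample.
mkdir -p "$demo/run/barcode01" "$demo/run/barcode02"
touch "$demo/run/barcode01/reads.fastq" "$demo/run/barcode02/reads.fastq"

# Sample sheet mapping barcodes to sample names.
# Assumption: a CSV with "barcode" and "alias" columns.
cat > "$demo/sample_sheet.csv" <<'EOF'
barcode,alias
barcode01,sample_a
barcode02,sample_b
EOF

# Print (rather than execute) one command per layout.
echo "nextflow run epi2me-labs/wf-amplicon --fastq $demo/reads.fastq --sample sample_a --reference references.fa"
echo "nextflow run epi2me-labs/wf-amplicon --fastq $demo/single_sample --sample sample_a --reference references.fa"
echo "nextflow run epi2me-labs/wf-amplicon --fastq $demo/run --sample_sheet $demo/sample_sheet.csv --reference references.fa"
```

Remove the echo in front of a command to actually launch the workflow on real data.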

The relevant options for filtering raw reads are:

  • --min_read_length
  • --max_read_length
  • --min_read_qual

After filtering and trimming with Porechop, reads can optionally be downsampled. You can control the number of reads to keep per sample with --reads_downsampling_size.
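As a minimal sketch, the filtering and downsampling options can be combined in one invocation. The threshold values and the reads/ input directory are hypothetical and chosen only for illustration; suitable values depend on the amplicon lengths and the sequencing run. The script assembles the command and prints it instead of running it.

```shell
#!/bin/sh
set -eu

# Hypothetical thresholds; adjust to the expected amplicon sizes.
min_len=300
max_len=1200
min_qual=10
keep_reads=1000   # reads kept per sample after filtering and trimming

# Guard against swapped length bounds before launching a long run.
if [ "$min_len" -ge "$max_len" ]; then
    echo "error: min_len must be smaller than max_len" >&2
    exit 1
fi

# Backslash-newline inside double quotes is removed, so cmd is one line.
cmd="nextflow run epi2me-labs/wf-amplicon \
--fastq reads/ \
--reference references.fa \
--min_read_length $min_len \
--max_read_length $max_len \
--min_read_qual $min_qual \
--reads_downsampling_size $keep_reads"

# Print the assembled command; run it with: sh -c "$cmd"
echo "$cmd"
```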

Haploid variants are then called with Medaka. You can set the minimum coverage a variant needs to exceed in order to be included in the results with --min_coverage. The workflow selects the appropriate Medaka model based on the basecaller configuration that was used to process the signal data. You can use the parameter --basecaller_cfg to provide this information (e.g. dna_r10.4.1_e8.2_400bps_hac) to the workflow. Alternatively, you can choose the Medaka model directly with --medaka_model.
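The two ways of selecting a Medaka model could be wrapped as follows. The model_opts helper is hypothetical and not part of the workflow; it passes --medaka_model when a model name is given and falls back to --basecaller_cfg otherwise, so only one of the two options is ever supplied. The --min_coverage value of 20 is an arbitrary example.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: emit exactly one model-related option.
#   $1  basecaller configuration (used when no model is given)
#   $2  optional Medaka model name (takes priority when present)
model_opts() {
    basecaller_cfg=$1
    medaka_model=${2:-}
    if [ -n "$medaka_model" ]; then
        echo "--medaka_model $medaka_model"
    else
        echo "--basecaller_cfg $basecaller_cfg"
    fi
}

min_coverage=20   # arbitrary example value

# Let the workflow pick the model from the basecaller configuration.
opts=$(model_opts dna_r10.4.1_e8.2_400bps_hac)

# Print the resulting command instead of running it.
echo "nextflow run epi2me-labs/wf-amplicon --fastq reads/ --reference references.fa --min_coverage $min_coverage $opts"
```

To choose the model directly, pass its name as the second argument to model_opts.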

If you want to use Singularity instead of Docker, add -profile singularity.
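One possible way to pick the profile automatically is to check which container runtime is on the PATH. This is only a sketch: it assumes the executables are named docker and singularity, and it adds no profile option when Docker is found, since the text above implies Docker is the default.

```shell
#!/bin/sh
set -eu

# Empty means "use the workflow's default (Docker)".
profile_opt=""
if command -v docker >/dev/null 2>&1; then
    profile_opt=""
elif command -v singularity >/dev/null 2>&1; then
    profile_opt="-profile singularity"
else
    echo "warning: neither docker nor singularity found on PATH" >&2
fi

# Print the command; note the single dash, as -profile is a Nextflow option.
echo "nextflow run epi2me-labs/wf-amplicon $profile_opt --fastq reads/ --reference references.fa"
```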

Key outputs

  • Interactive HTML report detailing the results.
  • VCF files with variants called by Medaka.
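After a run, the report and VCF files can be located with a small helper like the one below. The list_results function is hypothetical, the default directory name output is an assumption about where results are written, and the search relies only on file extensions rather than on any particular file names.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: list HTML reports and (optionally gzipped) VCF files.
list_results() {
    find "$1" -type f \( -name '*.html' -o -name '*.vcf' -o -name '*.vcf.gz' \) | sort
}

# Assumption: results are in ./output unless another directory is given.
out_dir=${1:-output}

if [ -d "$out_dir" ]; then
    list_results "$out_dir"
else
    echo "no results directory found at: $out_dir" >&2
fi
```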

Useful links