This repository contains a template workflow for state-of-the-art molecular phylogeny inference using tools like MAFFT, trimAl, and IQ-TREE.
The workflow is a Jupyter notebook: a hands-on guide to the practical steps needed from start to finish.
Instructions guide the user through the code needed to run these tools, and through several checks along the way, from one starting sequence to a phylogenetic tree.
The workflow also encourages users to document their choices and output, making the science a bit more transparent and reproducible.
This document does not aim to guide the user in interpreting phylogenies, nor does it go into detail on the main considerations when designing an evolutionary inference. Fortunately, this paper, written by experts in the field, does a very good job of both. Their short video abstract, which explains a use case, is also worth a look.
To run this workflow, you first need a Unix-like environment, such as a Linux computer, macOS, or the Windows Subsystem for Linux (WSL).
Second, you need the conda or miniconda framework for installing bioinformatics software.
Install all required software as detailed in the conda environment file included in this repository, like so:
    conda env create -f conda_environment.yaml
Third, you need one sequence or sequence ID you are interested in.
This workflow aims to guide you through the following steps:
- acquire sequences homologous to your sequence via either NCBI BLAST or the 1KP project (plant sequences only)
- subset your input to contain all sequences of a limited number of species
- align sequences with mafft
- trim alignment with trimAl
- visualise alignment with Jalview
- evaluate and optimise
- infer a phylogenetic tree with FastTree
- infer a phylogenetic tree with IQ-TREE
  - use model fitting
  - choose a bootstrap method
- visualise the phylogenetic tree with iTOL
- annotate the phylogenetic tree in iTOL
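As a rough orientation, the align, trim, and tree-inference steps above could look something like the sketch below. It is not taken from the notebook in this repository. It assumes the tools are available on the PATH as mafft, trimal, FastTree, and iqtree2 (IQ-TREE version 2), and the function name `phylo_commands` and all file names are hypothetical placeholders. The function only prints the commands, so they can be inspected, adjusted, and recorded before anything is run.

```shell
#!/usr/bin/env bash
# Sketch only: prints one possible command per step instead of running it.
# Assumptions (not from this repository): tool names on PATH are mafft,
# trimal, FastTree, and iqtree2; all file names are placeholders.

phylo_commands() {
  input="$1"    # unaligned homologous sequences in FASTA format
  prefix="$2"   # prefix shared by all output files (no spaces)
  cat <<EOF
mafft --auto ${input} > ${prefix}.aln.fasta
trimal -in ${prefix}.aln.fasta -out ${prefix}.trim.fasta -automated1
FastTree ${prefix}.trim.fasta > ${prefix}.fasttree.nwk
iqtree2 -s ${prefix}.trim.fasta -m MFP -B 1000 --prefix ${prefix}.iqtree
EOF
}

# Example call with placeholder names
phylo_commands my_sequences.fasta my_gene
```

Piping the printed lines into `bash` would execute them in order. In this sketch, `-m MFP` asks IQ-TREE to select a substitution model with ModelFinder before inferring the tree, and `-B 1000` requests 1000 ultrafast bootstrap replicates; IQ-TREE 1.x uses `-bb` and `-pre` in place of `-B` and `--prefix`. FastTree treats the input as protein unless `-nt` is given. Which trimming mode, model strategy, and bootstrap method suit your data is exactly the kind of choice the workflow asks you to document.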
Please find published examples here:
- LAR phylogeny (GitHub, Zenodo)
- MIKC phylogeny (GitHub, Zenodo)
- R2R3 MYB phylogeny (GitHub, Zenodo)
- 2-OGD phylogeny (GitHub)
This workflow is currently under construction but can nonetheless be cited via Zenodo here: