Script for sorting sequences by clade

Purpose

Given a tree topology (in Newick format), a set of reference sequence names defining clades of interest in the tree, and a FASTA file containing all sequences represented in the tree, this simple Python script writes a separate FASTA file for the sequences in each clade. It uses the Environment for Tree Exploration (ETE3) to find, for each reference sequence, the most inclusive clade that contains that reference sequence but no other reference sequence. The names of the sequences in each clade are then used to copy the corresponding records from the complete FASTA file into a clade-specific FASTA file in the specified output directory.
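
The clade-selection rule above can be sketched without ETE3 as follows. This is a minimal, dependency-free illustration under stated assumptions, not the script's actual code: the tree is treated as rooted exactly as written in the Newick string (a clade is only defined relative to a root), the parser accepts only compact Newick without quoted labels, comments or whitespace between tokens, and `Node`, `parse_newick` and `clades_for_references` are hypothetical names. Starting from each reference leaf, the code climbs toward the root for as long as the parent's clade contains no other reference sequence.

```python
from __future__ import annotations


class Node:
    """Minimal rooted tree node (a hypothetical stand-in for an ETE3 TreeNode)."""

    def __init__(self, parent: Node | None = None) -> None:
        self.name = ""
        self.children: list[Node] = []
        self.parent = parent

    def leaf_names(self) -> set[str]:
        """Names of all leaves at or below this node."""
        if not self.children:
            return {self.name}
        names: set[str] = set()
        for child in self.children:
            names |= child.leaf_names()
        return names


def parse_newick(text: str) -> Node:
    """Parse compact Newick (labels and optional branch lengths only; no quoted
    labels, comments, or whitespace between tokens)."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node(parent: Node | None) -> Node:
        nonlocal pos
        node = Node(parent)
        if text[pos] == "(":  # internal node: parse comma-separated children
            pos += 1
            while True:
                node.children.append(parse_node(node))
                if text[pos] == ",":
                    pos += 1
                else:  # ")" closes this node's child list
                    pos += 1
                    break
        start = pos  # label (and optional ":length") runs to the next "," or ")"
        while pos < len(text) and text[pos] not in ",)":
            pos += 1
        node.name = text[start:pos].split(":")[0]
        return node

    return parse_node(None)


def clades_for_references(tree: Node, references: list[str]) -> dict[str, set[str]]:
    """Map each reference name to the leaf names of the most inclusive clade
    that contains it and no other reference."""
    leaves: dict[str, Node] = {}
    stack = [tree]
    while stack:  # index every leaf by its name
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            leaves[node.name] = node
    clades: dict[str, set[str]] = {}
    for ref in references:
        others = set(references) - {ref}
        node = leaves[ref]
        # Climb while the parent's clade still excludes every other reference.
        while node.parent is not None and not (node.parent.leaf_names() & others):
            node = node.parent
        clades[ref] = node.leaf_names()
    return clades


if __name__ == "__main__":
    example = "(((refA:0.1,a2:0.2):0.05,a3:0.3):0.1,((refB:0.1,b2:0.1):0.2,b3:0.4):0.1,out:0.5);"
    for ref, members in clades_for_references(parse_newick(example), ["refA", "refB"]).items():
        print(ref, sorted(members))
    # refA ['a2', 'a3', 'refA']
    # refB ['b2', 'b3', 'refB']
```

With ETE3 itself, the same walk would presumably load the topology with `Tree(...)`, locate each reference leaf with `get_leaves_by_name()`, step to a parent through the `up` attribute, and list a clade's members with `get_leaf_names()`. Recomputing `leaf_names()` at every step is quadratic in the worst case, which is acceptable for a sketch but could be cached for very large trees. Sequences outside every selected clade (`out` in the example) are not assigned to any clade.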

Setup

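Create a conda environment from the environment definition file (sort_seqs_by_clade_conda_env.yaml) and activate it before running the script:

conda env create -f sort_seqs_by_clade_conda_env.yaml
conda activate <environment name given in the YAML file>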

Usage

python3 sort_seqs_by_clade.py <topology file in Newick format> \
                              <file listing reference sequence names> \
                              <fasta file with all sequences> \
                              <path to output directory to be generated>
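
The final copying step might look like the sketch below. It is a hypothetical illustration, not the script's implementation: it assumes the clades have already been computed as a mapping from each reference name to the set of sequence names in its clade, that each FASTA record is identified by its header text up to the first whitespace, and that each output file is named `<reference>.fasta`. The helper names `read_fasta` and `write_clade_fastas` are invented for this example.

```python
from __future__ import annotations

from pathlib import Path


def read_fasta(path: Path) -> dict[str, list[str]]:
    """Map each record ID (header text up to the first whitespace) to the record's lines."""
    records: dict[str, list[str]] = {}
    current: list[str] | None = None
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                current = [line]
                records[line[1:].split()[0]] = current
            elif current is not None and line.strip():
                current.append(line)
    return records


def write_clade_fastas(clades: dict[str, set[str]], fasta_path: Path, out_dir: Path) -> list[Path]:
    """Write one FASTA file per clade into a newly created out_dir."""
    records = read_fasta(fasta_path)
    out_dir.mkdir(parents=True)  # raises FileExistsError if out_dir already exists
    written: list[Path] = []
    for reference, members in clades.items():
        out_path = out_dir / f"{reference}.fasta"
        with open(out_path, "w") as handle:
            # Copy member records in input-file order; names absent from the FASTA are skipped.
            for seq_id, lines in records.items():
                if seq_id in members:
                    handle.write("\n".join(lines) + "\n")
        written.append(out_path)
    return written
```

Creating the output directory with `mkdir(parents=True)` and without `exist_ok=True` makes the sketch stop with `FileExistsError` rather than overwrite earlier results; whether the actual script behaves the same way is not stated here. Records are copied verbatim, including header descriptions and line wrapping, in their original order.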

laelbarlow/sort_seqs_by_clade