Identifying changes in resistome relative to closest relatives and contextualising results
To run or test the tool, create a conda environment (or virtualenv) and run:
pip install -r requirements.txt --editable .
Then you should be able to run:
python etd.py
Make sure you add any new dependencies to the requirements.txt file.
If you write tests, they can be run as follows:
pytest
Alternatively, to run the full set of tests, including different Python versions, pyflakes code style checks, and building the documentation, run:
tox
The main output directory is either specified by the user or defaults to the input genome name followed by the UNIX timestamp, for example:
test_genome_1576349112
├── mash
│   └── mash_distances.tsv
├── rgi
│   ├── test_genome.json
│   └── test_genome.txt
├── related_isolate_geospatial_analysis_of_relatives
└── unique_to_isolate
    ├── amr_gene1
    │   ├── phylogenetic
    │   └── genomic_context
    └── amr_gene2
        ├── phylogenetic
        └── genomic_context
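The default naming described above can be sketched as follows. This is a minimal illustration and not the tool's actual code: the function name `default_output_dir` and the decision to strip the directory and file extension from the genome path are assumptions.

```python
import time
from pathlib import Path


def default_output_dir(genome_path, output_dir=None, timestamp=None):
    """Return the output directory: user-specified, or '<genome name>_<UNIX timestamp>'.

    Hypothetical helper; etd.py may derive the name differently.
    """
    if output_dir is not None:
        # A user-specified directory always takes precedence.
        return Path(output_dir)
    if timestamp is None:
        timestamp = int(time.time())  # whole seconds since the UNIX epoch
    # Assumption: the "genome name" is the file name without directory or extension.
    genome_name = Path(genome_path).stem
    return Path(f"{genome_name}_{timestamp}")
```

Passing `timestamp` explicitly is only there to make the helper reproducible in tests; in normal use it would be left unset so the current time is used.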
- MASH
- PPLACER
- HMMER
- e-utils
The database can be built from CARD prevalence sequences, i.e. a directory containing one subdirectory for each taxon you want to include:
card_prevalence
├── genomes  # directory containing all CARD-prevalence genomes
│   ├── Klebsiella_pneumoniae_NCBI_May2020
│   │   ├── NZ_NGWN01_wgs.fa
│   │   └── NZ_NIDM01000036.1_plasmid.fa
│   └── Klebsiella_oxytoca_NCBI_May2020
│       ├── NZ_CP029128.1_chromosome.fa
│       └── NZ_CP033845.1_plasmid.fa
├── rgi_results  # directory containing all RGI outputs with the same name as the genomes
│   ├── Klebsiella_pneumoniae_NCBI_May2020
│   │   ├── NZ_NGWN01_wgs.txt
│   │   └── NZ_NIDM01000036.1_plasmid.txt
│   └── Klebsiella_oxytoca_NCBI_May2020
│       ├── NZ_CP029128.1_chromosome.txt
│       └── NZ_CP033845.1_plasmid.txt
├── card  # directory containing the version of CARD canonical and prevalence used
│   ├── card-data
│   └── card-prevalence
└── etd_db  # etd specific generated files
    ├── etd_db_index.json  # index for database
    ├── card_prev.msh  # mash sketch of all the genomes in CARD prevalence (i.e. genomes folder)
    ├── genome_trees  # directory containing all generated genome phylogenies
    │   ├── genome_trees_index.json  # index linking accessions to their specific tree
    │   ├── Klebsiella_pneumoniae_NCBI_May2020.mashtree
    │   └── Klebsiella_oxytoca_NCBI_May2020.mashtree
    └── amr_phylogenies  # directory containing all clustered CARD+CARD-Prevalence phylogenies
        ├── amr_phylogenies_index.json  # index linking AROs to clusters
        ├── cluster1.tree
        └── cluster2.tree
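Because the RGI outputs are expected to share names with the genomes, it may help to check the pairing before building the database. The sketch below assumes the layout shown above (`genomes/<taxon>/<name>.fa` paired with `rgi_results/<taxon>/<name>.txt`); `find_unpaired_genomes` is a hypothetical helper and not part of etd.

```python
from pathlib import Path


def find_unpaired_genomes(db_root):
    """List genome FASTA files that have no same-named RGI .txt output.

    Assumes genomes/<taxon>/<name>.fa pairs with rgi_results/<taxon>/<name>.txt.
    Hypothetical helper for illustration only.
    """
    db_root = Path(db_root)
    missing = []
    for fasta in sorted((db_root / "genomes").glob("*/*.fa")):
        taxon = fasta.parent.name
        # Only the final extension is swapped, so accession versions
        # such as "NZ_NIDM01000036.1_plasmid" are preserved.
        expected = db_root / "rgi_results" / taxon / (fasta.stem + ".txt")
        if not expected.is_file():
            missing.append(f"{taxon}/{fasta.name}")
    return missing
```

An empty list means every genome has a matching RGI result; any entries returned are genomes that would still need RGI to be run.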
Extract context using faidx (12.7 kb average length of transposon): 2021.01.10.426126
Visualisation of neighbourhood with https://github.com/gamcil/clinker
Analysis of context: https://github.com/wtmatlock/flanker
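As a rough sketch of the context extraction step, the window around an AMR gene can be expressed as a `contig:start-end` region string (1-based, inclusive), which is the form accepted by faidx-style tools. Treating 12.7 kb as the flank on each side of the gene is an assumption, as are the function name and its parameters.

```python
def flanking_region(contig, gene_start, gene_end, contig_length, flank=12700):
    """Build a faidx-style region 'contig:start-end' (1-based, inclusive).

    flank defaults to 12,700 bp on each side of the gene, an assumption based
    on the 12.7 kb figure above. The window is clamped to the contig bounds.
    """
    if not (1 <= gene_start <= gene_end <= contig_length):
        raise ValueError("gene coordinates must lie within the contig")
    start = max(1, gene_start - flank)          # do not run off the contig start
    end = min(contig_length, gene_end + flank)  # do not run off the contig end
    return f"{contig}:{start}-{end}"
```

For example, `flanking_region("NZ_CP029128.1", 20000, 21000, 100000)` gives `NZ_CP029128.1:7300-33700`, which could then be passed as the region argument to `samtools faidx` along with the genome FASTA.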