Insect genome assembly stats

These scripts were used to download, organize, and analyze insect genome assembly metadata from GenBank (NCBI).

download_insect_data.sh is a simple shell script that uses the NCBI Datasets command-line tool (v. 10.9.0) to download an individual JSON file for each insect order.

convert2csv.py uses pandas to convert each JSON file to a CSV file.
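As a rough illustration of this kind of conversion (not the code in convert2csv.py), the sketch below flattens nested JSON records with pandas.json_normalize and writes them out as CSV text. The function name records_to_csv and the example field names are hypothetical; the real NCBI Datasets output contains many more fields and may be structured differently.

```python
import pandas as pd


def records_to_csv(records):
    """Flatten a list of nested dicts and return the result as CSV text."""
    # json_normalize expands nested dicts into dotted column names,
    # e.g. {"stats": {"contig_n50": 50}} becomes a "stats.contig_n50" column.
    frame = pd.json_normalize(records)
    return frame.to_csv(index=False)


# Hypothetical records for illustration only.
example = [
    {"accession": "GCA_000000001.1", "stats": {"total_length": 1000, "contig_n50": 50}},
    {"accession": "GCA_000000002.1", "stats": {"total_length": 2000, "contig_n50": 75}},
]
csv_text = records_to_csv(example)
print(csv_text)
```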

extract_genome_stats.py extracts some relevant statistics from the csv files and organizes them into a new output file.
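A minimal sketch of this kind of extraction step is shown below; it is not the logic of extract_genome_stats.py. The function extract_stats and every column name in the example are assumptions made for illustration. It keeps only the requested columns that are present in the input and ignores the rest.

```python
import io

import pandas as pd


def extract_stats(csv_text, columns):
    """Read CSV text and keep only the requested columns that exist."""
    frame = pd.read_csv(io.StringIO(csv_text))
    # Skip requested columns that are missing instead of raising a KeyError.
    present = [name for name in columns if name in frame.columns]
    return frame[present]


# Hypothetical input; the column names are placeholders.
example_csv = (
    "accession,total_length,contig_n50,submitter\n"
    "GCA_000000001.1,1000,50,lab A\n"
    "GCA_000000002.1,2000,75,lab B\n"
)
subset = extract_stats(example_csv, ["accession", "contig_n50", "gc_percent"])
stats_csv = subset.to_csv(index=False)
print(stats_csv)
```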

scrape_assembly_info.py is a web-scraping script that uses Beautiful Soup to find metadata that isn't included in the output of the datasets tool (sequencing coverage, sequencing technology, assembler used).

You can run scrape_assembly_info.py over a list of accession numbers in a text file with, e.g.:

    for i in `cat accessions.txt`; do python scrape_assembly_info.py "$i"; done

The resulting data will be written to a file called assembly_type.csv.
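The same loop can also be driven from Python, which makes it easier to skip blank lines and to stop on the first failure. This is a sketch under assumptions: the helper names read_accessions, build_commands, and run_all are hypothetical, and the accessions are read one per line rather than split on any whitespace as the shell loop does.

```python
import subprocess
import sys


def read_accessions(text):
    """Return the non-empty, stripped lines of the text as accessions."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_commands(accessions, script="scrape_assembly_info.py"):
    """Build one argument list per accession, using the current interpreter."""
    return [[sys.executable, script, accession] for accession in accessions]


def run_all(commands):
    """Run each command in turn; check=True raises if a run fails."""
    for command in commands:
        subprocess.run(command, check=True)


# Placeholder accessions; in practice the text would come from accessions.txt.
commands = build_commands(read_accessions("GCA_000000001.1\n\nGCA_000000002.1\n"))
print(commands)
```

Calling run_all(commands) from the directory that contains scrape_assembly_info.py would then execute the script once per accession.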

Figure 2 a,b scripts

Fig2a_data.txt

A modified version of the large data frame used for the plot in Fig. 2a. This file is read by the Fig_2a_b_plotter.R script.

Fig2b_parse.py

A Python script that parses Fig2a_data.txt and outputs a subset of the data formatted for the BUSCO plot in Fig. 2b.

Fig2b_data.txt

The file produced by running Fig2b_parse.py. This file is read by the Fig_2a_b_plotter.R script to produce plot 2b (BUSCOs).

Fig_2a_b_plotter.R

The R script that plots 2a using Fig2a_data.txt as input, and 2b using Fig2b_data.txt as input.

deyuanyang/insect_genome_assemblies
