To run

Version 1.0: 12.22.17

This suite of modules is meant to ease the babysitting load for routinely processing genotype files to ADMIXTURE or imputation.

Requires Python 3.6.3, but will identify Python version and update if needed.
Starts from plink bed/bim/fam files. I'll eventually add other file formats.

To run

Always start using python genoprocess.py then you will see a screen asking you what you'd like to do next.

Modules

genoprocess

Sees what version of Python you're running.
Downloads Python 3.6.3 if you're on linux and exits telling you to download it if on Mac or Windows.

Asks the user if they would like to do the following:

genoprocess task	Short description	System requirements	Notes
1	Download reference files and programs	Most any, some Mac or Linux	Some programs will not download if you aren't on Linux because they are Linux based
2	Update sex	Any
3	Produce new dataset of people and SNPs with missince all rate < 10%	Any
4	Run IBD matrix through Plink	Any
5	Update family (FID) and individual IDs (IID)	Any
6	Update maternal and paternal IDs	Any
7	Harmonize with 1000G Phase 3	Any
8	Filter for extreme (+-3SD) heterozygosity values	Any
9	Merge with 1000G Phase 3	Any
10	Prepare for ADMIXTURE with k = 3...9	Mac or Linux	I will submit the job for you if you're on the Penn State ACI-B cluster
11	Run a phasing check and prepare files for phasing using SHAPEIT 2	Mac or Linux	I will submit the job for you if you're on the Penn State ACI-B cluster
12	Prepares files for imputation using the Sanger Imputation Server	Mac or Linux
13	Nothing

genodownload

Downloads the following:

Python 3.6.3
Plink 1.9
1000G Phase 3 VCF
1000G Phase 3 Hap/Legend/Sample
GRCh37/hg19 1000G FASTA file
Genotype Harmonizer
pip
snpflip
shapeit
vcftools
bcftools
htslib
samtools

genoqc

Update sex
Missing call rate
Heterozygosity check

genorelatives

Run IBD to identify relatives
Update FID & IID information
Update parental IDs

genoharmonize

Harmonize with 1000G

genomerge

Merge with 1000G

genoadmixture

Prepare for admixture, submit job if on Penn Sate ACI-B cluster

Should be done before admixture	genoprocess number	Module used
Keep only SNPs and people with missing call rate < 10%	#3	`genoqc.missing_call_rate()`
Run IBD to identify relatives	#4	`genorelatives.ibd()`
Update FID and/or IID information	#5	`genorelatives.update_id()`
Update parental IDs	#6	`genorelatives.update_parental()`
Harmonize with 1000G Phase3	#7	`genoharmonize.harmonize_with_1oooG()`
Filter out individuals with extreme heterozygosity values	#8	`genoqc.het()`
Merge with 1000G	#9	`genomerge.merge1000g()`

genophaseimpute

Checks data and prepares for phasing using shapeit.
Checks phased data for imputation

Should be done before phasing	genoprocess number	Module used
Genotypes must be on GRCh37/hg19	User should check this	None
Keep only SNPs and people with missing call rate < 10%	#3	`genoqc.missing_call_rate()`
Filter out individuals with extreme heterozygosity values	#8	`genoqc.het()`
Filter out SNPs with MAF < 5%	#7	`genoharmonize.harmonize_with_1000G()`
Filter out SNPs with HWE p-value < 0.05	#7	`genoharmonize.harmonize_with_1000G()`
Set haploid genotypes (male ChrX) as missing	User should check this if any warnings of het haploid genotypes appear	None
Check for gender mismatches	#2	`genoqc.update_sex()`
Harmonize with 1000G Phase3	#7	`genoharmonize.harmonize_with_1000G()`

Name		Name	Last commit message	Last commit date
Latest commit History 207 Commits
README.md		README.md
genoadmixture.py		genoadmixture.py
genodownload.py		genodownload.py
genoharmonize.py		genoharmonize.py
genomerge.py		genomerge.py
genophaseimpute.py		genophaseimpute.py
genoprocess.py		genoprocess.py
genoqc.py		genoqc.py
genorelatives.py		genorelatives.py
getpython.py		getpython.py
harmonize_postprocess.py		harmonize_postprocess.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

To run

Modules

genoprocess

genodownload

genoqc

genorelatives

genoharmonize

genomerge

genoadmixture

genophaseimpute

About

Releases

Packages

Languages

juliedwhite/GenotypeProcessing

Folders and files

Latest commit

History

Repository files navigation

To run

Modules

genoprocess

genodownload

genoqc

genorelatives

genoharmonize

genomerge

genoadmixture

genophaseimpute

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages