Home
Here you'll find details on how to import weights, train weights using your own data, clean GWAS summary data, and perform fine-mapping on TWAS results in single-ancestry or multi-ancestry settings.
Please see the sidebar for links to each.
We recommend using reference LD from LDSC. From the command line, it can be downloaded and unpacked with:
wget https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_plinkfiles.tgz
tar -xvzf 1000G_Phase3_plinkfiles.tgz
We recommend using a weight database that spans multiple tissues and multiple eQTL reference panels. It combines GTEx v7 weights from PrediXcan with METSIM, NTR, YFS, and CMC weights from the FUSION software into a single database usable by FOCUS. From the command line, it can be downloaded with:
wget https://www.dropbox.com/s/ep3dzlqnp7p8e5j/focus.db?dl=0
mv focus.db?dl=0 focus.db
We recommend using the independent genomic regions generated by LDetect, proposed in
Berisa, T., and Pickrell, J.K. (2016). Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285. DOI: 10.1093/bioinformatics/btv546
and, for multiple ancestries, the independent genomic regions generated by a modified LDetect, proposed in
Shi, H., Burch, K.S., Johnson, R., Freund, M.K., Kichaev, G., Mancuso, N., Manuel, A.M., Dong, N., and Pasaniuc, B. (2020). Localizing Components of Shared Transethnic Genetic Architecture of Complex Traits from GWAS Summary Data. Am. J. Hum. Genet. 106, 805–817. DOI: 10.1016/j.ajhg.2020.04.012
In the software, we provide several sets of genome-wide independent regions:
- grch37.eur.afr.loci.bed: use --locations 37:EUR-AFR for independent genomic regions across EUR and AFR ancestries, or --locations 38:EUR-AFR for the GRCh38 version.
- grch37.eur.eas.afr.loci.bed: use --locations 37:EUR-EAS-AFR for independent genomic regions across EUR, EAS, and AFR ancestries, or --locations 38:EUR-EAS-AFR for the GRCh38 version.
- grch37.eur.eas.loci.bed: use --locations 37:EUR-EAS for independent genomic regions across EUR and EAS ancestries, or --locations 38:EUR-EAS for the GRCh38 version.
- grch37.eur.loci.bed: use --locations 37:EUR for independent genomic regions across EUR, or --locations 38:EUR for the GRCh38 version.
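The shorthand values listed above follow a build:ancestries pattern. As a minimal sketch, the helper below assembles and validates such a value before it is passed to --locations. The function name locations_shorthand is hypothetical and not part of FOCUS; it only encodes the eight combinations listed on this page.

```python
# Hypothetical helper (not part of FOCUS): build a --locations shorthand
# from a genome build and a set of ancestries, using only the
# combinations listed on this page.
SUPPORTED_BUILDS = ("37", "38")
ANCESTRY_ORDER = ("EUR", "EAS", "AFR")
SUPPORTED_LABELS = ("EUR", "EUR-EAS", "EUR-AFR", "EUR-EAS-AFR")


def locations_shorthand(build, ancestries):
    """Return a string such as '37:EUR-AFR', or raise ValueError."""
    build = str(build)
    if build not in SUPPORTED_BUILDS:
        raise ValueError(f"unsupported genome build: {build}")

    requested = {name.upper() for name in ancestries}
    unknown = requested - set(ANCESTRY_ORDER)
    if unknown:
        raise ValueError(f"unknown ancestries: {sorted(unknown)}")

    # Put the ancestries in the order used by the shorthand values above.
    label = "-".join(name for name in ANCESTRY_ORDER if name in requested)
    if label not in SUPPORTED_LABELS:
        raise ValueError(f"no bundled regions for: {label or '(none)'}")
    return f"{build}:{label}"


if __name__ == "__main__":
    print(locations_shorthand(37, ["eur", "afr"]))
```

Combinations that do not appear in the list, such as EAS on its own, raise a ValueError rather than producing a value that may not be recognized.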
You can use your own independent risk region files by specifying the file path directly after --locations. Please make sure that your bed files contain the column names chrom, start, and stop for the chromosome name, region start position, and region stop position. chrom must be an integer such as 1, 2, 3 (not chr1, chr2, chr3, ...).
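Before passing a custom regions file to --locations, it can help to check these requirements yourself. The following is a minimal sketch, not part of FOCUS: the function name check_regions is hypothetical, and it assumes the file is whitespace-delimited with a header row, which this page does not state explicitly.

```python
import io

REQUIRED_COLUMNS = ("chrom", "start", "stop")


def check_regions(handle):
    """Return a list of problems found in a regions file (empty if none).

    Assumes a header row and whitespace-delimited fields (an assumption,
    not something FOCUS documents here).
    """
    header = handle.readline().split()
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    chrom_idx = header.index("chrom")
    problems = []
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) < len(header):
            problems.append(f"line {line_no}: too few fields")
            continue
        chrom = fields[chrom_idx]
        # chrom must be a plain integer such as 1, not chr1
        if not (chrom.isascii() and chrom.isdigit()):
            problems.append(f"line {line_no}: chrom '{chrom}' is not an integer")
    return problems


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = io.StringIO("chrom\tstart\tstop\n1\t1000\t2000\nchr2\t3000\t4000\n")
    for problem in check_regions(sample):
        print(problem)
```

For a real file, open it with open(path) and pass the handle to check_regions in the same way.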
We have default gencode files with both v37 and v38 available with the software. Please specify this with --prior-prob gencode38 or --prior-prob gencode37.
You can use your own gencode files by specifying the file path directly after --prior-prob. Please make sure that your files are in tsv format and contain the column names chrom, start, stop, and gene_name for the chromosome name, start position, stop position, and gene name. chrom must be an integer such as 1, 2, 3 (not chr1, chr2, chr3, ...).
We recommend making sure that your gencode files do not contain duplicated rows (that is, gene names are unique).
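The requirements for a custom gencode file can be checked before running the software. The sketch below is not part of FOCUS: the function name check_gene_table is hypothetical, and it assumes a tab-separated file with a header row, in line with the tsv format described above.

```python
import csv
import io
from collections import Counter

REQUIRED_COLUMNS = ("chrom", "start", "stop", "gene_name")


def check_gene_table(handle):
    """Report missing columns, non-integer chrom values, and repeated genes."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return {"missing": missing, "bad_chrom": [], "duplicates": []}

    bad_chrom = set()
    counts = Counter()
    for row in reader:
        chrom = row["chrom"] or ""  # short rows yield None for absent fields
        if not (chrom.isascii() and chrom.isdigit()):
            bad_chrom.add(chrom)
        counts[row["gene_name"]] += 1

    duplicates = sorted(str(name) for name, n in counts.items() if n > 1)
    return {"missing": [], "bad_chrom": sorted(bad_chrom), "duplicates": duplicates}


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = io.StringIO(
        "chrom\tstart\tstop\tgene_name\n"
        "1\t100\t200\tGENE_A\n"
        "chr2\t300\t400\tGENE_B\n"
        "1\t500\t600\tGENE_A\n"
    )
    print(check_gene_table(sample))
```

An empty list under each key suggests the file meets the requirements listed above; any gene name reported under duplicates should be resolved to a single row.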
Instead of using gencode files, you can specify the prior probability for a gene to be causal as a single number (e.g., 0.01, 0.005).