FOCUS: Fine mapping TWAS associations in a single ancestry group

Zeyun edited this page Feb 8, 2022 · 4 revisions

The main aim of FOCUS is to fine-map TWAS associations at GWAS risk regions within a single ancestry group. FOCUS takes as input

  1. GWAS summary statistics
  2. reference LD
  3. eQTL weight database.

Given these data, FOCUS can fine-map in a tissue-agnostic or tissue-prioritized approach.

The basic command for fine-mapping is

focus finemap SUMSTATS PLINK_REFLD WEIGHT_DB --locations RISK_REGION

where SUMSTATS is the GWAS summary statistics file, PLINK_REFLD is the path to PLINK-formatted genotype data for computing reference LD, WEIGHT_DB is the path to a FOCUS weight database, and RISK_REGION is the path to a file of independent genomic regions (we have generated some region files for your use; see the wiki Home page). Help on all options and functionality can be listed by entering

focus finemap --help
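Before launching a run, it can help to confirm that the reference genotype files are in place. The sketch below assumes PLINK_REFLD is the prefix of a PLINK binary fileset (.bed, .bim, .fam). The function name check_plink_prefix is hypothetical and not part of FOCUS.

```shell
# Hypothetical helper (not part of FOCUS): check that a PLINK binary
# fileset (.bed/.bim/.fam) exists for the given prefix.
# Assumption: PLINK_REFLD is passed to FOCUS as a fileset prefix.
check_plink_prefix() {
    prefix="$1"
    missing=0
    for ext in bed bim fam; do
        if [ ! -f "${prefix}.${ext}" ]; then
            # Report each missing file on stderr
            echo "missing: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    # Exit status 0 if all three files exist, 1 otherwise
    return "$missing"
}

# Example call, commented out so the block has no side effects:
# check_plink_prefix 1000G.EUR.QC.1 && echo "reference LD files found"
```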

For example, the following command performs tissue-agnostic fine-mapping on chromosome 1 for the GWAS summary data LDL_2010.clean.sumstats.gz, using 1000G.EUR.QC.1 reference genotypes, gtex_v7.db eQTL weights, and the 37:EUR risk regions (generated by LDetect on GRCh37 for European ancestry):

focus finemap LDL_2010.clean.sumstats.gz 1000G.EUR.QC.1 gtex_v7.db --locations 37:EUR --chr 1 --out LDL_2010.chr1
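The single-chromosome command above can be repeated for every autosome with a small loop. This is a minimal sketch that uses only the flags shown on this page. It assumes the reference genotypes are split per chromosome and named with a numeric suffix (as 1000G.EUR.QC.1 suggests for chromosome 1). The function name finemap_all_chr and the FOCUS_CMD variable are hypothetical conveniences, not part of FOCUS.

```shell
# Sketch: run tissue-agnostic fine-mapping one chromosome at a time.
# FOCUS_CMD is a hypothetical override (e.g. FOCUS_CMD=echo for a dry run);
# it defaults to the real `focus` executable.
# Assumption: reference genotypes are named <refld_base>.<chr>.
finemap_all_chr() {
    sumstats="$1"
    refld_base="$2"
    weights="$3"
    locations="$4"
    out_base="$5"
    for chr in $(seq 1 22); do
        # Stop at the first chromosome that fails
        "${FOCUS_CMD:-focus}" finemap "$sumstats" "${refld_base}.${chr}" "$weights" \
            --locations "$locations" --chr "$chr" --out "${out_base}.chr${chr}" \
            || return 1
    done
}

# Example call, commented out so the block has no side effects:
# finemap_all_chr LDL_2010.clean.sumstats.gz 1000G.EUR.QC gtex_v7.db 37:EUR LDL_2010
```

Setting FOCUS_CMD=echo prints the 22 commands without running them, which is a cheap way to check file names before starting a long job.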

To take the tissue-prioritized approach, add the --tissue TISSUE flag:

focus finemap LDL_2010.clean.sumstats.gz 1000G.EUR.QC.1 gtex_v7.db --locations 37:EUR --chr 1 --tissue LIVER --out LDL_2010.chr1

FOCUS can generate a figure for each region showing the predicted expression correlation, TWAS summary statistics, and PIP for each gene. To do this, add the --plot flag:

focus finemap LDL_2010.clean.sumstats.gz 1000G.EUR.QC.1 gtex_v7.db --locations 37:EUR --chr 1 --tissue LIVER --plot --out LDL_2010.chr1

Here is an example image illustrating the local correlation structure, TWAS p-values, and PIPs for each model

The output from the finemap operation is a table:

| Column | Description |
|---|---|
| block | Independent genomic region (chrom:start-chrom:stop) |
| ens_gene_id | Ensembl gene ID |
| ens_tx_id | Ensembl transcript ID |
| mol_name | Name of the gene/linc/pseudogene |
| tissue | Tissue the original expression was measured in |
| ref_name | Name of the QTL reference panel |
| type | Type of molecular feature (gene, lncRNA, lincRNA, pseudogene) |
| chrom | Chromosome |
| tx_start | Transcription start site |
| tx_stop | Transcription stop site |
| block_genes | Number of genes in the region, used to set the prior probability for a gene to be causal |
| inference_pop1 | Inference procedure for the model (e.g., LASSO, BSLMM) |
| inter_z_pop1 | Intercept of the Z scores when regressing out average tagged pleiotropic associations; None if intercept = False |
| cv.R2_pop1 | Cross-validation predictive R-squared |
| cv.R2.pval_pop1 | P-value of the cross-validation R-squared |
| twas_z_pop1 | Marginal TWAS Z score |
| pip_pop1 | Marginal posterior inclusion probability |
| in_cred_set_pop1 | Flag indicating whether or not the model is included in the credible set |
| ldregion_pop1 | LD regions from the reference genome |
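A common next step is to pull out the models flagged as members of the credible set. The sketch below looks columns up by the header names listed in the table, so it does not depend on column order. It assumes the output is a tab-separated file with a single header row and that the in_cred_set_pop1 flag is written as 1 or True; neither detail is stated on this page, so check your own output first. The function name credible_set is hypothetical.

```shell
# Sketch: print mol_name, tissue, twas_z_pop1 and pip_pop1 for rows
# flagged as in the credible set.
# Assumptions: tab-separated input with one header row; flag is 1 or True.
credible_set() {
    awk -F '\t' -v OFS='\t' '
        NR == 1 {
            # Map each header name to its column index
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("in_cred_set_pop1" in col)) {
                print "in_cred_set_pop1 column not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        $(col["in_cred_set_pop1"]) == 1 || $(col["in_cred_set_pop1"]) == "True" {
            print $(col["mol_name"]), $(col["tissue"]), $(col["twas_z_pop1"]), $(col["pip_pop1"])
        }
    ' "$1"
}

# Example call, commented out; the path is a placeholder for your output table:
# credible_set path/to/focus_output.tsv
```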

We recommend using reference LD from LDSC.

We recommend using a multi-tissue, multi-reference-panel eQTL weight database here. It combines GTEx v7 weights from PrediXcan with METSIM, NTR, YFS, and CMC weights from the FUSION software into a single database usable by FOCUS.