FOCUS: Fine mapping TWAS associations in a single ancestry group
The main aim of FOCUS is to fine-map TWAS associations at GWAS risk regions for a single ancestry group. FOCUS takes as input:
- GWAS summary statistics
- reference LD
- eQTL weight database.
Given these data, FOCUS can fine-map using either a tissue-agnostic or a tissue-prioritized approach.
The basic command for fine-mapping is:
focus finemap SUMSTATS PLINK_REFLD WEIGHT_DB --locations RISK_REGION
where `SUMSTATS` is the GWAS summary file, `PLINK_REFLD` is the path to PLINK-formatted genotype data for computing reference LD, `WEIGHT_DB` is the path to a FOCUS weight database, and `RISK_REGION` is the path to independent genomic regions (we have generated some files for your use; see the wiki Home page). Help on all options and functionality can be listed by entering:
focus finemap --help
For example, the command to perform tissue-agnostic fine-mapping on chromosome 1 for the GWAS summary data `LDL_2010.clean.sumstats.gz`, using `1000G.EUR.QC.1` reference genotypes and `gtex_v7.db` eQTL weights, for the risk regions `37:EUR` generated by LDetect on GRCh37 for European ancestry, is:
focus finemap LDL_2010.clean.sumstats.gz 1000G.EUR.QC.1 gtex_v7.db --locations 37:EUR --chr 1 --out LDL_2010.chr1
To take the tissue-prioritized approach, add the `--tissue TISSUE` flag:
focus finemap LDL_2010.clean.sumstats.gz 1000G.EUR.QC.1 gtex_v7.db --locations 37:EUR --chr 1 --tissue LIVER --out LDL_2010.chr1
FOCUS can generate a figure for each region showing the predicted expression correlation, TWAS summary statistics, and PIP for each gene. To do this, add the `--plot` flag:
focus finemap LDL_2010.clean.sumstats.gz 1000G.EUR.QC.1 gtex_v7.db --locations 37:EUR --chr 1 --tissue LIVER --plot --out LDL_2010.chr1
Here is an example image illustrating the local correlation structure, TWAS p-values, and PIPs for each model
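When the same analysis has to be repeated across chromosomes, it can help to assemble the command line programmatically. The following is a minimal Python sketch that only builds (and prints) argument lists from the options shown in the commands above (`--locations`, `--chr`, `--tissue`, `--plot`, `--out`); it does not run FOCUS. The helper name `build_finemap_cmd` is hypothetical and not part of FOCUS, and the per-chromosome reference prefix `1000G.EUR.QC.<chr>` is an assumption extrapolated from the chromosome 1 example.

```python
import shlex


def build_finemap_cmd(sumstats, ref_ld, weight_db, locations, chrom, out,
                      tissue=None, plot=False):
    """Assemble a `focus finemap` argument list (hypothetical helper, not part of FOCUS)."""
    cmd = ["focus", "finemap", sumstats, ref_ld, weight_db,
           "--locations", locations, "--chr", str(chrom)]
    if tissue is not None:
        # Tissue-prioritized fine-mapping; omitted for the tissue-agnostic approach.
        cmd += ["--tissue", tissue]
    if plot:
        cmd.append("--plot")
    cmd += ["--out", out]
    return cmd


# One command per autosome; the reference prefix pattern is an assumption.
commands = [
    build_finemap_cmd(
        "LDL_2010.clean.sumstats.gz",
        f"1000G.EUR.QC.{chrom}",
        "gtex_v7.db",
        "37:EUR",
        chrom,
        f"LDL_2010.chr{chrom}",
        tissue="LIVER",
        plot=True,
    )
    for chrom in range(1, 23)
]

for cmd in commands:
    # shlex.join quotes arguments where needed so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Each list could also be passed to `subprocess.run(cmd, check=True)` once the paths have been checked; keeping the arguments as a list avoids shell-quoting problems with unusual file names.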
The output from the finemap operation is a table:
Column | Description |
---|---|
block | independent genomic region chrom:start-chrom:stop |
ens_gene_id | Ensembl gene ID |
ens_tx_id | Ensembl transcript ID |
mol_name | Name of the gene/linc/pseudogene |
tissue | Tissue the original expression was measured in |
ref_name | Name of the QTL reference panel |
type | Type of molecular feature (gene, lncRNA, lincRNA, pseudogene) |
chrom | Chromosome |
tx_start | Transcription start site |
tx_stop | Transcription stop site |
block_genes | Number of genes in the region, used to set the prior probability for a gene to be causal |
inference_pop1 | Inference procedure for model (e.g., LASSO, BSLMM) |
inter_z_pop1 | Intercept of z-scores when regressing out average tagged pleiotropic associations; None if intercept = False |
cv.R2_pop1 | Cross-validation predictive R-squared |
cv.R2.pval_pop1 | P-value of the cross-validation R-squared |
twas_z_pop1 | Marginal TWAS Z score |
pip_pop1 | Marginal posterior inclusion probability |
in_cred_set_pop1 | Flag indicating whether or not model is included in the credible set |
ldregion_pop1 | LD regions from reference genome |
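
The table can be post-processed with standard tools. As a sketch, the Python snippet below reads a result table and lists the models flagged as being in the credible set, sorted by PIP. It assumes the table is tab-delimited and that `in_cred_set_pop1` is encoded as `1`/`0` (or `True`/`False`); both are assumptions about the file format rather than documented behavior. The function name `credible_set` is hypothetical, and the rows in `example` are made-up values for illustration only, not real results.

```python
import csv
import io


def credible_set(handle, pip_col="pip_pop1", flag_col="in_cred_set_pop1"):
    """Return rows flagged as in the credible set, highest PIP first."""
    reader = csv.DictReader(handle, delimiter="\t")  # assumes tab-delimited output
    kept = [
        row for row in reader
        if row[flag_col].strip().lower() in {"1", "true"}  # assumed flag encoding
    ]
    return sorted(kept, key=lambda row: float(row[pip_col]), reverse=True)


# Made-up rows for illustration only; not real FOCUS output.
example = io.StringIO(
    "block\tmol_name\ttissue\ttwas_z_pop1\tpip_pop1\tin_cred_set_pop1\n"
    "1:100-1:200\tGENE_A\tliver\t5.1\t0.92\t1\n"
    "1:100-1:200\tGENE_B\tliver\t1.3\t0.03\t0\n"
)

for row in credible_set(example):
    print(row["block"], row["mol_name"], row["tissue"], row["pip_pop1"])
```

To use it on real output, open the table written by `focus finemap` and pass the file handle in place of `example`.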
We recommend using reference LD from LDSC.
We recommend using a multiple-tissue, multiple-eQTL-reference-panel weight database here. It combines GTEx v7 weights from PrediXcan with METSIM, NTR, YFS, and CMC weights from the FUSION software into a single database usable by FOCUS.