Creating a weight database
FOCUS aims to fine-map across all observed associations at a risk region. Ideally, this is done using many prediction models trained across a variety of tissues and assays. To perform inference efficiently in this setting, FOCUS uses a custom database to query all relevant weights at a given risk region.
We offer a pre-built database of weights from multiple tissues and multiple eQTL reference panels here. This database combines GTEx v7 weights from PrediXcan with METSIM, NTR, YFS, and CMC weights from the FUSION software into a single usable database for FOCUS.
Alternatively, there are two ways to create a custom QTL-weight database for FOCUS:
- Importing from PrediXcan or FUSION.
- Training on individual-level data from reference panels.
We illustrate how to perform either import below.
Importing weights into a single FOCUS database using multiple PrediXcan databases is straightforward. The syntax to import is:
focus import PREDIXCAN_DB_FILE predixcan --tissue TISSUE_TYPE --name GTEx --assay rnaseq --output DB_NAME
Using this command, focus will import weights from the PREDIXCAN_DB_FILE sqlite database file and record that the weights correspond to TISSUE_TYPE, that the name of the reference panel is GTEx, and that the original assay was rnaseq. This will create a FOCUS-specific sqlite database named DB_NAME.db. By default, if the --output DB_NAME setting matches an existing database, FOCUS will automatically append models rather than overwrite. This can be useful to chain together multiple imports to create a single database.
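Because imports append to an existing database, running an import script while a leftover DB_NAME.db is present adds to that file rather than starting from an empty one. The sketch below is one way to guard against that when a fresh database is intended. The function name require_fresh_db is hypothetical and not part of FOCUS, and the check assumes the database is written as DB_NAME.db in the current directory.

```shell
#!/bin/bash
# Hypothetical helper (not part of FOCUS): succeed only if NAME.db is absent.
# Assumes FOCUS writes its output to NAME.db in the current directory.
require_fresh_db() {
    local name="$1"
    if [ -e "${name}.db" ]; then
        echo "${name}.db already exists; new imports would be appended to it" >&2
        return 1
    fi
    return 0
}

# Usage: stop before importing if a database from an earlier run is present.
# require_fresh_db gtex_v7 || exit 1
```

If appending to an existing database is what you want, simply skip the check.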
The following script will compile all GTEx v7 weights into a single database named gtex_v7.db (nb: takes ~4 hours to run as it is mostly I/O bound):
#!/bin/bash
tissues=(Adipose_Subcutaneous Adipose_Visceral_Omentum Adrenal_Gland Artery_Aorta Artery_Coronary Artery_Tibial Brain_Amygdala Brain_Anterior_cingulate_cortex_BA24 Brain_Caudate_basal_ganglia Brain_Cerebellar_Hemisphere Brain_Cerebellum Brain_Cortex Brain_Frontal_Cortex_BA9 Brain_Hippocampus Brain_Hypothalamus Brain_Nucleus_accumbens_basal_ganglia Brain_Putamen_basal_ganglia Brain_Spinal_cord_cervical_c-1 Brain_Substantia_nigra Breast_Mammary_Tissue Cells_EBV-transformed_lymphocytes Cells_Transformed_fibroblasts Colon_Sigmoid Colon_Transverse Esophagus_Gastroesophageal_Junction Esophagus_Mucosa Esophagus_Muscularis Heart_Atrial_Appendage Heart_Left_Ventricle Liver Lung Minor_Salivary_Gland Muscle_Skeletal Nerve_Tibial Ovary Pancreas Pituitary Prostate Skin_Not_Sun_Exposed_Suprapubic Skin_Sun_Exposed_Lower_leg Small_Intestine_Terminal_Ileum Spleen Stomach Testis Thyroid Uterus Vagina Whole_Blood)
# Import each tissue in turn; every call appends to gtex_v7.db.
for tissue in "${tissues[@]}"
do
    focus import "gtex_v7_${tissue}_imputed_europeans_tw_0.5_signif.db" predixcan --tissue "${tissue}" --name GTEx --assay rnaseq --output gtex_v7
done
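Since the full import runs for hours, it may be worth confirming that every expected PrediXcan file is present before starting. The following is a minimal sketch of such a check. The function missing_predixcan_dbs is a hypothetical helper, and it assumes the files follow the naming pattern used in the script above.

```shell
#!/bin/bash
# Hypothetical helper: print each expected PrediXcan file missing from DIR.
# The file-name pattern is copied from the GTEx import script; adjust it if
# your downloads are named differently.
missing_predixcan_dbs() {
    local dir="$1"
    shift
    local tissue path
    for tissue in "$@"; do
        path="${dir}/gtex_v7_${tissue}_imputed_europeans_tw_0.5_signif.db"
        [ -f "$path" ] || echo "$path"
    done
}

# Usage: abort early if anything is missing.
# missing=$(missing_predixcan_dbs . "${tissues[@]}")
# [ -z "$missing" ] || { echo "Missing inputs:"; echo "$missing"; exit 1; }
```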
Importing weights into a single FOCUS database using multiple FUSION databases is equally straightforward. The syntax to import is:
focus import FUSION_DB_FILE fusion --tissue TISSUE_TYPE --name NAME --assay ASSAY --output DB_NAME
Using this command, focus will import weights from the FUSION_DB_FILE weight-list file and record that the weights correspond to TISSUE_TYPE, that the name of the reference panel is NAME, and that the original assay was ASSAY. This will create a FOCUS-specific sqlite database named DB_NAME.db. Here the FUSION_DB_FILE should be a path to a .pos FUSION weight-list file, and that file should be in the same parent directory as the individual weights. As with PrediXcan imports, if the --output DB_NAME setting matches an existing database, FOCUS will automatically append models rather than overwrite.
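The requirement that the .pos file sit alongside the individual weights can be checked before importing. The sketch below lists weight files that a .pos file names but that cannot be found relative to the .pos file's own directory. It rests on an assumption about the file layout: a whitespace-delimited table whose header row contains a WGT column holding relative paths. Confirm this against your own files; missing_fusion_weights is a hypothetical helper, not a FOCUS or FUSION command.

```shell
#!/bin/bash
# Hypothetical helper: print weight files referenced by a .pos file that do
# not exist relative to the directory containing that .pos file.
# Assumption: the header row has a column named WGT with relative paths.
missing_fusion_weights() {
    local pos="$1"
    local base
    base=$(dirname "$pos")
    # Locate the WGT column from the header, then emit that column per row.
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "WGT") col = i; next }
         col && NF { print $col }' "$pos" |
    while IFS= read -r wgt; do
        [ -f "${base}/${wgt}" ] || echo "${base}/${wgt}"
    done
}

# Usage: an empty result means every listed weight file was found.
# missing_fusion_weights ~/FUSION_WEIGHTS/CMC.BRAIN.RNASEQ.pos
```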
An example importing CMC, METSIM, NTR, and YFS weights from FUSION is given below. Here ~/FUSION_WEIGHTS/ is a directory that contains the directories and .pos files for each of the weight sets.
#!/bin/bash
focus import ~/FUSION_WEIGHTS/CMC.BRAIN.RNASEQ.pos fusion --tissue brain_dorsolateral_prefrontal_cortex --name CMC --assay rnaseq --output fusion
focus import ~/FUSION_WEIGHTS/METSIM.ADIPOSE.RNASEQ.pos fusion --tissue adipose --name METSIM --assay rnaseq --output fusion
focus import ~/FUSION_WEIGHTS/NTR.BLOOD.RNAARR.pos fusion --tissue blood --name NTR --assay array --output fusion
focus import ~/FUSION_WEIGHTS/YFS.BLOOD.RNAARR.pos fusion --tissue blood --name YFS --assay array --output fusion
TBD