Creating a weight database

Zeyun edited this page Feb 4, 2022 · 1 revision

FOCUS aims to fine-map across all observed associations at a risk region. Ideally, this is done using many prediction models trained across a variety of tissues and assays. To perform efficient inference in this setting, FOCUS uses a custom database to query all relevant weights at a given risk region.

We offer a pre-built database of weights from multiple tissues and multiple eQTL reference panels here. This database combines GTEx v7 weights from PrediXcan with METSIM, NTR, YFS, and CMC weights from the FUSION software into a single database usable by FOCUS.

Alternatively, there are two ways to create a custom QTL-weight database for FOCUS:

  1. Importing from PrediXcan or FUSION.
  2. Training on individual-level data from reference panels.

We illustrate how to perform either import below.

Importing from PrediXcan

Importing weights into a single FOCUS database using multiple PrediXcan databases is straightforward. The syntax to import is:

focus import PREDIXCAN_DB_FILE predixcan --tissue TISSUE_TYPE --name GTEx --assay rnaseq --output DB_NAME

With this command, focus imports weights from the PREDIXCAN_DB_FILE sqlite database file and records that the weights correspond to TISSUE_TYPE, that the reference panel is named GTEx, and that the original assay was rnaseq. This creates a FOCUS-specific sqlite database named DB_NAME.db. By default, if the --output DB_NAME setting matches an existing database, FOCUS appends the new models to it rather than overwriting it. This is useful for chaining multiple imports into a single database.
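Because an import silently appends when the output database already exists, it can help to check which case applies before starting. The helper below is a minimal sketch, not part of FOCUS: the function name output_mode is hypothetical, and it assumes that DB_NAME.db is written at the path given by DB_NAME (the current directory for a bare name).

```bash
#!/bin/bash

# Report whether `--output NAME` would create NAME.db or append to it.
# Assumption: FOCUS writes NAME.db at the path given by NAME.
output_mode() {
    if [ -f "$1.db" ]; then
        echo "append"
    else
        echo "create"
    fi
}
```

For example, output_mode gtex_v7 prints "append" if gtex_v7.db is already present, which is a cue to remove or rename a stale file before rebuilding a database from scratch.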

The following script compiles all GTEx-v7 weights into a single database named gtex_v7.db (note: it takes about 4 hours to run, as it is mostly I/O bound):

#!/bin/bash

tissues=(Adipose_Subcutaneous Adipose_Visceral_Omentum Adrenal_Gland Artery_Aorta Artery_Coronary Artery_Tibial Brain_Amygdala Brain_Anterior_cingulate_cortex_BA24 Brain_Caudate_basal_ganglia Brain_Cerebellar_Hemisphere Brain_Cerebellum Brain_Cortex Brain_Frontal_Cortex_BA9 Brain_Hippocampus Brain_Hypothalamus Brain_Nucleus_accumbens_basal_ganglia Brain_Putamen_basal_ganglia Brain_Spinal_cord_cervical_c-1 Brain_Substantia_nigra Breast_Mammary_Tissue Cells_EBV-transformed_lymphocytes Cells_Transformed_fibroblasts Colon_Sigmoid Colon_Transverse Esophagus_Gastroesophageal_Junction Esophagus_Mucosa Esophagus_Muscularis Heart_Atrial_Appendage Heart_Left_Ventricle Liver Lung Minor_Salivary_Gland Muscle_Skeletal Nerve_Tibial Ovary Pancreas Pituitary Prostate Skin_Not_Sun_Exposed_Suprapubic Skin_Sun_Exposed_Lower_leg Small_Intestine_Terminal_Ileum Spleen Stomach Testis Thyroid Uterus Vagina Whole_Blood) 


# Import each tissue in turn; every call appends to the same gtex_v7.db
for tissue in "${tissues[@]}"
do
    focus import "gtex_v7_${tissue}_imputed_europeans_tw_0.5_signif.db" predixcan --tissue "${tissue}" --name GTEx --assay rnaseq --output gtex_v7
done
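Since the full import runs for hours, a missing input file is cheaper to catch up front. The following is a minimal sketch of such a pre-flight check. The function name missing_predixcan_dbs is hypothetical; the file-name pattern is copied from the import script above and should be adjusted if your PrediXcan files are named differently.

```bash
#!/bin/bash

# Print the name of every expected PrediXcan database missing from a directory.
# Usage: missing_predixcan_dbs DIR TISSUE [TISSUE ...]
missing_predixcan_dbs() {
    local dir=$1
    shift
    local tissue db
    for tissue in "$@"; do
        # Same naming pattern as in the import loop
        db="gtex_v7_${tissue}_imputed_europeans_tw_0.5_signif.db"
        if [ ! -f "${dir}/${db}" ]; then
            echo "${db}"
        fi
    done
}
```

Calling missing_predixcan_dbs . "${tissues[@]}" before the loop prints nothing when every file is in place, and otherwise lists the files to fetch first.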

Importing from FUSION

Importing weights into a single FOCUS database using multiple FUSION databases is equally straightforward. The syntax to import is:

focus import FUSION_DB_FILE fusion --tissue TISSUE_TYPE --name NAME --assay ASSAY --output DB_NAME

With this command, focus imports weights from the FUSION_DB_FILE weight-list file and records that the weights correspond to TISSUE_TYPE, that the reference panel is named NAME, and that the original assay was ASSAY. This creates a FOCUS-specific sqlite database named DB_NAME.db. Here FUSION_DB_FILE should be the path to a FUSION .pos weight list, and it must sit in the same parent directory as the individual weights. As with PrediXcan imports, if the --output DB_NAME setting matches an existing database, FOCUS appends the new models to it rather than overwriting it.
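The directory requirement above can be checked before importing. The sketch below lists weight files named in a .pos file that are not found relative to the directory holding that .pos file. It rests on two assumptions that are not stated on this page: that the .pos file has a header row with a column named WGT, and that the entries in that column are relative paths. The function name missing_fusion_weights is hypothetical.

```bash
#!/bin/bash

# Print every weight file listed in a FUSION .pos file that cannot be found
# relative to the directory containing the .pos file.
# Assumption: the header row has a WGT column holding relative paths.
missing_fusion_weights() {
    local pos=$1
    local dir
    dir=$(dirname "$pos")
    # Locate the WGT column from the header, then emit that column for each row
    awk 'NR == 1 { for (i = 1; i <= NF; i++) { if ($i == "WGT") col = i }; next }
         col && NF >= col { print $col }' "$pos" |
    while IFS= read -r wgt; do
        [ -f "${dir}/${wgt}" ] || echo "${wgt}"
    done
}
```

An empty result from missing_fusion_weights ~/FUSION_WEIGHTS/CMC.BRAIN.RNASEQ.pos suggests the layout matches what the import expects; any printed path points to a weight file that is missing or stored elsewhere.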

An example that imports the CMC, METSIM, NTR, and YFS weights from FUSION is given below. Here ~/FUSION_WEIGHTS/ is a directory that contains the weight directories and .pos files for each of the weight sets.

#!/bin/bash

focus import ~/FUSION_WEIGHTS/CMC.BRAIN.RNASEQ.pos fusion --tissue brain_dorsolateral_prefrontal_cortex --name CMC --assay rnaseq --output fusion
focus import ~/FUSION_WEIGHTS/METSIM.ADIPOSE.RNASEQ.pos fusion --tissue adipose --name METSIM --assay rnaseq --output fusion
focus import ~/FUSION_WEIGHTS/NTR.BLOOD.RNAARR.pos fusion --tissue blood --name NTR --assay array --output fusion
focus import ~/FUSION_WEIGHTS/YFS.BLOOD.RNAARR.pos fusion --tissue blood --name YFS --assay array --output fusion

Training on individual-level data

TBD