Skip to content

Latest commit

 

History

History
166 lines (125 loc) · 6.73 KB

README.md

File metadata and controls

166 lines (125 loc) · 6.73 KB

RADIOHEAD is a package to fit the model proposed in the manuscript: S Mohammed, K Bharath, S Kurtek, A Rao, V Baladandayuthapani, 2020, RADIOHEAD: Radiogenomic Analysis Incorporating Tumor Heterogeneity in Imaging Through Densities.

Example code to run the RADIOHEAD pipeline. This code only shows an example execution of the model using the package RADIOHEAD.

This document contains the following:

  1. An example to run the RADIOHEAD model using the data available in the package
  2. Pre-processing of the imaging and genomic data a. Computing pathway scores for a gene set b. Pre-processing magnetic resonance imaging scans
# install the package (devtools package needed)
if(!require(devtools)) install.packages("devtools")
devtools::install_github('shariq-mohammed/RADIOHEAD')

Load the package

library(RADIOHEAD)

Example

Note that the data is available within the package.

Choose the pathway score for the 'EXOCYTOSIS' pathway.

y = scale(c_pathway_scores[,"EXOCYTOSIS"], center = T, scale = F)

Choose the principal component scores based on radiomic images.

X = scale(pc_scores, center = T, scale = T)

Choose the group membership labels for the principal components.

groups = pc_groups

Execute the Gibbs sampling.

res = groupSS(y, X, groups, Nmcmc=1000)

Identify the significant radiogenomic associaitons between the pathway and the radiomic features after a Bayesian FDR-based variable selection.

# 'associations' outputs the effect size of the significant associations.
associations = fdr_var_selection(res$b, res$x_cnames)
associations

Pre-processing of the imaging and genomic data

Computing pathway scores for a gene set

Below we demonstrate an example to compute pathway scores from a randomly generated gene expression data set. The same process can be employed to compute the pathway scores for any gene expression data.

Install the GSVA package and dependencies. Load GSEABase and GSVA package.

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!requireNamespace("GSVA", quietly = TRUE)) BiocManager::install("GSVA")
library(GSEABase)
library(GSVA)

Gene sets can be downloaded from the molecular signature database (MSigDB). For example, we obtain the gene membership of the EXOCYTOSIS pathway on MSigDB from here.

gene.set = getGmt("https://www.gsea-msigdb.org/gsea/msigdb/download_geneset.jsp?geneSetName=GO_EXOCYTOSIS&fileType=gmt")

Randomly generate gene expression data for 5 subjects and 1000 genes.

gene.expr = matrix(rnorm(5*1000), ncol=5)
colnames(gene.expr) = paste0('Subject', 1:5)
rownames(gene.expr) = paste0('Gene', 1:1000)

# Set row names as the gene names from the pathway
gene.ids = [email protected][[1]]@geneIds
rownames(gene.expr)[1:length(gene.ids)] = gene.ids

Compute the pathway scores.

pathwayScores = gsva(gene.expr, gene.set, method='gsva', verbose=FALSE)
pathwayScores

These pathway scores can be used as response via the groupSS function to fit the model proposed in RADIOHEAD framework.

Pre-processing magnetic resonance imaging scans

The pre-processing of magnetic resonance imaging (MRI) scans involves the following steps

  • Step 1: WhiteStripe normalization

    • We perform the WhiteStripe normalization using the whitestripe function directly from the package WhiteStripe.
  • Step 2: Obtain tumor voxel values from various groups as defined by different combinations of MRI sequences (T1, T1Gd, T2, FLAIR) and tumor sub-regions (necrosis, edema, and enhancing tumor regions)

  • Step 3: Compute the principal component scores using the Riemannian-geometric framework for each group separately

We demonstrate steps 2 and 3 with simple examples (using randomly generated data for five subjects). The same process can be used for real MRI scans.

Note: Here we directly work with three dimensional arrays. If the actual MRI scans and the tumor segmentation labels are stored as NIfTI files, then the package oro.nifti can be used to import/load the data.

Generate MRI data of array size 10x10x10 for five subjects by randomly sampling 10x10x10 values from a standard normal distribution. Similarly, generate the corresponding tumor segmentation labels for the five subjects. Here the segmentation labels contain four values, where 0 indicates non-tumor region and 1, 2 and 3 indicate tumor sub-regions.

Note: In the following we have not considered multiple MRI sequences. The groups are formed due to the tumor sub-regions. But the same procedure can be repeated for multiple MRI sequences.

n = 5 # number of subjects
for(i in 1:5){
  assign(paste0('mri',i), array(rnorm(10*10*10), dim = c(10,10,10)))
  assign(paste0('tumor',i), array(rep(sample(0:3, 4),c(125,125,125,1000-(3*125))), dim = c(10,10,10)))
}

Step 2

Obtain tumor voxel values from three groups as defined by tumor sub-regions (indexed by 1, 2 and 3).

# sub-region 1
tum.1 = lapply(1:n,
               function(i){
                 mri = eval(parse(text=paste0('mri',i)))
                 tumor = eval(parse(text=paste0('tumor',i)))
                 x = c(mri[tumor==1])
               })

# sub-region 2
tum.2 = lapply(1:n,
               function(i){
                 mri = eval(parse(text=paste0('mri',i)))
                 tumor = eval(parse(text=paste0('tumor',i)))
                 x = c(mri[tumor==2])
               })

# sub-region 3
tum.3 = lapply(1:n,
               function(i){
                 mri = eval(parse(text=paste0('mri',i)))
                 tumor = eval(parse(text=paste0('tumor',i)))
                 x = c(mri[tumor==3])
               })

Step 3

Compute the principal component scores for each group separately using the geomPCA function.

pc.scores.1 = geomPCA(tum.1)
pc.scores.2 = geomPCA(tum.2)
pc.scores.3 = geomPCA(tum.3)

These principal component scores can be augmented together (by column) and used to fit the model (via the groupSS function) proposed by in the RADIOHEAD framework. The group indicators (i.e. a label for each column in the augmented matrix) need to be specified appropriately.

pc.scores = cbind(pc.scores.1, pc.scores.2, pc.scores.3)
pc.groups = rep(c(1,2,3), c(length(pc.scores.1),
                            length(pc.scores.2), length(pc.scores.3)))