Ming Chen edited this page Oct 12, 2016 · 18 revisions

Problem 1

Which of the three aligners/mappers (STAR, hisat2, and RapMap) that we used for RNA-Seq mapping can only align reads to a transcriptome? (1 point)

Which one tends to generate a higher number of multiple alignments than the other two? (See the flagstat results.) (2 points)

Problem 2

Install the DESeq2 package and load it in R on your computer. Run the following commands in R and submit the output as your answer. (1 point)

search()
sessionInfo()

In our lab, we tested the effect of two factors (phenotype and stress) on expression levels. Each factor has 4 different levels (see the experimental information: https://github.com/mestato/epp622/blob/master/RNA_labs_data/experimental_info.csv). Below is the result from a Wald test.

log2 fold change (MAP): stress saline vs ABA 
Wald test p-value: stress saline vs ABA 
DataFrame with 19453 rows and 6 columns
             baseMean log2FoldChange     lfcSE        stat       pvalue       padj
            <numeric>      <numeric> <numeric>   <numeric>    <numeric>  <numeric>
AT1G01010   0.9246572     0.04106919 1.1664635  0.03520829   0.97191365         NA
AT1G01020   0.7345512    -0.50228525 1.1997038 -0.41867437   0.67545413         NA
AT1G01030   0.4918562     0.53399011 1.2159565  0.43915231   0.66055118         NA
AT1G01040   4.1881473    -0.88294298 0.7686287 -1.14872497   0.25066941  0.5074783
AT1G01050  12.9203449     0.95553494 0.4842900  1.97306342   0.04848834  0.1769671
...               ...            ...       ...         ...          ...        ...
ATMG01350   0.1237857     -0.6264714 0.9607553  -0.6520613 0.5143615962         NA
ATMG01360   0.4759696      1.1442281 1.2182994   0.9392011 0.3476275109         NA
ATMG01370   0.1996883     -0.4199930 1.0363781  -0.4052508 0.6852931724         NA
ATMG01380   0.1976165      0.3782464 1.0249796   0.3690282 0.7121066974         NA
ATMG01390 235.3293034      0.8171322 0.2399232   3.4058071 0.0006596878 0.00617547
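
As a reading aid for the table above: in this output, the stat column matches the log2FoldChange column divided by the lfcSE column. The sketch below checks that for two rows. The helper name `wald_stat` is hypothetical (it is not part of DESeq2), and the inputs are copied from the printed table, so the last digits can differ from the stat column because of display rounding.

```shell
# wald_stat is a hypothetical helper, not a DESeq2 function.
# It divides a log2 fold change ($1) by its standard error ($2).
wald_stat() {
  awk -v lfc="$1" -v se="$2" 'BEGIN { printf "%.4f\n", lfc / se }'
}

# Values copied from the AT1G01050 and ATMG01390 rows of the table.
wald_stat 0.95553494 0.4842900   # compare with stat = 1.97306342
wald_stat 0.8171322 0.2399232    # compare with stat = 3.4058071
```

The two printed values should agree with the stat column to about four decimal places.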

Problem 3

Which two levels of which factor are we comparing? (1 point)

Which level is treated as the control (untreated) group in the test? (1 point)

Problem 4

For the expression level of gene ATMG01390, which statement below is true? (2 points)

  • A. Gene ATMG01390 has higher expression under saline stress than under ABA stress.
  • B. Gene ATMG01390 has higher expression under ABA stress than under saline stress.
  • C. There is no significant difference in expression level between these two types of stress.

Problem 5

If you want to test the interaction effect between phenotype and stress with the Likelihood Ratio Test, what is your full model and what is your reduced model? (2 points)

Replace FULL MODEL and REDUCED MODEL in the script below with your answers.

dds = DESeqDataSetFromMatrix(countData = countData,
                             colData = colData,
                             design = FULL MODEL)
dds = DESeq(dds, test="LRT", reduced = REDUCED MODEL)

Extra Credit (5 points)

Write a command line that gets the 10 most highly expressed genes from the count file DRR016125_STAR_ct (you may need the commands awk, sort, and tail).
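
One detail that may help when working with counts: by default, sort compares lines as text, so numbers are ordered correctly only with the -n option. The sketch below shows the difference on made-up values. The function name `toy_counts` and its numbers are hypothetical and are not taken from DRR016125_STAR_ct.

```shell
# toy_counts prints three made-up counts, one per line
# (hypothetical data, not from the real count file).
toy_counts() {
  printf '9\n100\n25\n'
}

toy_counts | sort      # text order: 100, 25, 9
toy_counts | sort -n   # numeric order: 9, 25, 100
```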

Run this command line to get the count data:

  • wget https://github.com/mestato/epp622/raw/master/RNA_labs_data/DRR016125_STAR_ct

Try the command line below. The output might give you more hints.

awk '{print $1"\t"$2"\t"$1}' DRR016125_STAR_ct | head

The output of your final command line should look like this:

26	AT2G11240
34	AT2G34410
45	AT5G64572
68	AT3G16130
71	AT1G51402
73	AT5G01595
101	AT5G19150
102	AT3G22121
174	AT1G72600
195	AT5G02370