HW7
Which one of the three aligners/mappers (STAR, hisat2, and RapMap) that we used for RNA-Seq mapping can only align reads to a transcriptome? (1 point)
Which one tends to generate a higher number of multiple alignments than the other two? (See the flagstat results.) (2 points)
Get the DESeq2 package installed and loaded on your computer. Run the following commands in R and submit the output as your answer. (1 point)
search()
sessionInfo()
In our lab, we tested the effect of two factors (phenotype and stress) on expression levels. Each factor has 4 different levels (see the experimental information: https://github.com/mestato/epp622/blob/master/RNA_labs_data/experimental_info.csv). Below is the result from a Wald test.
log2 fold change (MAP): stress saline vs ABA
Wald test p-value: stress saline vs ABA
DataFrame with 19453 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
AT1G01010 0.9246572 0.04106919 1.1664635 0.03520829 0.97191365 NA
AT1G01020 0.7345512 -0.50228525 1.1997038 -0.41867437 0.67545413 NA
AT1G01030 0.4918562 0.53399011 1.2159565 0.43915231 0.66055118 NA
AT1G01040 4.1881473 -0.88294298 0.7686287 -1.14872497 0.25066941 0.5074783
AT1G01050 12.9203449 0.95553494 0.4842900 1.97306342 0.04848834 0.1769671
... ... ... ... ... ... ...
ATMG01350 0.1237857 -0.6264714 0.9607553 -0.6520613 0.5143615962 NA
ATMG01360 0.4759696 1.1442281 1.2182994 0.9392011 0.3476275109 NA
ATMG01370 0.1996883 -0.4199930 1.0363781 -0.4052508 0.6852931724 NA
ATMG01380 0.1976165 0.3782464 1.0249796 0.3690282 0.7121066974 NA
ATMG01390 235.3293034 0.8171322 0.2399232 3.4058071 0.0006596878 0.00617547
Which two levels of which factor are we comparing? (1 point)
Which level is considered as the control (untreated) group in the test? (1 point)
For the expression level of gene ATMG01390, which statement below is true? (2 points)
- A. gene ATMG01390 has higher expression under saline stress than under ABA stress
- B. gene ATMG01390 has higher expression under ABA stress than under saline stress
- C. There is no significant difference in expression level between these two types of stress.
If you want to test the interaction effect between phenotype and stress with the Likelihood Ratio Test, what is your full model and what is your reduced model? (2 points)
Replace FULL MODEL and REDUCED MODEL in the script below with your answers.
dds = DESeqDataSetFromMatrix(countData = countData,
                             colData = colData,
                             design = FULL MODEL)
dds = DESeq(dds, test="LRT", reduced = REDUCED MODEL)
Write a command line to get the 10 most highly expressed genes from the count file DRR016125_STAR_ct (you may need the commands awk, sort, and tail).
Run this command line to get the count data:
wget https://github.com/mestato/epp622/raw/master/RNA_labs_data/DRR016125_STAR_ct
Try the command line below. The output might give you more hints.
awk '{print $1"\t"$2"\t"$1}' DRR016125_STAR_ct | head
Your output should look like this:
26 AT2G11240
34 AT2G34410
45 AT5G64572
68 AT3G16130
71 AT1G51402
73 AT5G01595
101 AT5G19150
102 AT3G22121
174 AT1G72600
195 AT5G02370
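
As a warm-up, the general pattern of combining awk, sort, and tail can be sketched on a tiny made-up table. This is only an illustration under stated assumptions: the file name toy_counts.tsv, the gene IDs, and the counts are all hypothetical, and the real DRR016125_STAR_ct file may have a different layout (for example, extra columns or rows), so inspect it with head before adapting anything.

```shell
# Build a small, made-up two-column table: gene ID, tab, read count.
# These IDs and counts are hypothetical and do not come from DRR016125_STAR_ct.
printf 'geneA\t5\ngeneB\t120\ngeneC\t42\ngeneD\t7\n' > toy_counts.tsv

# Print the count first, sort numerically in ascending order,
# then keep the last 2 rows, i.e. the two largest counts.
top2=$(awk '{print $2"\t"$1}' toy_counts.tsv | sort -n | tail -n 2)
printf '%s\n' "$top2"
# prints:
# 42	geneC
# 120	geneB
```

Putting the count in the first column lets sort -n order the rows by that number, and tail then picks the rows at the bottom of the ascending list.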