## STEP 1. Merge/Append [cat] technical replicates
# Oryza sativa ssp. japonica var. Nipponbare
160721_I114_FCHC2FYBBXX_L2_CHKPE85216070025_1_NmNAI_1.fq.gz
160918_I211_FCHCGLCBBXX_L6_CHKPE85216070025_1_NmNAI_1.fq.gz (Forward strand technical replicates) -NAI Bio1
160721_I114_FCHC2FYBBXX_L2_CHKPE85216070025_2_NmNAI_1.fq.gz
160918_I211_FCHCGLCBBXX_L6_CHKPE85216070025_2_NmNAI_1.fq.gz (Reverse strand technical replicates) -NAI Bio1
------------
160721_I114_FCHC2FYBBXX_L2_CHKPE85216070027_1_NpNAI_1.fq.gz
160918_I211_FCHCGLCBBXX_L6_CHKPE85216070027_1_NpNAI_1.fq.gz (Forward strand technical replicates) +NAI Bio1
160721_I114_FCHC2FYBBXX_L2_CHKPE85216070027_2_NpNAI_1.fq.gz
160918_I211_FCHCGLCBBXX_L6_CHKPE85216070027_2_NpNAI_1.fq.gz (Reverse strand technical replicates) +NAI Bio1
------------
160721_I114_FCHC2FYBBXX_L2_CHKPE85216070026_1_NmNAI_2.fq.gz
160918_I211_FCHCGLCBBXX_L6_CHKPE85216070026_1_NmNAI_2.fq.gz (Forward strand technical replicates) -NAI Bio2
160721_I114_FCHC2FYBBXX_L2_CHKPE85216070026_2_NmNAI_2.fq.gz
160918_I211_FCHCGLCBBXX_L6_CHKPE85216070026_2_NmNAI_2.fq.gz (Reverse strand technical replicates) -NAI Bio2
------------
160721_I114_FCHC2FYBBXX_L2_CHKPE85216070028_1_NpNAI_2.fq.gz
160918_I211_FCHCGLCBBXX_L6_CHKPE85216070028_1_NpNAI_2.fq.gz (Forward strand technical replicates) +NAI Bio2
160721_I114_FCHC2FYBBXX_L2_CHKPE85216070028_2_NpNAI_2.fq.gz
160918_I211_FCHCGLCBBXX_L6_CHKPE85216070028_2_NpNAI_2.fq.gz (Reverse strand technical replicates) +NAI Bio2
[cat 160721_I114_FCHC2FYBBXX_L2_CHKPE85216070025_1_NmNAI_1.fq.gz 160918_I211_FCHCGLCBBXX_L6_CHKPE85216070025_1_NmNAI_1.fq.gz > Merged_160721_160918_1_NmNAI_1.fq.gz]
[cat 160721_I114_FCHC2FYBBXX_L2_CHKPE85216070025_2_NmNAI_1.fq.gz 160918_I211_FCHCGLCBBXX_L6_CHKPE85216070025_2_NmNAI_1.fq.gz > Merged_160721_160918_2_NmNAI_1.fq.gz]
[cat 160721_I114_FCHC2FYBBXX_L2_CHKPE85216070027_1_NpNAI_1.fq.gz 160918_I211_FCHCGLCBBXX_L6_CHKPE85216070027_1_NpNAI_1.fq.gz > Merged_160721_160918_1_NpNAI_1.fq.gz]
[cat 160721_I114_FCHC2FYBBXX_L2_CHKPE85216070027_2_NpNAI_1.fq.gz 160918_I211_FCHCGLCBBXX_L6_CHKPE85216070027_2_NpNAI_1.fq.gz > Merged_160721_160918_2_NpNAI_1.fq.gz]
[cat 160721_I114_FCHC2FYBBXX_L2_CHKPE85216070026_1_NmNAI_2.fq.gz 160918_I211_FCHCGLCBBXX_L6_CHKPE85216070026_1_NmNAI_2.fq.gz > Merged_160721_160918_1_NmNAI_2.fq.gz]
[cat 160721_I114_FCHC2FYBBXX_L2_CHKPE85216070026_2_NmNAI_2.fq.gz 160918_I211_FCHCGLCBBXX_L6_CHKPE85216070026_2_NmNAI_2.fq.gz > Merged_160721_160918_2_NmNAI_2.fq.gz]
[cat 160721_I114_FCHC2FYBBXX_L2_CHKPE85216070028_1_NpNAI_2.fq.gz 160918_I211_FCHCGLCBBXX_L6_CHKPE85216070028_1_NpNAI_2.fq.gz > Merged_160721_160918_1_NpNAI_2.fq.gz]
[cat 160721_I114_FCHC2FYBBXX_L2_CHKPE85216070028_2_NpNAI_2.fq.gz 160918_I211_FCHCGLCBBXX_L6_CHKPE85216070028_2_NpNAI_2.fq.gz > Merged_160721_160918_2_NpNAI_2.fq.gz]
Merged_160721_160918_1_NmNAI_1.fq.gz (Merged & compressed forward strand replicates; Nipponbare -NAI Bio1)
Merged_160721_160918_2_NmNAI_1.fq.gz (Merged & compressed reverse strand replicates; Nipponbare -NAI Bio1)
------------
Merged_160721_160918_1_NpNAI_1.fq.gz (Merged & compressed forward strand replicates; Nipponbare +NAI Bio1)
Merged_160721_160918_2_NpNAI_1.fq.gz (Merged & compressed reverse strand replicates; Nipponbare +NAI Bio1)
------------
Merged_160721_160918_1_NmNAI_2.fq.gz (Merged & compressed forward strand replicates; Nipponbare -NAI Bio2)
Merged_160721_160918_2_NmNAI_2.fq.gz (Merged & compressed reverse strand replicates; Nipponbare -NAI Bio2)
------------
Merged_160721_160918_1_NpNAI_2.fq.gz (Merged & compressed forward strand replicates; Nipponbare +NAI Bio2)
Merged_160721_160918_2_NpNAI_2.fq.gz (Merged & compressed reverse strand replicates; Nipponbare +NAI Bio2)
------------
# Oryza sativa ssp. indica var. 9311
161003_I211_FCHCH53BBXX_L1_CHKPE85216090001_1.fq.gz
FCHKMVJBBXX_L2_CHKPE85216090001_1.fq.gz (Forward strand technical replicates) -NAI Bio1
161003_I211_FCHCH53BBXX_L1_CHKPE85216090001_2.fq.gz
FCHKMVJBBXX_L2_CHKPE85216090001_2.fq.gz (Reverse strand technical replicates) -NAI Bio1
-----------------
161003_I211_FCHCH53BBXX_L1_CHKPE85216090003_1.fq.gz
FCHKMVJBBXX_L2_CHKPE85216090003_1.fq.gz (Forward strand technical replicates) +NAI Bio1
161003_I211_FCHCH53BBXX_L1_CHKPE85216090003_2.fq.gz
FCHKMVJBBXX_L2_CHKPE85216090003_2.fq.gz (Reverse strand technical replicates) +NAI Bio1
-----------------
161003_I211_FCHCH53BBXX_L1_CHKPE85216090002_1.fq.gz
FCHKMVJBBXX_L3_CHKPE85216090002_1.fq.gz (Forward strand technical replicates) -NAI Bio2
161003_I211_FCHCH53BBXX_L1_CHKPE85216090002_2.fq.gz
FCHKMVJBBXX_L3_CHKPE85216090002_2.fq.gz (Reverse strand technical replicates) -NAI Bio2
-----------------
161003_I211_FCHCH53BBXX_L1_CHKPE85216090004_1.fq.gz
FCHKMVJBBXX_L3_CHKPE85216090004_1.fq.gz (Forward strand technical replicates) +NAI Bio2
161003_I211_FCHCH53BBXX_L1_CHKPE85216090004_2.fq.gz
FCHKMVJBBXX_L3_CHKPE85216090004_2.fq.gz (Reverse strand technical replicates) +NAI Bio2
-----------------
[cat 161003_I211_FCHCH53BBXX_L1_CHKPE85216090001_1.fq.gz FCHKMVJBBXX_L2_CHKPE85216090001_1.fq.gz > Merged_forwardread_techreps_Bio1.fq.gz]
[cat 161003_I211_FCHCH53BBXX_L1_CHKPE85216090001_2.fq.gz FCHKMVJBBXX_L2_CHKPE85216090001_2.fq.gz > Merged_reverseread_techreps_Bio1.fq.gz]
[cat 161003_I211_FCHCH53BBXX_L1_CHKPE85216090003_1.fq.gz FCHKMVJBBXX_L2_CHKPE85216090003_1.fq.gz > Merged_forwardread_plus_Bio1.fq.gz]
[cat 161003_I211_FCHCH53BBXX_L1_CHKPE85216090003_2.fq.gz FCHKMVJBBXX_L2_CHKPE85216090003_2.fq.gz > Merged_reverseread_plus_Bio1.fq.gz]
[cat 161003_I211_FCHCH53BBXX_L1_CHKPE85216090002_1.fq.gz FCHKMVJBBXX_L3_CHKPE85216090002_1.fq.gz > Merged_forwardread_techreps_Bio2.fq.gz]
[cat 161003_I211_FCHCH53BBXX_L1_CHKPE85216090002_2.fq.gz FCHKMVJBBXX_L3_CHKPE85216090002_2.fq.gz > Merged_reverseread_techreps_Bio2.fq.gz]
[cat 161003_I211_FCHCH53BBXX_L1_CHKPE85216090004_1.fq.gz FCHKMVJBBXX_L3_CHKPE85216090004_1.fq.gz > Merged_forwardread_plus_Bio2.fq.gz]
[cat 161003_I211_FCHCH53BBXX_L1_CHKPE85216090004_2.fq.gz FCHKMVJBBXX_L3_CHKPE85216090004_2.fq.gz > Merged_reverseread_plus_Bio2.fq.gz]
Merged_forwardread_techreps_Bio1.fq.gz (Merged & compressed forward strand replicates; 9311 -NAI Bio1)
Merged_reverseread_techreps_Bio1.fq.gz (Merged & compressed reverse strand replicates; 9311 -NAI Bio1)
------------
Merged_forwardread_plus_Bio1.fq.gz (Merged & compressed forward strand replicates; 9311 +NAI Bio1)
Merged_reverseread_plus_Bio1.fq.gz (Merged & compressed reverse strand replicates; 9311 +NAI Bio1)
------------
Merged_forwardread_techreps_Bio2.fq.gz (Merged & compressed forward strand replicates; 9311 -NAI Bio2)
Merged_reverseread_techreps_Bio2.fq.gz (Merged & compressed reverse strand replicates; 9311 -NAI Bio2)
------------
Merged_forwardread_plus_Bio2.fq.gz (Merged & compressed forward strand replicates; 9311 +NAI Bio2)
Merged_reverseread_plus_Bio2.fq.gz (Merged & compressed reverse strand replicates; 9311 +NAI Bio2)
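The cat commands above work because a gzip file may hold several compressed members back to back, so concatenating two .fq.gz files gives a valid .fq.gz containing the reads of both. A minimal Python sketch of the same operation follows; the function name merge_gzip_files is illustrative only and is not part of this pipeline.

```python
import shutil


def merge_gzip_files(inputs, output):
    """Byte-wise concatenation of gzip files, like `cat a.gz b.gz > out.gz`.

    Multi-member gzip streams are valid, so nothing is decompressed here.
    """
    with open(output, "wb") as out_handle:
        for path in inputs:
            with open(path, "rb") as in_handle:
                shutil.copyfileobj(in_handle, out_handle)
```

The forward and reverse files of a library must be merged in the same input order, otherwise read pairs no longer line up between the two merged files.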
------------
## STEP 2. Adapter trimming using Trimmomatic
# Nipponbare
Run script
[sbatch trim_nipponbare.sh]
NmNAI_B1_R1.fastq.gz (NAI minus_BiologicalRep. 1_Forward read)
NmNAI_B1_R2.fastq.gz (NAI minus_BiologicalRep. 1_Reverse read)
NmNAI_B2_R1.fastq.gz (NAI minus_BiologicalRep. 2_Forward read)
NmNAI_B2_R2.fastq.gz (NAI minus_BiologicalRep. 2_Reverse read)
NpNAI_B1_R1.fastq.gz (NAI plus_BiologicalRep. 1_Forward read)
NpNAI_B1_R2.fastq.gz (NAI plus_BiologicalRep. 1_Reverse read)
NpNAI_B2_R1.fastq.gz (NAI plus_BiologicalRep. 2_Forward read)
NpNAI_B2_R2.fastq.gz (NAI plus_BiologicalRep. 2_Reverse read)
# 9311
Run script
[sbatch trim_nipponbare.sh]
9mNAI_B1_forwardread_trimmed.fastq.gz (NAI minus_BiologicalRep. 1_Forward read)
9mNAI_B1_reverseread_trimmed.fastq.gz (NAI minus_BiologicalRep. 1_Reverse read)
9mNAI_B2_forwardread_trimmed.fastq.gz (NAI minus_BiologicalRep. 2_Forward read)
9mNAI_B2_reverseread_trimmed.fastq.gz (NAI minus_BiologicalRep. 2_Reverse read)
9pNAI_B1_forwardread_trimmed.fastq.gz (NAI plus_BiologicalRep. 1_Forward read)
9pNAI_B1_reverseread_trimmed.fastq.gz (NAI plus_BiologicalRep. 1_Reverse read)
9pNAI_B2_forwardread_trimmed.fastq.gz (NAI plus_BiologicalRep. 2_Forward read)
9pNAI_B2_reverseread_trimmed.fastq.gz (NAI plus_BiologicalRep. 2_Reverse read)
The output files *_out_fw_paired.fq, *_out_fw_unpaired.fq, *_out_rev_paired.fq and *_out_rev_unpaired.fq are deleted.
The *.fastq.gz output files listed above are compressed and kept for QC and mapping (Steps 3 and 4).
------------
## STEP 3. QC using FastQC
Run script
[sbatch qc_submit_jobs.sh]
------------
## STEP 4. Mapping minus libraries (-NAI) of 9311 to Nipponbare reference genome
9311 has no reference genome (or transcriptome) to map its reads to.
A 9311 reference genome is therefore generated by calling SNPs against the Nipponbare genome and substituting them into it.
As the first step, the minus 9311 libraries (9mNAI_B1 and 9mNAI_B2) are mapped to the Nipponbare genome.
Run script
[sbatch Genomic_mapping_9311_HISAT2.sh]
------------
## STEP 5. SNP calling
Run script
[sbatch GATK_SNPcalling_pipeline.sh]
------------
## STEP 6. Generate Reference Genome & Transcriptome for 9311
(i) Generate reference Genome
Use GATK FastaAlternateReferenceMaker
Run script
[sbatch SLURM_AlternateReferenceMaker.sh]
(ii) Quality check the newly generated 9311 genome by comparing it with Nipponbare reference genome
Check if SNPs are incorporated in their correct positions in the new 9311 genome.
Run script
[perl Genome_slice_QC.pl]
or batch submit separately for each chromosome vcf
[sbatch 9311_AltRefGenome_QC_batchsubmit.sh]
(iii) Generate reference Transcriptome
Run script
[python generate-TranscriptReference-usingAssemblySeq.py]
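The check in (ii) can be pictured as follows: at every SNP position the Nipponbare sequence should carry the reference allele and the new 9311 sequence the alternate allele, while all other positions stay identical. The sketch below is an illustration under stated assumptions, not the logic of Genome_slice_QC.pl: it assumes single-nucleotide substitutions only (no indels, so both sequences have the same length) and 1-based positions, and the function name is hypothetical.

```python
def check_snps_incorporated(ref_seq, new_seq, snps):
    """Return a list of problems found; an empty list means the check passed.

    snps: iterable of (position, ref_allele, alt_allele) with 1-based positions.
    ASSUMPTION: substitutions only, so ref_seq and new_seq have equal length.
    """
    if len(ref_seq) != len(new_seq):
        return ["sequence lengths differ"]
    expected = {pos: (ref.upper(), alt.upper()) for pos, ref, alt in snps}
    problems = [f"{pos}: SNP lies outside the sequence"
                for pos in sorted(expected) if pos < 1 or pos > len(ref_seq)]
    pairs = zip(ref_seq.upper(), new_seq.upper())
    for pos, (ref_base, new_base) in enumerate(pairs, start=1):
        if pos in expected:
            ref_allele, alt_allele = expected[pos]
            if ref_base != ref_allele:
                problems.append(f"{pos}: reference has {ref_base}, SNP list says {ref_allele}")
            if new_base != alt_allele:
                problems.append(f"{pos}: new genome has {new_base}, expected {alt_allele}")
        elif ref_base != new_base:
            problems.append(f"{pos}: unexpected change {ref_base}>{new_base}")
    return problems
```

For example, check_snps_incorporated("ACGTA", "ACTTA", [(3, "G", "T")]) returns an empty list because the only difference is the listed G>T substitution at position 3.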
------------
## STEP 7. Map 9311 reads (+NAI and -NAI) to new 9311 Transcriptome
Use Bowtie2
(i) Create index for transcriptomic mapping with 9311.transcriptome.fasta
Run script
[sbatch index_builder_9311_ref_transcriptome.sh]
(ii) Map (-)NAI and (+)NAI libraries to the new 9311 reference
Run script
[sbatch Transcriptome_mapping_9311_Bowtie2.sh]
------------
## STEP 8. Computing stop counts and coverage
(i) Run script
[sbatch Batch_compute_stop_counts_multimapping.sh]
The output is a .cnt file containing (i) transcript ID, (ii) stop counts and (iii) coverage at each nucleotide position.
(ii) Summarise the stop counts and coverage and normalise by transcript length,
using the .cnt file obtained in (i)
Run script
[perl stopcounts_coverage_fraction.pl input.cnt > output.cvg]
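The summary in (ii) amounts to dividing the total stop counts and total coverage of each transcript by its length. The Python sketch below illustrates that arithmetic only; it is not a port of stopcounts_coverage_fraction.pl. It ASSUMES a .cnt layout of three non-empty lines per transcript (ID, tab-separated stop counts, tab-separated coverage), which this README does not confirm, and the function name is hypothetical.

```python
def summarise_cnt(lines):
    """Yield (transcript_id, length, mean_stops, mean_coverage) per record.

    ASSUMED layout (not confirmed by the README): three non-empty lines per
    transcript: ID, tab-separated stop counts, tab-separated coverage.
    """
    rows = [line.strip() for line in lines if line.strip()]
    for start in range(0, len(rows) - 2, 3):
        transcript_id = rows[start]
        stops = [float(x) for x in rows[start + 1].split("\t")]
        coverage = [float(x) for x in rows[start + 2].split("\t")]
        if len(stops) != len(coverage):
            raise ValueError(f"{transcript_id}: stop and coverage lengths differ")
        length = len(stops)
        # Normalise the per-transcript totals by transcript length.
        yield transcript_id, length, sum(stops) / length, sum(coverage) / length
```

The function accepts any iterable of lines, so an open file handle can be passed directly.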
------------
## STEP 9. Calculate SHAPE reactivity
(i) Merge (sum) the counts from two biological replicates.
activate rnaenv
rna_structure counts -l 9mNAI_B1.cnt 9mNAI_B2.cnt 9mNAI_merged.cnt
rna_structure counts -l 9pNAI_B1.cnt 9pNAI_B2.cnt 9pNAI_merged.cnt
(ii) Calculate the box plot normalized reactivity
rna_structure_cli shape-reactivity -v --calc=log-noalpha --norm=boxplot-noq3 --norm-exclude-zeroes --out-filter=pos --cut-nts=-40 9311_combinedREF.fasta 9mNAI_merged.cnt 9pNAI_merged.cnt ./Boxplot_react/ trans > transcriptome_bpnorm.log 2>&1
(iii) Rescale the reactivities of the files that passed by 95% Winsorization
Run script
[perl Rescale_reactivity_Winsorize.pl Raw_reactivity_pass.list Rescaled_react_pass_directory/]
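Two of the operations in this step are simple enough to sketch: summing replicate counts position by position (i), and Winsorizing reactivities (iii). The Python below is a hedged illustration, not the implementation used by rna_structure or Rescale_reactivity_Winsorize.pl. It ASSUMES that 95% Winsorization means capping values at the 95th percentile and dividing by that cap, and that the input reactivities are non-negative; all function names are hypothetical.

```python
def sum_replicates(counts_a, counts_b):
    """Position-wise sum of two replicate count vectors of equal length."""
    if len(counts_a) != len(counts_b):
        raise ValueError("replicates must cover the same positions")
    return [a + b for a, b in zip(counts_a, counts_b)]


def percentile(values, fraction):
    """Percentile by linear interpolation between the closest ranks."""
    ordered = sorted(values)
    rank = fraction * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def winsorize_rescale(reactivities, fraction=0.95):
    """Cap values at the given percentile, then divide by the cap.

    ASSUMPTION: inputs are non-negative, so the result lies between 0 and 1.
    """
    if not reactivities:
        return []
    cap = percentile(reactivities, fraction)
    if cap <= 0:
        return [0.0 for _ in reactivities]
    return [min(value, cap) / cap for value in reactivities]
```

Capping before rescaling keeps a few extreme reactivities from compressing the rest of the profile towards zero.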
------------
## STEP 10. Fold RNA secondary structures
(i) Fold Nipponbare and 9311 transcripts
SHAPE-constrained folding (in vivo) and unconstrained folding (in silico)
Provide an input list of file names without extensions to fold (the fasta and shape files must have the same names)
Run script
[sbatch Fold_RNA.sh file.list.inp]
(ii) Annotate basepairing partners
Identify pairing partners, which are used later to calculate base-pairing distances and probabilities
Provide list of name_ss.ps files from RNAfold as input
Run script
[sbatch Relplot.sh RNAfold_output_files_ss.ps.list]
(iii) Extract basepairing probabilities
Provide input files, which are obtained from running Relplot.
Run script
[perl Calc_BasePairProbability.pl path_to_fold_rss.ps_files.list]
------------
## STEP 11. Compare RNA secondary structures
(i) PPV
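PPV (positive predictive value) is the fraction of base pairs in one structure that are also present in a reference structure. The sketch below computes it, together with sensitivity, from two dot-bracket strings such as those written by RNAfold. Which structure serves as the reference in this pipeline is not stated here, so the argument names are assumptions, and pseudoknot brackets are not handled.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs (1-based, i < j) in a dot-bracket string."""
    stack, pairs = [], set()
    for position, symbol in enumerate(dot_bracket, start=1):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), position))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def ppv_and_sensitivity(predicted, reference):
    """PPV = shared pairs / predicted pairs; sensitivity = shared pairs / reference pairs."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    shared = len(pred & ref)
    ppv = shared / len(pred) if pred else 0.0
    sensitivity = shared / len(ref) if ref else 0.0
    return ppv, sensitivity
```

For example, ppv_and_sensitivity("((..))", "(....)") returns (0.5, 1.0): one of the two predicted pairs is in the reference, and the single reference pair is recovered.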
------------
## STEP 12. Calculate nucleotide conservation score
The multiallelic data for 32 million SNP positions in 3,000 (3K) rice accessions were downloaded from
https://s3.amazonaws.com/3kricegenome/reduced/3k_RG_32mio_All_multiallelic_biallelic_SNP_dataset.zip
or from under the header ("3K RG 32mio SNPs, called vs Nipponbare MSU7/IRGSP1.0 genome, tabular format") at http://snp-seek.irri.org/_download.zul
(i) Prepare mapping of rice IRIS_UNIQUE_IDS to Variety Group (e.g. Indica, Japonica)
Download rice Variety Group information of 3K rice accessions from
"The 3,000 rice genomes project"
Additional file 1: Table S1A:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4035669/#S1
Extract columns C ("DNA_UNIQUE_ID") and O ("Variety Group (Tree)1") in sheet "Table-1A_IRRI" and
columns C ("DNA_UNIQUE_ID") and N ("Variety Group (Tree)2") in sheet "Table-1B_CAAS", then
concatenate the columns into a file.
Run script
[Map_3KRice_accession_variety_name.R]
(ii) Compute conservation scores of nucleotides
Run script
[sbatch SLURM_get_SNPconservation_3Kgenomes.sh]
(iii) Map nucleotide conservation scores from genomic coordinates to transcript coordinates
Run script
[sbatch SLURM_map_nuclconservationscores_genomic_to_transcriptomic_coords.sh]
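The coordinate conversion in (iii) can be sketched as walking the exons of a transcript in transcription order and counting the exonic bases that precede the genomic position. The Python below is a minimal illustration under stated assumptions (1-based inclusive exon intervals, strand given as "+" or "-"); the function name is hypothetical and this is not the code run by the SLURM script.

```python
def genomic_to_transcript(position, exons, strand):
    """Map a 1-based genomic position to a 1-based transcript position.

    exons: list of (start, end) genomic intervals, 1-based inclusive.
    Returns None when the position lies outside every exon (e.g. in an intron).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Minus-strand transcripts start at the exon with the highest coordinates.
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0
    for start, end in ordered:
        if start <= position <= end:
            within = position - start if strand == "+" else end - position
            return offset + within + 1
        offset += end - start + 1
    return None
```

With exons [(101, 110), (201, 205)], genomic position 201 maps to transcript position 11 on the plus strand and to position 5 on the minus strand.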