We will use a dataset from a previous EBI training course. The data come from mRNA sequencing of zebrafish embryos at two different developmental stages. Sequencing was performed on the Illumina platform using poly-(A)+ selected RNA and generated 76 bp paired-end reads.
Download the following files from ftp://ftp.ebi.ac.uk/pub/training/Train_online/RNA-seq_exercise/
- 2cells_1.fastq
- 2cells_2.fastq
- 6h_post_fertilisation_R1.fastq
- 6h_post_fertilisation_R2.fastq
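
The four files above can also be fetched with a short script. This is only a minimal sketch: the base URL and file names are copied from the list above, while the helper names `fastq_urls` and `download_all` and the `data` output directory are hypothetical choices made for illustration.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names copied from the list above.
BASE_URL = "ftp://ftp.ebi.ac.uk/pub/training/Train_online/RNA-seq_exercise/"
FASTQ_FILES = [
    "2cells_1.fastq",
    "2cells_2.fastq",
    "6h_post_fertilisation_R1.fastq",
    "6h_post_fertilisation_R2.fastq",
]


def fastq_urls(base_url=BASE_URL, names=FASTQ_FILES):
    """Return the full URL of every FASTQ file (hypothetical helper)."""
    return [base_url + name for name in names]


def download_all(dest_dir="data"):
    """Download each file into dest_dir (an assumed directory name).

    Files that already exist are skipped, so delete any partial
    download before running this again.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in fastq_urls():
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():
            urlretrieve(url, str(target))
```

Calling `download_all()` starts the transfer. Each sample comes as two files because the reads are paired-end, so keep both files of a pair together.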
We will use RNA-seq data from the FlyAtlas2 database, which collects hundreds of RNA-seq datasets for Drosophila melanogaster. You can search it by gene, category, or tissue. Here we downloaded four samples (female_head x 2, female_midgut x 2).
We usually download the reference data from Ensembl. Search for "drosophila", choose the DNA, cDNA, and GTF files, then download them with wget.
- Drosophila genome: ftp://ftp.ensembl.org/pub/release-97/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.22.dna.toplevel.fa.gz
- Drosophila transcriptome: ftp://ftp.ensembl.org/pub/release-97/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.22.cdna.all.fa.gz
- Drosophila GTF: ftp://ftp.ensembl.org/pub/release-97/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.22.97.chr.gtf.gz
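
The three reference URLs share a pattern built from the species name, the assembly name, and the release number. The sketch below reconstructs them from those three values and prints one wget command per file. The pattern is inferred only from the URLs listed above, so it may not hold for other species or releases, and the function name `ensembl_reference_urls` is hypothetical.

```python
# FTP root taken from the URLs listed above.
ENSEMBL_FTP = "ftp://ftp.ensembl.org/pub"


def ensembl_reference_urls(species="drosophila_melanogaster",
                           assembly="BDGP6.22",
                           release=97):
    """Build genome, transcriptome, and GTF URLs (hypothetical helper).

    Assumption: the file naming follows the three URLs in this
    document; check the FTP listing before relying on it elsewhere.
    """
    # File names start with the capitalised species name, e.g.
    # "Drosophila_melanogaster.BDGP6.22".
    prefix = f"{species.capitalize()}.{assembly}"
    root = f"{ENSEMBL_FTP}/release-{release}"
    return {
        "genome": f"{root}/fasta/{species}/dna/{prefix}.dna.toplevel.fa.gz",
        "transcriptome": f"{root}/fasta/{species}/cdna/{prefix}.cdna.all.fa.gz",
        "gtf": f"{root}/gtf/{species}/{prefix}.{release}.chr.gtf.gz",
    }


# Print the equivalent wget command for each reference file.
for name, url in ensembl_reference_urls().items():
    print(f"# {name}")
    print(f"wget {url}")
```

Keeping the release number in a single place helps ensure that the genome, transcriptome, and GTF all come from the same Ensembl release, which matters because annotation coordinates are tied to the assembly.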
Pre-computed index files: download here
[1] Weill Cornell Medical College: http://chagall.med.cornell.edu/RNASEQcourse/
- Bioconda starting from 3:30.
- FastQC manual
- fastp manual