Benchmark
This document presents the running time and disk usage of each individual piPipes pipeline.
All runs were performed on the Massachusetts Green High Performance Computing Cluster using piPipes commit ab50e8a2fae33edefcb7749e95cbf54a600c1c50.
We randomly sampled N million reads (N = 1, 4, 7, ..., 25) from an unpublished HiSeq SE50 small RNA-seq library with 27,990,838 reads and ran the piPipes small RNA pipeline with 8 CPUs.
for i in `seq 1 3 26`; do
seqtk sample -s$((RANDOM%100)) $SMALLRNA_FQ ${i}000000 | \
gzip > ${i}M.fq.gz && \
date +"%m-%d-%k-%M" > ${i}.time && \
piPipes small \
-i ${i}M.fq.gz \
-g dm3 \
-o ${i}M.out && \
date +"%m-%d-%k-%M" >> ${i}.time && \
du -skh ${i}M.out > ${i}.size && \
rm -rf ${i}M.out ${i}M.fq.gz
done
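Each loop writes two date stamps (start and end) to ${i}.time in the format month-day-hour-minute. A minimal sketch of turning such a file into elapsed minutes is shown here. It assumes both stamps fall within the same month, and the helper name elapsed_minutes is hypothetical, not part of piPipes.

```shell
# elapsed_minutes FILE
# Print the number of minutes between the two stamps stored in FILE.
# Assumption: both lines use the "%m-%d-%k-%M" format and share the same month,
# so the month field ($1) is ignored.
elapsed_minutes() {
    awk -F- '
        # day, hour, minute -> minutes since the start of the month;
        # awk ignores the leading space that %k puts before one-digit hours
        { t[NR] = ($2 * 24 + $3) * 60 + $4 }
        END { print t[2] - t[1] }
    ' "$1"
}
```

For example, elapsed_minutes 4.time would report the wall-clock minutes taken by the 4M run. Because the stamps have one-minute resolution, very short runs are only measured approximately.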
We randomly sampled N million read pairs (N = 1 to 15) from an unpublished HiSeq PE100 RNA-seq library with 15,963,640 pairs and ran the piPipes RNA-seq pipeline with 8 CPUs.
for i in `seq 1 15`; do
SEED=$((RANDOM%100)) && \
seqtk sample -s $SEED $RNA_FQ1 ${i}000000 | \
gzip > ${i}M.r1.fq.gz && \
seqtk sample -s $SEED $RNA_FQ2 ${i}000000 | \
gzip > ${i}M.r2.fq.gz && \
date +"%m-%d-%k-%M" > ${i}.time && \
piPipes rna \
-l ${i}M.r1.fq.gz \
-r ${i}M.r2.fq.gz \
-g dm3 \
-o ${i}M.out && \
date +"%m-%d-%k-%M" >> ${i}.time && \
du -skh ${i}M.out > ${i}.size && \
rm -rf ${i}M.out ${i}M.r1.fq.gz ${i}M.r2.fq.gz
done
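The paired-end loops pass the same seed to both seqtk sample calls so that the two mate files stay in sync. A minimal sketch of a sanity check for this is shown here. It assumes four-line FASTQ records and read names that are either identical between mates or differ only by a trailing /1 or /2. The helper names read_names and check_pairs are hypothetical, not part of piPipes or seqtk.

```shell
# read_names FILE.fq.gz
# Print one read name per record, without any trailing /1 or /2 mate suffix.
read_names() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }'
}

# check_pairs R1.fq.gz R2.fq.gz
# Succeed only if both files list the same read names in the same order.
check_pairs() {
    [ "$(read_names "$1" | cksum)" = "$(read_names "$2" | cksum)" ]
}
```

It could be called as check_pairs ${i}M.r1.fq.gz ${i}M.r2.fq.gz before starting the timer, so that a mismatched pair is caught before the pipeline runs.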
We randomly sampled N million read pairs (N = 1 to 7) from an unpublished HiSeq PE100 Degradome-seq library with 15,963,640 pairs.
We then ran the piPipes Degradome-seq pipeline with 8 CPUs, together with a small RNA library that has 23,712,713 genome-mappable reads (the small RNA-seq data was not sampled).
for i in `seq 1 7`; do
SEED=$((RANDOM%100)) && \
seqtk sample -s$SEED $DEG_FQ1 ${i}000000 | \
gzip > ${i}M.r1.fq.gz && \
seqtk sample -s$SEED $DEG_FQ2 ${i}000000 | \
gzip > ${i}M.r2.fq.gz && \
date +"%m-%d-%k-%M" > ${i}.time && \
piPipes deg \
-l ${i}M.r1.fq.gz \
-r ${i}M.r2.fq.gz \
-g dm3 \
-o ${i}M.out \
-s $SMALL_RNA_OUTPUT && \
date +"%m-%d-%k-%M" >> ${i}.time && \
du -skh ${i}M.out > ${i}.size && \
rm -rf ${i}M.out ${i}M.r1.fq.gz ${i}M.r2.fq.gz
done
We randomly sampled N million read pairs (N = 2, 4, 6, 8, 10) from unpublished HiSeq PE100 ChIP-seq libraries with 17,980,776 and 18,011,302 pairs for INPUT and IP, respectively.
We then ran the piPipes ChIP-seq pipeline with 8 CPUs.
for i in `seq 2 2 10`; do
SEED=$((RANDOM%100)) && \
seqtk sample -s$SEED $CHIP_INPUT_1 ${i}000000 | \
gzip > ${i}M.input.r1.fq.gz && \
seqtk sample -s$SEED $CHIP_INPUT_2 ${i}000000 | \
gzip > ${i}M.input.r2.fq.gz && \
seqtk sample -s$SEED $CHIP_IP_1 ${i}000000 | \
gzip > ${i}M.IP.r1.fq.gz && \
seqtk sample -s$SEED $CHIP_IP_2 ${i}000000 | \
gzip > ${i}M.IP.r2.fq.gz && \
date +"%m-%d-%k-%M" > ${i}.time && \
piPipes chip \
-l ${i}M.IP.r1.fq.gz \
-r ${i}M.IP.r2.fq.gz \
-L ${i}M.input.r1.fq.gz \
-R ${i}M.input.r2.fq.gz \
-g dm3 \
-o ${i}M.out && \
date +"%m-%d-%k-%M" >> ${i}.time && \
du -skh ${i}M.out > ${i}.size && \
rm -rf ${i}M.input.r1.fq.gz ${i}M.input.r2.fq.gz ${i}M.IP.r1.fq.gz ${i}M.IP.r2.fq.gz ${i}M.out
done
We randomly sampled N million reads (N = 2, 4, 6, 8, 10) from a published (SRR333512) HiSeq PE100 Genome-seq library with 18,042,217 reads and ran the piPipes Genome-seq pipeline with 8 CPUs, without running mrfast and VariationHunter.
for i in `seq 2 2 10`; do
SEED=$((RANDOM%100)) && \
seqtk sample -s$SEED $GENOME_FQ1 ${i}000000 | \
gzip > ${i}M.r1.fq.gz && \
seqtk sample -s$SEED $GENOME_FQ2 ${i}000000 | \
gzip > ${i}M.r2.fq.gz && \
date +"%m-%d-%k-%M" > ${i}.time && \
piPipes dna \
-l ${i}M.r1.fq.gz \
-r ${i}M.r2.fq.gz \
-g dm3 \
-o ${i}M.out && \
date +"%m-%d-%k-%M" >> ${i}.time && \
du -skh ${i}M.out > ${i}.size && \
rm -rf ${i}M.out ${i}M.r1.fq.gz ${i}M.r2.fq.gz
done