
Benchmark

bowhan edited this page Apr 12, 2016 · 9 revisions

Benchmark of piPipes

This document presents the running time and disk space usage of each individual piPipes pipeline.

Details

All runs were performed on the Massachusetts Green High Performance Computing Cluster using piPipes commit ab50e8a2fae33edefcb7749e95cbf54a600c1c50.

Small RNA-seq

We randomly sampled N million reads (N = 1, 4, 7, ..., 25) from an unpublished HiSeq SE50 small RNA-seq library with 27,990,838 reads and ran the piPipes small RNA pipeline with 8 CPUs.

for i in `seq 1 3 26`; do
    seqtk sample -s$((RANDOM%100)) $SMALLRNA_FQ ${i}000000 | \
        gzip > ${i}M.fq.gz && \
        date +"%m-%d-%k-%M" > ${i}.time && \
    piPipes small \
        -i ${i}M.fq.gz \
        -g dm3 \
        -o ${i}M.out && \
    date +"%m-%d-%k-%M" >> ${i}.time && \
    du -skh ${i}M.out > ${i}.size && \
    rm -rf ${i}M.out ${i}M.fq.gz
done
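
Each `.time` file written by the loop above holds two lines, the start and end timestamps, in month-day-hour-minute form. A minimal sketch for turning such a file into elapsed minutes is shown below. The function name `elapsed_minutes` is hypothetical and not part of piPipes, and the sketch assumes that both timestamps fall within the same month.

```bash
# Hypothetical helper (not part of piPipes): print the elapsed minutes between
# the two timestamps in a .time file written with date +"%m-%d-%k-%M".
# Assumption: both timestamps fall within the same month and year.
elapsed_minutes() {
    awk -F- '
        # Fields are month, day, hour (possibly space-padded by %k), minute.
        # Convert day/hour/minute to minutes since the start of the month.
        { t[NR] = ($2 * 24 + $3) * 60 + $4 }
        END { print t[2] - t[1] }
    ' "$1"
}
```

For example, `for f in *.time; do printf '%s\t%s\n' "$f" "$(elapsed_minutes "$f")"; done` would list one elapsed time per sample size. Note that the timestamps only have minute resolution, so very short runs cannot be distinguished.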

RNA-seq

We randomly sampled N million read pairs (N = 1 to 15) from an unpublished HiSeq PE100 RNA-seq library with 15,963,640 pairs and ran the piPipes RNA-seq pipeline with 8 CPUs.

for i in `seq 1 15`; do
    SEED=$((RANDOM%100)) && \
    seqtk sample -s $SEED $RNA_FQ1 ${i}000000 | \
        gzip > ${i}M.r1.fq.gz && \
    seqtk sample -s $SEED $RNA_FQ2 ${i}000000 | \
        gzip > ${i}M.r2.fq.gz && \
    date +"%m-%d-%k-%M" > ${i}.time && \
    piPipes rna \
        -l ${i}M.r1.fq.gz \
        -r ${i}M.r2.fq.gz \
        -g dm3 \
        -o ${i}M.out && \
    date +"%m-%d-%k-%M" >> ${i}.time && \
    du -skh  ${i}M.out > ${i}.size && \
    rm -rf ${i}M.out ${i}M.r1.fq.gz ${i}M.r2.fq.gz
done
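
The paired-end loops rely on passing the same seed to both `seqtk sample` calls so that the two mate files stay in the same order. A quick sanity check of that property could look like the sketch below. The function names `read_names` and `pairs_in_sync` are hypothetical, and the check assumes four-line FASTQ records whose read names end in an optional `/1` or `/2`.

```bash
# Hypothetical helpers (not part of piPipes or seqtk).

# Print the read name of every record in a gzipped FASTQ file, dropping any
# comment after the first whitespace and a trailing /1 or /2 mate suffix.
# Assumption: strict four-line FASTQ records.
read_names() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }'
}

# Succeed only if both files list the same read names in the same order.
# The name streams are compared by checksum so they never sit in memory.
pairs_in_sync() {
    [ "$(read_names "$1" | cksum)" = "$(read_names "$2" | cksum)" ]
}
```

A call such as `pairs_in_sync 5M.r1.fq.gz 5M.r2.fq.gz || echo "mates out of sync"` can be run before the pipeline starts. The checksum comparison is a cheap heuristic rather than a byte-for-byte proof.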

Degradome-seq

We randomly sampled N million read pairs (N = 1 to 7) from an unpublished HiSeq PE100 Degradome-seq library with 15,963,640 pairs. We then ran the piPipes Degradome-seq pipeline with 8 CPUs, together with a small RNA library of 23,712,713 genome-mappable reads (the small RNA-seq data was not subsampled).

for i in `seq 1 7`; do
    SEED=$((RANDOM%100)) && \
    seqtk sample -s$SEED $DEG_FQ1 ${i}000000 | \
        gzip > ${i}M.r1.fq.gz && \
    seqtk sample -s$SEED $DEG_FQ2 ${i}000000 | \
        gzip > ${i}M.r2.fq.gz && \
    date +"%m-%d-%k-%M" > ${i}.time && \
    piPipes deg \
        -l ${i}M.r1.fq.gz \
        -r ${i}M.r2.fq.gz \
        -g dm3 \
        -o ${i}M.out \
        -s $SMALL_RNA_OUTPUT && \
    date +"%m-%d-%k-%M" >> ${i}.time && \
    du -skh  ${i}M.out > ${i}.size && \
    rm -rf ${i}M.out ${i}M.r1.fq.gz ${i}M.r2.fq.gz
done

ChIP-seq

We randomly sampled N million read pairs (N = 2, 4, ..., 10) from an unpublished HiSeq PE100 ChIP-seq library with 17,980,776 and 18,011,302 pairs for INPUT and IP, respectively. We then ran the piPipes ChIP-seq pipeline with 8 CPUs.

for i in `seq 2 2 10`; do
    SEED=$((RANDOM%100)) && \
    seqtk sample -s$SEED $CHIP_INPUT_1 ${i}000000 | \
        gzip > ${i}M.input.r1.fq.gz && \
    seqtk sample -s$SEED $CHIP_INPUT_2 ${i}000000 | \
        gzip > ${i}M.input.r2.fq.gz && \
    seqtk sample -s$SEED $CHIP_IP_1 ${i}000000 | \
        gzip > ${i}M.IP.r1.fq.gz && \
    seqtk sample -s$SEED $CHIP_IP_2 ${i}000000 | \
        gzip > ${i}M.IP.r2.fq.gz && \
    date +"%m-%d-%k-%M" > ${i}.time && \
    piPipes chip \
        -l ${i}M.IP.r1.fq.gz \
        -r ${i}M.IP.r2.fq.gz \
        -L ${i}M.input.r1.fq.gz \
        -R ${i}M.input.r2.fq.gz \
        -g dm3 \
        -o ${i}M.out && \
    date +"%m-%d-%k-%M" >> ${i}.time && \
    du -skh  ${i}M.out > ${i}.size && \
    rm -rf ${i}M.input.r1.fq.gz ${i}M.input.r2.fq.gz ${i}M.IP.r1.fq.gz ${i}M.IP.r2.fq.gz ${i}M.out
done

Genome-seq

We randomly sampled N million reads (N = 2, 4, ..., 10) from a published (SRR333512) HiSeq PE100 Genome-seq library with 18,042,217 reads and ran the piPipes Genome-seq pipeline with 8 CPUs, skipping mrfast and VariationHunter.

for i in `seq 2 2 10`; do
    SEED=$((RANDOM%100)) && \
    seqtk sample -s$SEED $GENOME_FQ1 ${i}000000 | \
        gzip > ${i}M.r1.fq.gz && \
    seqtk sample -s$SEED $GENOME_FQ2 ${i}000000 | \
        gzip > ${i}M.r2.fq.gz && \
    date +"%m-%d-%k-%M" > ${i}.time && \
    piPipes dna \
        -l ${i}M.r1.fq.gz \
        -r ${i}M.r2.fq.gz \
        -g dm3 \
        -o ${i}M.out && \
    date +"%m-%d-%k-%M" >> ${i}.time && \
    du -skh  ${i}M.out > ${i}.size && \
    rm -rf ${i}M.out ${i}M.r1.fq.gz ${i}M.r2.fq.gz
done
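
The loops above record timestamps at minute resolution and sizes in human-readable units, which need further parsing before they can be plotted. As an alternative sketch, not the method used for this benchmark, a small wrapper can emit wall-clock seconds and output size in kilobytes as one tab-separated line. The name `timed_run` is hypothetical.

```bash
# Hypothetical wrapper (not part of piPipes): run a command, then print
# "OUTDIR<TAB>SECONDS<TAB>KILOBYTES" for the given output directory.
# Usage: timed_run OUTDIR COMMAND [ARGS...]
timed_run() {
    local outdir=$1
    shift
    local start end
    start=$(date +%s)          # seconds since the epoch
    # Send the command's own stdout to stderr so that only the summary
    # line below reaches stdout; give up if the command fails.
    "$@" 1>&2 || return 1
    end=$(date +%s)
    printf '%s\t%s\t%s\n' "$outdir" "$((end - start))" \
        "$(du -sk "$outdir" | cut -f1)"
}
```

Inside one of the loops, the two `date` calls and the `du` call could then be replaced by something like `timed_run ${i}M.out piPipes dna -l ${i}M.r1.fq.gz -r ${i}M.r2.fq.gz -g dm3 -o ${i}M.out >> summary.tsv`, leaving one row per sample size.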
