Spark-Alluxio-HDFS benchmarks

This folder contains a synthetic benchmark intended to measure the performance of the cluster in several scenarios. It is based on the DFSIO benchmark for HDFS, adapted to run on Spark in this environment.

These benchmarks were run on an OpenShift cluster with 7 worker nodes. Each worker stack consisted of a Spark worker instance, an Alluxio worker and an HDFS datanode. (TODO explain cluster capacity and topology)

The HDFS replication factor was set to 3.

Scenarios

The scenarios are designed to write to and read from Alluxio under several configurations of caching, file size and number of files. The resources available to Spark are also varied in order to measure the effect of concurrency and parallelization. The benchmarks defined so far are the following:

TestDFSIO
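
As a rough illustration of how such scenarios could be enumerated and summarized, here is a minimal Python sketch. It is not the code in this folder: the `Scenario` fields, `scenario_grid` and `dfsio_metrics` are hypothetical names, and the parameter values are made up. The two metrics follow the way DFSIO-style benchmarks usually report results: an aggregate throughput (total megabytes divided by the summed task time) and an average IO rate (the mean of each task's own rate).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    # Hypothetical fields; the real benchmark's parameters may differ.
    num_files: int
    file_size_mb: int
    executor_cores: int


def scenario_grid(num_files, file_sizes_mb, executor_cores):
    """Enumerate every combination of the given parameter values."""
    return [
        Scenario(n, s, c)
        for n, s, c in product(num_files, file_sizes_mb, executor_cores)
    ]


def dfsio_metrics(task_results):
    """Summarize per-task (megabytes, seconds) pairs.

    throughput_mb_s:      total MB divided by the summed task time.
    average_io_rate_mb_s: mean of each task's individual MB/s.
    """
    total_mb = sum(mb for mb, _ in task_results)
    total_sec = sum(sec for _, sec in task_results)
    rates = [mb / sec for mb, sec in task_results]
    return {
        "throughput_mb_s": total_mb / total_sec,
        "average_io_rate_mb_s": sum(rates) / len(rates),
    }


if __name__ == "__main__":
    # Illustrative values only, not the settings used in these benchmarks.
    grid = scenario_grid([10, 100], [1, 1024], [7, 14])
    print(len(grid), "scenarios")
    print(dfsio_metrics([(100.0, 2.0), (100.0, 4.0)]))
```

The two metrics diverge when task durations are uneven, which is why it can be useful to report both when comparing runs with different levels of concurrency.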
