A dashboard to analyze transactions on the Ethereum blockchain in real time. We used the publicly available `crypto_ethereum` dataset from BigQuery. The data was downloaded and stored locally as CSV files.
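As a point of reference, loading such an export into a Spark DataFrame looks roughly like the sketch below. This is a minimal illustration, not the repo's actual code: the `data/transactions/*.csv` path and the `local[*]` master are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object CsvLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EthereumCsvLoad")
      .master("local[*]")
      .getOrCreate()

    // Load the locally stored BigQuery export into a DataFrame.
    // "data/transactions/*.csv" is a hypothetical path, not the repo's actual layout.
    val transactions = spark.read
      .option("header", "true")      // BigQuery CSV exports include a header row
      .option("inferSchema", "true") // let Spark infer column types
      .csv("data/transactions/*.csv")

    transactions.printSchema()
    transactions.show(5)

    spark.stop()
  }
}
```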
- Scala 2.12.15
- sbt 1.5.8
- Apache Spark 3.2.1
- Docker and docker-compose
- Clone the repo and `cd` into it.
- Use `docker-compose up -d` to start all the containers in detached mode. The configs are defined in the `docker-compose.yml` file. The services are exposed on the following ports:
  - Zookeeper (required for Kafka): 2181
  - Kafka: 9092
  - Superset: 8088
  - Redis (to enable persistence of our dashboards)
  - MySQL: 3306
- Set up Apache Superset with the following command (this configures Superset, and you will then be able to connect to it on port 8088):

```
docker exec -it superset superset-init
```
- Use the following sequence of commands to start the Producer and Consumer scripts:
- Run `sbt assembly` to create the `.jar` file for the project using the sbt-assembly plugin.
- Run the consumer using:

```
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 --master local[*] --class "edu.neu.ethanalyzer.StreamingConsumer" ./target/scala-2.12/EthereumAnalytics-assembly-1.0.jar
```
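For orientation, a streaming consumer of this shape typically subscribes to a Kafka topic with Spark Structured Streaming. The sketch below is an outline under assumptions, not the repo's actual `StreamingConsumer`: the topic name `eth_transactions` is hypothetical, and the output is simply echoed to the console.

```scala
import org.apache.spark.sql.SparkSession

object StreamingConsumerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingConsumerSketch")
      .getOrCreate()

    // Subscribe to the topic the producer writes to.
    // "eth_transactions" is a placeholder, not necessarily the repo's topic name.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // Kafka port from docker-compose
      .option("subscribe", "eth_transactions")
      .load()
      .selectExpr("CAST(value AS STRING) AS json") // Kafka values arrive as bytes

    // Echo incoming records to the console as they arrive.
    val query = stream.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```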
- Run the producer using:

```
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 --master local[*] --class "edu.neu.ethanalyzer.DataProducer" ./target/scala-2.12/EthereumAnalytics-assembly-1.0.jar
```
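A producer along these lines generally reads the local CSVs and publishes each row to Kafka as a JSON-encoded message. The following is a minimal sketch, not the actual `DataProducer` code; the CSV path and topic name are assumptions carried over from the sketches above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, to_json}

object DataProducerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataProducerSketch")
      .getOrCreate()

    // Read the local CSV export; the path is a placeholder.
    val transactions = spark.read
      .option("header", "true")
      .csv("data/transactions/*.csv")

    // The Kafka sink expects a string/binary "value" column, so pack
    // each row into a single JSON string before writing.
    transactions
      .select(to_json(struct(col("*"))).alias("value"))
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "eth_transactions") // placeholder topic name
      .save()

    spark.stop()
  }
}
```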
- We have also created a `BatchConsumer` class to analyze the data in its entirety, in batch mode rather than as a stream. Run it using:

```
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 --master local[*] --class "edu.neu.ethanalyzer.BatchConsumer" ./target/scala-2.12/EthereumAnalytics-assembly-1.0.jar
```
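For completeness: a batch read over a Kafka topic uses `spark.read` instead of `spark.readStream`, scanning a fixed offset range and returning an ordinary DataFrame. Whether the repo's `BatchConsumer` reads from Kafka or straight from the CSVs is not stated here, so treat this as a sketch under that assumption, with the same hypothetical topic name as above.

```scala
import org.apache.spark.sql.SparkSession

object BatchConsumerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BatchConsumerSketch")
      .getOrCreate()

    // Batch (non-streaming) read: scans a fixed offset range and
    // returns a plain DataFrame instead of a stream.
    val all = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "eth_transactions") // placeholder topic name
      .option("startingOffsets", "earliest")   // from the beginning of the topic
      .option("endingOffsets", "latest")       // up to the most recent record
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    println(s"Total records in topic: ${all.count()}")

    spark.stop()
  }
}
```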