Spark Fires is an anti-pattern playground where we deliberately break Spark applications in various ways, so you can observe what happens and recognise the issues when you come across them in your day-to-day development and support activities.
We plan to cover the common scenarios you might hit in production, typical technical interview questions, and more.
The Spark Fires playground is scenario-based. Each scenario is documented and run via a Jupyter notebook, so you can step through it, see the impact of different fixes, and try different settings yourself, all while viewing the application behaviour in the Spark UI.
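To give a flavour of what stepping through a scenario involves, here is a minimal sketch of the kind of setup cell a notebook might start with. The master hostname and app name are illustrative assumptions, not taken from the repo's actual notebooks:

```python
# Minimal sketch of a scenario notebook's setup cell.
# The master hostname and app name below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")  # standalone master from the Compose cluster (hostname assumed)
    .appName("partition-skew-demo")       # hypothetical scenario name
    .getOrCreate()
)

# Once a job runs through this session, its behaviour is visible
# in the Spark UI at http://localhost:4040/jobs/
print(spark.version)
```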
For ease of use, the project is self-contained: it ships with a Docker Compose file that can start a local Spark cluster with three workers.
The default cluster configuration starts three Spark worker nodes with 2 cores and 2 GB of memory each (in a standalone cluster these resources are usually controlled via the SPARK_WORKER_CORES and SPARK_WORKER_MEMORY environment variables). If this is too much for your machine, feel free to tweak it as needed, but note that the pre-baked scenarios work best with the default configuration provided.
Alternatively, if you prefer, you can download Spark directly, configure it as desired, and start the cluster components manually.
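For the manual route, a standalone cluster can be brought up with the scripts bundled in the Spark distribution, roughly as follows; the distribution directory is a placeholder, and the worker resources mirror the Compose defaults above:

```bash
# From the root of an unpacked Spark distribution (directory name is a placeholder)
cd spark-<version>-bin-hadoop3

# Start the standalone master (its web UI defaults to port 8080)
./sbin/start-master.sh

# Start a worker pointing at the master; repeat on each worker host
# (or locally) to mirror the three-worker Compose setup
./sbin/start-worker.sh spark://localhost:7077 --cores 2 --memory 2G
```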
The Spark cluster configuration is defined in the Docker Compose file - docker-compose.yaml.
The Spark cluster can be started using the following command from the repo root directory:
```bash
docker compose up
```
Note that the first start-up will take a while, as the container images need to be downloaded; after that, it only takes a few seconds.
Once started, the key cluster UIs should be available at:
- Spark Master - http://localhost:8080/#/
- Spark UI (once an app has been started) - http://localhost:4040/jobs/
- Jupyter Lab - you can grab the URL or token from the docker compose output in your terminal (you may have to scroll up a little, or use the log-filtering command shown after this list!)
- http://127.0.0.1:8888/lab?token=<grab your token/URL from the startup logs>
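If you would rather not scroll, something like the following should pull the tokenised URL straight out of the Compose logs (assuming the cluster was started with `docker compose up` as above):

```bash
# Fish the tokenised Jupyter Lab URL out of the Compose logs
docker compose logs | grep "127.0.0.1:8888"
```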
New scenarios are arriving in the coming weeks.
Currently available scenarios are: