
Ultimate guide for mastering Spark Performance Tuning and Optimization concepts and for preparing for Data Engineering interviews


kuldeep27396/spark-experiments


Spark Performance Tuning

This repository is the ultimate guide for mastering advanced Spark Performance Tuning and Optimization concepts, and for anyone preparing for Data Engineering interviews involving Spark. It also serves as a reference for all the code snippets used in my Spark Performance Tuning playlist on YouTube. The goal of the playlist and the accompanying code is to make complex Apache Spark concepts easy to understand while building a deep understanding of how things work under the hood.

Concepts Covered

| Concept | YouTube Link | Code |
|---|---|---|
| Spark Query Plans | YouTube | Python |
| Spark DAGs | YouTube | Python |
| Spark Memory Management | YouTube | |
| Spark Executor Tuning | YouTube | |
| Shuffle Partitions | YouTube | |
| Data Partitioning | YouTube | Python |
| Bucketing | YouTube | Python |
| Caching | YouTube | Python |
| Data Skew | YouTube | Python |
| Salting | YouTube | Python |
| AQE & Broadcast Joins | YouTube | Python |
| Dynamic Partition Pruning | YouTube | Python |
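Spark Memory Management has no accompanying notebook. The following sketch may serve as a rough aid for that topic. It is plain Python and needs no Spark install. It reproduces the arithmetic of Spark's unified memory model for a single executor, under assumed Spark 3.x defaults:

- 300 MiB of reserved memory
- `spark.memory.fraction = 0.6`
- `spark.memory.storageFraction = 0.5`
- a `spark.executor.memoryOverhead` default of 10% of executor memory, with a 384 MiB floor

The function name `executor_memory_breakdown` and its return keys are hypothetical, chosen only for illustration. The container total also ignores off-heap and PySpark worker memory settings.

```python
# Minimal sketch of Spark's unified memory model arithmetic for one executor.
# ASSUMPTIONS: Spark 3.x defaults (300 MiB reserved, spark.memory.fraction=0.6,
# spark.memory.storageFraction=0.5, overhead = 10% of heap with a 384 MiB floor).
# The function name and dict keys are hypothetical, not a Spark API.

RESERVED_MIB = 300       # memory Spark sets aside before anything else
MIN_OVERHEAD_MIB = 384   # floor for the default executor memory overhead


def executor_memory_breakdown(executor_memory_mib: int,
                              memory_fraction: float = 0.6,
                              storage_fraction: float = 0.5,
                              overhead_factor: float = 0.10) -> dict:
    """Approximate the memory regions (in MiB) for a given executor heap size."""
    usable = executor_memory_mib - RESERVED_MIB
    unified = usable * memory_fraction       # shared by execution and storage
    storage = unified * storage_fraction     # storage share immune to eviction
    execution = unified - storage            # initial execution share
    user = usable - unified                  # user data structures, UDF objects
    # Overhead lives outside the JVM heap but counts toward the container size.
    overhead = max(MIN_OVERHEAD_MIB, executor_memory_mib * overhead_factor)
    return {
        "reserved": RESERVED_MIB,
        "user": user,
        "unified": unified,
        "storage": storage,
        "execution": execution,
        "overhead": overhead,
        "container_total": executor_memory_mib + overhead,
    }


if __name__ == "__main__":
    # Example: an executor launched with a 4 GiB heap (spark.executor.memory=4g).
    for region, mib in executor_memory_breakdown(4096).items():
        print(f"{region:>16}: {mib:8.1f} MiB")
```

For a 4 GiB heap this gives roughly 2.2 GiB of unified memory, split evenly between execution and storage at the defaults. At runtime, execution and storage can borrow unused memory from each other, so the split is a starting point rather than a hard limit.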

Contact

For any questions or feedback, feel free to reach out:


Languages

  • Jupyter Notebook 100.0%