Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.

DataTalksClub/data-engineering-zoomcamp

Data Engineering Zoomcamp Overview

Data Engineering Zoomcamp — a free nine-week course covering fundamentals of data engineering

Learn and practice each part of the data engineering process and apply your acquired knowledge and skills to develop an end-to-end data pipeline from the ground up.

  • Register on Slack
  • Join the #course-data-engineering Slack channel
  • Telegram Announcements Channel
  • Course Playlist
  • Frequently Asked Questions (FAQ)

How to Take Data Engineering Zoomcamp Course

2025 Cohort

Self-Paced Mode

The course materials are open for self-paced learning. Simply follow the suggested syllabus week by week:

  1. Start watching the videos.
  2. Join the Slack community.
  3. Refer to the FAQ document for common issues.

Syllabus

Overview

This course consists of modules, workshops, and a project that helps you apply the concepts and tools learned during the course. The syllabus is structured to guide you step-by-step through the world of data engineering.

Table of Contents

Prerequisites

To get the most out of this course, you should feel comfortable with coding and the command line and know the basics of SQL. Prior experience with Python will be helpful, but you can pick up Python relatively fast if you have experience with other programming languages.

Prior experience with data engineering is not required.

Detailed Syllabus

We encourage Learning in Public

Note: NYC TLC changed the format of the data we use to parquet. In the course we still use the CSV files accessible here.

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework

More details

  • Data Lake
  • Workflow orchestration
  • Workflow orchestration with Kestra
  • Homework

More details

  • Reading from APIs
  • Building scalable pipelines
  • Normalising data
  • Incremental loading
  • Homework

More details
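
As a rough illustration of two topics from the list above, normalising data and incremental loading, here is a minimal Python sketch using only the standard library. It is not taken from the course materials: the function names (`flatten`, `load_incrementally`), the `updated_at` cursor field, the `__` separator, and the sample records are all assumptions made for this example.

```python
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], parent: str = "", sep: str = "__") -> Dict[str, Any]:
    """Normalise one nested record into a single level of column-like keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing keys with the parent name.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def load_incrementally(
    rows: List[Dict[str, Any]],
    state: Dict[str, Any],
    cursor_field: str = "updated_at",  # hypothetical cursor column
) -> List[Dict[str, Any]]:
    """Keep only rows newer than the stored cursor, then advance the cursor."""
    last_seen = state.get("last_value")
    new_rows = [
        flatten(row)
        for row in rows
        if last_seen is None or row[cursor_field] > last_seen
    ]
    if new_rows:
        state["last_value"] = max(row[cursor_field] for row in new_rows)
    return new_rows


# Hypothetical sample data: ISO date strings compare correctly as plain strings.
state: Dict[str, Any] = {}
batch_1 = [
    {"id": 1, "updated_at": "2024-01-01", "fare": {"amount": 10.5, "currency": "USD"}},
    {"id": 2, "updated_at": "2024-01-02", "fare": {"amount": 7.0, "currency": "USD"}},
]
first = load_incrementally(batch_1, state)

# The second run sees the old rows again plus one new row.
batch_2 = batch_1 + [
    {"id": 3, "updated_at": "2024-01-03", "fare": {"amount": 3.2, "currency": "USD"}},
]
second = load_incrementally(batch_2, state)

print(first)
print(second)
```

The second call returns only the row with `id` 3, because the stored cursor already covers the first two rows. A production pipeline would persist `state` between runs and would typically rely on a dedicated loading library rather than hand-written helpers like these.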

  • Data Warehouse
  • BigQuery
  • Partitioning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • BigQuery Machine Learning

More details

  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualizing the data with Google Data Studio and Metabase

More details

  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins

More details

  • Introduction to Kafka
  • Schemas (Avro)
  • Kafka Streams
  • Kafka Connect and KSQL

More details

Putting everything we learned into practice

  • Week 1 and 2: working on your project
  • Week 3: reviewing your peers

More details

Instructors

Past instructors:

Asking for help in Slack

The best way to get support is to use DataTalks.Club's Slack. Join the #course-data-engineering channel.

To make discussions in Slack more organized:

Supporters and partners

Thanks to the course sponsors for making it possible to run this course.

Do you want to support our course and our community? Please reach out to [email protected]
