Learn and practice each part of the data engineering process and apply your acquired knowledge and skills to develop an end-to-end data pipeline from the ground up.
Register on Slack • Join the #course-data-engineering Slack channel • Telegram Announcements Channel • Course Playlist • Frequently Asked Questions (FAQ)
- Start Date: 13 January 2025
- Registration Link: Sign up here
- Materials: Cohort-specific materials
The course materials are open for self-paced learning. Simply follow the suggested syllabus week by week:
- Start watching the videos.
- Join the Slack community.
- Refer to the FAQ document for common issues.
This course consists of modules, workshops, and a project that helps you apply the concepts and tools learned during the course. The syllabus is structured to guide you step-by-step through the world of data engineering.
- Module 1: Containerization and Infrastructure as Code
- Module 2: Workflow Orchestration
- Workshop 1: Data Ingestion
- Module 3: Data Warehouse
- Module 4: Analytics Engineering
- Module 5: Batch Processing
- Module 6: Streaming
- Project
To get the most out of this course, you should feel comfortable with coding and the command line and know the basics of SQL. Prior experience with Python will be helpful, but you can pick up Python relatively quickly if you have experience with other programming languages.
Prior experience with data engineering is not required.
We encourage Learning in Public
Note: NYC TLC changed the format of the data we use to parquet, but in the course we still use the CSV files, accessible here.
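Since pandas reads both formats, working with either copy of the data is a one-line difference. A minimal sketch (the file names are hypothetical placeholders for the monthly trip data files):

```python
import pandas as pd

# Either call yields an equivalent DataFrame; use whichever file you have.
# File names below are placeholders, not the exact course files.
df = pd.read_csv("yellow_tripdata_2021-01.csv.gz")
# df = pd.read_parquet("yellow_tripdata_2021-01.parquet")
```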
- Course overview
- Introduction to GCP
- Docker and docker-compose
- Running Postgres locally with Docker (see the example after this list)
- Setting up infrastructure on GCP with Terraform
- Preparing the environment for the course
- Homework
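As a taste of Module 1, here is a minimal sketch of ingesting a month of trip data into a Postgres instance running in Docker. The credentials, database name, and file name are illustrative assumptions, not values fixed by the course:

```python
import pandas as pd
from sqlalchemy import create_engine  # also requires the psycopg2 driver

# Connect to a Postgres container started e.g. with:
#   docker run -e POSTGRES_USER=root -e POSTGRES_PASSWORD=root \
#     -e POSTGRES_DB=ny_taxi -p 5432:5432 postgres:13
engine = create_engine("postgresql://root:root@localhost:5432/ny_taxi")

# Load the CSV in chunks so a large file does not have to fit in memory.
for chunk in pd.read_csv("yellow_tripdata_2021-01.csv.gz", chunksize=100_000):
    chunk.to_sql("yellow_taxi_data", engine, if_exists="append", index=False)
```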
- Data Lake
- Workflow orchestration
- Workflow orchestration with Kestra
- Homework
- Reading from APIs (sketched after this list)
- Building scalable pipelines
- Normalizing data
- Incremental loading
- Homework
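This workshop uses dlt. A rough sketch of the ingestion pattern it teaches; the API URL and field names are hypothetical:

```python
import dlt
import requests

API_URL = "https://example.com/api/rides"  # hypothetical paginated endpoint

@dlt.resource(name="rides", write_disposition="append")
def rides():
    page = 1
    while True:
        resp = requests.get(API_URL, params={"page": page})
        resp.raise_for_status()
        data = resp.json()
        if not data:          # an empty page means we are done
            break
        yield data            # dlt normalizes nested JSON into tables
        page += 1

# Load into a local DuckDB file; the destination could equally be BigQuery.
pipeline = dlt.pipeline(pipeline_name="rides_pipeline",
                        destination="duckdb",
                        dataset_name="rides_data")
print(pipeline.run(rides()))
```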
- Data Warehouse
- BigQuery
- Partitioning and clustering (illustrated after this list)
- BigQuery best practices
- Internals of BigQuery
- BigQuery Machine Learning
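To make partitioning and clustering concrete, here is a hedged sketch using the google-cloud-bigquery client. The dataset and table names are made up for illustration:

```python
from google.cloud import bigquery

client = bigquery.Client()  # picks up your default GCP credentials

# Partitioning by pickup date and clustering by vendor limits how much
# data a date-ranged query scans, which is what you pay for in BigQuery.
sql = """
CREATE OR REPLACE TABLE my_dataset.yellow_trips_partitioned
PARTITION BY DATE(tpep_pickup_datetime)
CLUSTER BY VendorID AS
SELECT * FROM my_dataset.yellow_trips_external;
"""
client.query(sql).result()  # .result() waits for the job to finish
```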
- Basics of analytics engineering
- dbt (data build tool)
- BigQuery and dbt
- Postgres and dbt
- dbt models
- Testing and documenting (see the dbt runner example after this list)
- Deployment to the cloud and locally
- Visualizing the data with Google Data Studio and Metabase
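For running dbt models and tests from Python rather than the shell, dbt-core 1.5+ exposes a programmatic runner. This sketch assumes an existing dbt project with a configured profile:

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

runner = dbtRunner()

# Build the models, then run the tests defined in the project's YAML files;
# this mirrors running `dbt run` followed by `dbt test` on the CLI.
for command in (["run"], ["test"]):
    result: dbtRunnerResult = runner.invoke(command)
    if not result.success:
        raise RuntimeError(f"dbt {command[0]} failed: {result.exception}")
```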
- Batch processing
- What is Spark
- Spark Dataframes
- Spark SQL
- Internals: GroupBy and joins (see the PySpark example after this list)
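A small PySpark sketch of the GroupBy-plus-join pattern the module dissects. File paths and column names follow the NYC taxi data; adjust them to whatever files you downloaded:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch-example").getOrCreate()

# Paths are placeholders for the trip data and the taxi zone lookup table.
trips = spark.read.parquet("data/yellow_tripdata_2021-01.parquet")
zones = spark.read.option("header", "true").csv("data/taxi_zone_lookup.csv")

# Aggregate trips per day and pickup zone, then join in the zone names.
agg = (trips
       .withColumn("pickup_date", F.to_date("tpep_pickup_datetime"))
       .groupBy("pickup_date", "PULocationID")
       .agg(F.count("*").alias("trips"),
            F.sum("total_amount").alias("revenue")))

agg.join(zones, agg.PULocationID == zones.LocationID, "left").show(5)
```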
- Introduction to Kafka (see the producer example after this list)
- Schemas (Avro)
- Kafka Streams
- Kafka Connect and KSQL
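A minimal kafka-python producer as a taste of the module; the broker address and topic name are illustrative local defaults, not course requirements:

```python
import json
from kafka import KafkaProducer

# Assumes a broker reachable on localhost:9092 and a topic named "rides".
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("rides", {"vendor_id": 1, "total_amount": 14.5})
producer.flush()  # block until the message is actually delivered
```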
Putting everything we learned into practice
- Weeks 1 and 2: working on your project
- Week 3: reviewing your peers' projects
Past instructors:
The best way to get support is to use DataTalks.Club's Slack. Join the #course-data-engineering channel.
To make discussions in Slack more organized:
- Follow these recommendations when asking for help
- Read the DataTalks.Club community guidelines
Thanks to the course sponsors for making it possible to run this course
Do you want to support our course and our community? Please reach out to [email protected]