Solution for Udacity's Data Engineering Nanodegree projects.

Data Modeling with Postgres
-

This project consists of creating fact and dimension tables for a star schema and writing an ETL pipeline that transfers data from files in two local directories into these tables in Postgres, using Python and SQL.
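
As a rough illustration of the kind of pipeline step involved (not the project's actual code), the sketch below inserts one record into a hypothetical `songplays` fact table with psycopg2; the connection string, table, and column names are assumptions.

```python
# Minimal sketch: insert a row into an assumed "songplays" fact table.
import psycopg2

# Placeholder local connection; credentials and database name are illustrative.
conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()

songplay_insert = """
    INSERT INTO songplays (start_time, user_id, level, song_id,
                           artist_id, session_id, location, user_agent)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
"""
cur.execute(songplay_insert,
            ("2018-11-01 21:01:46", 8, "free", None, None,
             139, "Phoenix, AZ", "Mozilla/5.0"))

conn.commit()
conn.close()
```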

Data Modeling with Apache Cassandra
-

This project involves modeling part of an ETL pipeline that processes a set of CSV files within a directory into a single, simplified CSV file, which is then used to model and insert the data into Apache Cassandra tables.
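
A minimal sketch of the Cassandra side, assuming a hypothetical keyspace and a table modeled around a query that filters by session; keyspace, table, and column names are illustrative, not the project's actual schema.

```python
# Minimal sketch: create a query-oriented table and insert one row with the
# DataStax cassandra-driver (assumes a local Cassandra instance).
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("sparkify")

# Cassandra tables are modeled per query; this one assumes lookups by
# session_id and item_in_session.
session.execute("""
    CREATE TABLE IF NOT EXISTS song_in_session (
        session_id int,
        item_in_session int,
        artist text,
        song text,
        length float,
        PRIMARY KEY (session_id, item_in_session)
    )
""")

session.execute(
    "INSERT INTO song_in_session (session_id, item_in_session, artist, song, length) "
    "VALUES (%s, %s, %s, %s, %s)",
    (338, 4, "Faithless", "Music Matters", 495.3073),
)

cluster.shutdown()
```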

Data Warehouse with Redshift
-

In this project the task was to load data from S3 into staging tables in Redshift and then execute SQL statements that build the analytics tables from those staging tables.
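
A hedged sketch of those two steps, run through psycopg2 against the cluster; the table names, S3 paths, IAM role, and connection details are placeholders rather than the project's actual configuration.

```python
# Minimal sketch: COPY raw events from S3 into a staging table, then derive an
# analytics fact table from it with INSERT ... SELECT.
import psycopg2

# Placeholder cluster endpoint and credentials.
conn = psycopg2.connect(
    "host=<cluster-endpoint> port=5439 dbname=dwh user=dwhuser password=<password>"
)
cur = conn.cursor()

staging_copy = """
    COPY staging_events
    FROM 's3://<bucket>/log_data'
    IAM_ROLE '<iam-role-arn>'
    FORMAT AS JSON 's3://<bucket>/log_json_path.json'
    REGION 'us-west-2';
"""

songplays_insert = """
    INSERT INTO songplays (start_time, user_id, level, session_id, location, user_agent)
    SELECT TIMESTAMP 'epoch' + ts / 1000 * INTERVAL '1 second',
           userId, level, sessionId, location, userAgent
    FROM staging_events
    WHERE page = 'NextSong';
"""

cur.execute(staging_copy)
cur.execute(songplays_insert)
conn.commit()
conn.close()
```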

Data Lake with Apache Spark
-

In this project, Apache Spark and data lake concepts were applied to build an ETL pipeline for a data lake hosted on S3. Data was loaded from S3, processed into analytics tables with Spark, and written back to S3, and the Spark job was deployed on a cluster in AWS.
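
A small PySpark sketch of that pattern, with assumed bucket paths and column names: read JSON from S3, shape a dimension table, and write it back to S3 as partitioned parquet.

```python
# Minimal sketch: build one dimension table for a data lake on S3.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data_lake_etl").getOrCreate()

# Placeholder input path; the nested wildcards assume one JSON file per song.
song_data = spark.read.json("s3a://<input-bucket>/song_data/*/*/*/*.json")

songs_table = (
    song_data
    .select("song_id", "title", "artist_id", "year", "duration")
    .dropDuplicates(["song_id"])
)

# Partitioning by year and artist keeps the parquet output query-friendly.
songs_table.write.mode("overwrite") \
    .partitionBy("year", "artist_id") \
    .parquet("s3a://<output-bucket>/songs/")
```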

Data Pipeline with Airflow
-

This project applies the main concepts of Apache Airflow, such as creating custom operators that prepare the data, populate the data warehouse, and run quality checks on the data.
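
A minimal sketch of what a custom operator and the DAG wiring can look like, assuming Airflow 2.x with the Postgres provider installed; the operator, connection ID, and table names here are illustrative, not the project's actual code.

```python
# Minimal sketch: a data quality operator that fails the run if a table is empty.
from datetime import datetime

from airflow import DAG
from airflow.models import BaseOperator
# Assumes the apache-airflow-providers-postgres package is installed.
from airflow.providers.postgres.hooks.postgres import PostgresHook


class DataQualityOperator(BaseOperator):
    """Simplified check: the given table must contain at least one row."""

    def __init__(self, table, conn_id="redshift", **kwargs):
        super().__init__(**kwargs)
        self.table = table
        self.conn_id = conn_id

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.conn_id)
        records = hook.get_records(f"SELECT COUNT(*) FROM {self.table}")
        if not records or records[0][0] < 1:
            raise ValueError(f"Data quality check failed: {self.table} is empty")


with DAG(
    "sparkify_etl",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    check_songplays = DataQualityOperator(task_id="check_songplays", table="songplays")
```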

Capstone Project
-

This project consists of building an ETL pipeline that uses I94 immigration and temperature data to create a database optimized for analyzing immigration events. The resulting fact table is used to investigate whether the temperature of a city influences immigrants' choice of destination.
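
A hedged sketch of the kind of question that fact table supports, with assumed table locations and column names: aggregate arrivals per destination city and join them to average city temperatures.

```python
# Minimal sketch: compare arrival counts with average city temperature.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("capstone_analysis").getOrCreate()

# Placeholder locations for the fact and dimension tables.
immigration = spark.read.parquet("s3a://<output-bucket>/fact_immigration/")
temperature = spark.read.parquet("s3a://<output-bucket>/dim_city_temperature/")

arrivals_by_city = (
    immigration
    .groupBy("destination_city")
    .agg(F.count("*").alias("arrivals"))
)

(arrivals_by_city
    .join(temperature, on="destination_city")
    .select("destination_city", "avg_temperature", "arrivals")
    .orderBy(F.desc("arrivals"))
    .show(20))
```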