This repo contains the code for loading data to Mattermost's data warehouse, performing transformations and exporting data to other tools.
```
.
├── airflow                  <--- Airflow related code.
│   ├── dags                 <--- DAGs executed by Airflow.
│   ├── plugins              <--- Custom Airflow plugins used with DAGs.
│   └── tests                <--- Tests for DAGs and plugins.
├── docs                     <--- Extra documentation.
├── extract                  <--- Python scripts that extract data from various locations.
│   └── s3_extract           <--- Various utilities for importing data from S3 to Snowflake.
├── load
│   └── snowflake            <--- Snowflake role definitions.
├── README.md                <--- This file.
├── poetry.lock              <--- Pinned dependency versions.
├── pyproject.toml           <--- PEP-518 build system requirements.
├── requirements.txt         <--- Dependencies (deprecated).
├── tests                    <--- Unit tests for Python code.
├── transform
│   ├── snowflake-dbt        <--- Snowflake DBT models.
│   ├── mattermost-analytics <--- New DBT project for Mattermost analytics.
│   └── sql                  <--- SQL scripts that get executed by DAGs.
└── utils                    <--- Various Python scripts.
```
The repository contains:

- Tools for extracting data from S3 to Snowflake and from Snowflake to Postgres (`extract`). All extractions are executed using Airflow on a Kubernetes cluster.
- A DBT project for running transformations on Snowflake (under `transform`). The DBT project runs on DBT Cloud.
- SQL scripts for NPS feedback (`transform/sql`). The scripts are executed by an Airflow DAG.
- Snowflake role definitions (`load/snowflake`). An Airflow DAG runs the update.
- Invoice building for subscriptions, triggered via an Airflow DAG.
- A set of Python scripts performing custom ETLs (`utils`). The utilities run as part of Airflow DAGs.
- Airflow DAGs that orchestrate the ETL pipelines (`airflow/dags`); a minimal example is sketched after this list.
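For orientation, the sketch below shows the general shape of such a DAG, assuming Airflow 2.x. The DAG id, schedule, and callable are purely illustrative and do not correspond to an actual DAG in this repository.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def import_from_s3() -> None:
    """Placeholder for an extraction utility such as those in extract/s3_extract."""
    ...


# Illustrative DAG only: the id, schedule and task are hypothetical.
with DAG(
    dag_id="example_s3_to_snowflake",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 3 * * *",  # daily at 03:00 UTC
    catchup=False,
) as dag:
    PythonOperator(
        task_id="import_from_s3",
        python_callable=import_from_s3,
    )
```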
Please see the DBT setup instructions for setting up DBT and for performing common operations. The DBT development guidelines contain instructions about the DBT development environment, as well as about common development operations.
Run the following command to install dependencies:

```bash
poetry install
```

Additional dependencies can be specified in `pyproject.toml`; Poetry's documentation provides examples. Please prefer using the Poetry CLI (e.g. `poetry add`), as it also updates the `poetry.lock` file and pins any new dependencies. Note that there is currently a `requirements.txt` file; it will be deprecated.
Snowflake connection details can be configured by setting the appropriate environment variables. Each role requires a different set of variables. The following table describes the required environment variables for each role:
| Role | Environment variable | Description |
| --- | --- | --- |
| SYSADMIN | SNOWFLAKE_USER | Username |
| | SNOWFLAKE_PASSWORD | Password |
| | SNOWFLAKE_ACCOUNT | Snowflake account to connect to |
| | SNOWFLAKE_LOAD_DATABASE | Database to load data to |
| | SNOWFLAKE_LOAD_WAREHOUSE | Warehouse to load data to |
| ANALYTICS_LOADER | SNOWFLAKE_LOAD_USER | Username |
| | SNOWFLAKE_LOAD_PASSWORD | Password |
| | SNOWFLAKE_ACCOUNT | Snowflake account to connect to |
| | SNOWFLAKE_TRANSFORM_DATABASE | Database to load data to |
| | SNOWFLAKE_LOAD_WAREHOUSE | Warehouse to load data to |
| LOADER | SNOWFLAKE_LOAD_USER | Username |
| | SNOWFLAKE_LOAD_PASSWORD | Password |
| | SNOWFLAKE_ACCOUNT | Snowflake account to connect to |
| | SNOWFLAKE_LOAD_DATABASE | Database to load data to |
| | SNOWFLAKE_LOAD_WAREHOUSE | Warehouse to load data to |
| TRANSFORMER | SNOWFLAKE_USER | Username |
| | SNOWFLAKE_PASSWORD | Password |
| | SNOWFLAKE_ACCOUNT | Snowflake account to connect to |
| | SNOWFLAKE_TRANSFORM_DATABASE | Database to use for transforming data |
| | SNOWFLAKE_TRANSFORM_WAREHOUSE | Warehouse to store transformed data to |
| PERMISSIONS | PERMISSION_BOT_USER | Username |
| | PERMISSION_BOT_PASSWORD | Password |
| | PERMISSION_BOT_ACCOUNT | Snowflake account to connect to |
| | PERMISSION_BOT_DATABASE | Database to use for transforming data |
| | PERMISSION_BOT_WAREHOUSE | Warehouse to store transformed data to |
| | RELEASE_LOCATION | Location to load release data from |
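For example, a script using the TRANSFORMER variables could connect as in the following minimal sketch. It assumes the snowflake-connector-python package is installed; the snippet is illustrative and not part of this repository.

```python
import os

import snowflake.connector

# Illustrative sketch only: connect with the TRANSFORMER role's variables
# from the table above.
conn = snowflake.connector.connect(
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    database=os.environ["SNOWFLAKE_TRANSFORM_DATABASE"],
    warehouse=os.environ["SNOWFLAKE_TRANSFORM_WAREHOUSE"],
)

cur = conn.cursor()
try:
    # Simple sanity check that the session uses the expected database/warehouse.
    cur.execute("SELECT CURRENT_DATABASE(), CURRENT_WAREHOUSE()")
    print(cur.fetchone())
finally:
    cur.close()
    conn.close()
```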
Please see the separate documentation file.
This product includes GeoLite2 data created by MaxMind, available from https://www.maxmind.com.