This GitHub Action enables column-aware CI for dbt Cloud Enterprise accounts by leveraging dbt Cloud's column-level lineage feature. It intelligently determines which downstream models need to be rebuilt based on column-level changes in your dbt models.
https://www.loom.com/share/c9ddaaa259b8413c9ab09adb670fd996?sid=c03617bd-2743-43c8-bb68-c2e9502a0a2b
Traditional dbt CI runs rebuild all downstream dependencies when a model changes. This action optimizes CI runs by:
- Analyzing which columns have changed in modified models
- Using dbt Cloud's column-level lineage to identify affected downstream models
- Excluding unaffected downstream models from the CI run
This results in faster CI runs and more efficient use of warehouse resources.
- Faster CI Runs: Only rebuild models that are actually impacted by column changes
- Resource Optimization: Reduce warehouse costs by skipping unnecessary model runs
- Enhanced Developer Experience: Get faster feedback on your PRs
- Enterprise Integration: Seamlessly works with dbt Cloud Enterprise features
- dbt Cloud Enterprise account
- At least one job in the environment that the CI job defers to must run `dbt docs generate`. This is what enables column-level lineage.
- A dbt Cloud personal access token
- A dbt Cloud Service token with the following permissions:
Permission | Usage |
---|---|
Metadata | Used to return column-level lineage and compiled code |
Job Runner | Used to trigger the CI job configured in the workflow |
Job Viewer | Used to infer the deferring environment ID if not given as part of the workflow inputs |
Input | Description | Required | Default |
---|---|---|---|
dbt_cloud_account_id | dbt Cloud Account ID | Yes | - |
dbt_cloud_job_id | dbt Cloud CI Job ID for the current project | Yes | - |
dbt_cloud_service_token | dbt Cloud Service Token | Yes | - |
dbt_cloud_token_name | Name of the personal API Key created in dbt Cloud | Yes | - |
dbt_cloud_token_value | dbt Cloud Personal API Key for use with the dbt Cloud CLI | Yes | - |
dialect | SQL dialect of your warehouse (e.g., 'snowflake') | Yes | - |
dbt_cloud_host | dbt Cloud host | No | cloud.getdbt.com |
dry_run | When true, analyzes changes but doesn't trigger dbt Cloud job | No | false |
github_token | GitHub token for API authentication | No | ${{ github.token }} |
log_level | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) | No | INFO |
The dialect refers to your data platform where your dbt project is being executed. Valid dialects include:
- athena
- bigquery
- databricks
- postgres
- redshift
- snowflake
- spark
- trino
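An unsupported dialect is easiest to catch before any dbt command runs. The following is a minimal sketch of how the `dialect` input could be checked against the list above. `SUPPORTED_DIALECTS` and `validate_dialect` are hypothetical names, and the case-insensitive matching is an assumption, not documented behavior of this action.

```python
# Hypothetical helper: the names and the case-insensitive matching are
# assumptions for illustration, not part of this action's code.
SUPPORTED_DIALECTS = frozenset({
    "athena", "bigquery", "databricks", "postgres",
    "redshift", "snowflake", "spark", "trino",
})


def validate_dialect(dialect: str) -> str:
    """Return the normalized dialect, or raise ValueError if it is not listed."""
    normalized = dialect.strip().lower()
    if normalized not in SUPPORTED_DIALECTS:
        allowed = ", ".join(sorted(SUPPORTED_DIALECTS))
        raise ValueError(
            f"Unsupported dialect {dialect!r}; expected one of: {allowed}"
        )
    return normalized
```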
Here's an example workflow that uses this action:

    name: dbt Cloud CI
    on:
      pull_request:
        branches: [ main ]
    jobs:
      dbt-cloud-ci:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - name: Run Column-Aware dbt Cloud CI
            uses: dpguthrie/[email protected]
            with:
              dbt_cloud_service_token: ${{ secrets.DBT_CLOUD_SERVICE_TOKEN }}
              dbt_cloud_token_name: 'github-actions'
              dbt_cloud_token_value: ${{ secrets.DBT_CLOUD_TOKEN_VALUE }}
              dbt_cloud_account_id: '12345'
              dbt_cloud_job_id: '98765'
              dialect: 'snowflake'
              log_level: 'DEBUG' # optional

To get the most out of column-aware CI, it's recommended to set up your dbt Cloud environment as follows:
- Create a merge job in the environment that the CI job defers to. This will run when PRs are merged to your configured branch.
- Configure the merge job to run at minimum `dbt docs generate`.

This step is crucial because it:
- Recalculates column-level lineage information
- Generates an updated `manifest.json` file
- Ensures accurate state comparison for subsequent CI runs using `state:modified`

The merge job keeps your deferred environment's state current, which enables this action to accurately determine which models need to be rebuilt based on column-level changes.
- The action identifies modified models using dbt's state comparison
- For each modified model, it:
  - Retrieves the current and previous compiled code. The current code comes from running `dbt compile`, and the previous code comes from the Discovery API.
  - Creates a diff between the current and previous code, looking specifically for changes to columns
  - Queries dbt Cloud's Discovery API to find the downstream models impacted by the columns that changed
- Creates a filtered CI run that excludes unaffected downstream models
- Monitors the job run and reports status back to GitHub
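The filtering step above can be sketched as plain set arithmetic. This is an illustration under stated assumptions, not the action's implementation: `models_to_exclude` and `build_command` are hypothetical names, the lineage is passed in as ready-made dictionaries rather than fetched from the Discovery API, and the `dbt build --select state:modified+ --exclude ...` command shape is an assumed way of expressing the filtered run.

```python
from typing import Dict, Set, Tuple


def models_to_exclude(
    downstream: Dict[str, Set[str]],
    changed_columns: Dict[str, Set[str]],
    column_consumers: Dict[Tuple[str, str], Set[str]],
) -> Set[str]:
    """Return downstream models that no changed column flows into.

    downstream:       modified model -> all of its downstream models
    changed_columns:  modified model -> columns whose definition changed
    column_consumers: (model, column) -> downstream models using that column
    """
    all_downstream: Set[str] = set()
    impacted: Set[str] = set()
    for model, children in downstream.items():
        all_downstream |= children
        for column in changed_columns.get(model, set()):
            impacted |= column_consumers.get((model, column), set())
    # Modified models are never excluded, even when they sit downstream
    # of another modified model.
    return all_downstream - impacted - set(downstream)


def build_command(excluded: Set[str]) -> str:
    """Assumed command shape for the filtered CI run."""
    command = "dbt build --select state:modified+"
    if excluded:
        command += " --exclude " + " ".join(sorted(excluded))
    return command
```

For example, if `stg_orders.amount` changes and only `fct_orders` and `rpt_sales` read that column, a third downstream model such as `dim_customers` ends up in the exclude list.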
See example of what the flow looks like below:
The following examples demonstrate different types of schema changes and their impact:
Adding a new column to a table is considered a non-breaking change, which means no downstream models need to be run.
Modifying an existing column is a breaking change that could impact downstream models. Only models referencing that modified column will be run as part of CI.
Modifying a where clause has the potential to break any models downstream of that change, so nothing downstream of this model will be excluded.
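The three cases above can be expressed as a small classifier. The sketch below is deliberately simplified: it assumes the select list has already been parsed into a `{column name: expression}` mapping and that the rest of the query (for example the where clause) is available as a string. `ChangeSummary` and `classify` are hypothetical names, and treating a removed column like a modified one is an assumption. Column names are compared case-insensitively, in line with the caveat later in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ChangeSummary:
    added: Set[str] = field(default_factory=set)    # non-breaking
    changed: Set[str] = field(default_factory=set)  # breaking for consumers
    breaks_all_downstream: bool = False             # e.g. where clause edits


def _squash(sql: str) -> str:
    """Collapse whitespace so formatting-only edits are ignored."""
    return " ".join(sql.split())


def _normalize(columns: Dict[str, str]) -> Dict[str, str]:
    # Column names are lowercased; expressions keep their case so that a
    # changed string literal is still detected.
    return {name.lower(): _squash(expr) for name, expr in columns.items()}


def classify(
    old_columns: Dict[str, str],
    new_columns: Dict[str, str],
    old_rest: str = "",
    new_rest: str = "",
) -> ChangeSummary:
    old, new = _normalize(old_columns), _normalize(new_columns)
    return ChangeSummary(
        added=set(new) - set(old),
        # Assumption: a removed column is treated like a modified one.
        changed={c for c in old if c not in new or old[c] != new[c]},
        breaks_all_downstream=_squash(old_rest) != _squash(new_rest),
    )
```

With this shape, only the models that read a column in `changed` need to run, and nothing is excluded when `breaks_all_downstream` is true.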
- Manages application configuration through the `Config` class
- Handles dbt Cloud credentials and settings
- Provides environment variable parsing via `from_env()`
- Sets up logging configuration
- Initializes the application configuration
- Creates and runs the CI orchestrator
- Handles top-level error handling
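As a rough illustration of how a `Config` class with `from_env()` and a thin entry point might fit together, here is a minimal sketch. The field names, the `INPUT_*` environment variable names, and the `main` function are assumptions for this example; the action's real attribute and variable names may differ, and only a subset of the inputs is modeled.

```python
import logging
import os
import sys
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    account_id: str
    job_id: str
    service_token: str
    dialect: str
    host: str = "cloud.getdbt.com"
    dry_run: bool = False
    log_level: str = "INFO"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Config":
        """Build a Config from environment variables (names are assumed)."""
        env = os.environ if env is None else env

        def required(name: str) -> str:
            value = env.get(name, "").strip()
            if not value:
                raise ValueError(f"Missing required setting: {name}")
            return value

        return cls(
            account_id=required("INPUT_DBT_CLOUD_ACCOUNT_ID"),
            job_id=required("INPUT_DBT_CLOUD_JOB_ID"),
            service_token=required("INPUT_DBT_CLOUD_SERVICE_TOKEN"),
            dialect=required("INPUT_DIALECT"),
            host=env.get("INPUT_DBT_CLOUD_HOST", "").strip() or "cloud.getdbt.com",
            dry_run=env.get("INPUT_DRY_RUN", "false").strip().lower() == "true",
            log_level=(env.get("INPUT_LOG_LEVEL", "").strip() or "INFO").upper(),
        )


def main(env: Optional[Mapping[str, str]] = None) -> int:
    """Load configuration, set up logging, and return a process exit code."""
    try:
        config = Config.from_env(env)
    except ValueError as exc:
        print(f"Configuration error: {exc}", file=sys.stderr)
        return 1
    logging.basicConfig(level=config.log_level)
    logging.getLogger(__name__).info("Loaded config for job %s", config.job_id)
    # A real entry point would create and run the CI orchestrator here.
    return 0
```

Passing the environment in as a mapping keeps the parsing logic easy to test without touching `os.environ`.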
- `Node`: Represents a dbt model with source and target code
- `NodeFactory`: Creates Node instances from raw data
- `NodeManager`: Manages collections of nodes and their dependencies
- `BreakingChange`: Analyzes SQL changes to detect breaking modifications
- `ColumnTracker`: Tracks column-level changes across models
- `CiOrchestrator`: Coordinates the entire CI workflow
- `DbtRunner`: Handles dbt CLI command execution
- `DiscoveryClient`: Interfaces with dbt Cloud's Discovery API
- `LineageService`: Manages model lineage information
- Defines protocol classes for key components
- Ensures consistent implementation across services
- Includes protocols for:
  - `DbtRunnerProtocol`
  - `DiscoveryClientProtocol`
  - `LineageServiceProtocol`
  - `OrchestratorProtocol`
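To show why protocol classes help here, the sketch below defines one protocol with `typing.Protocol` and a test double that satisfies it structurally. Only the protocol name comes from the list above. The method `get_column_consumers`, the `InMemoryLineageService` class, and the `impacted_models` function are hypothetical and stand in for whatever the real interfaces declare.

```python
from typing import Dict, Iterable, Protocol, Set, Tuple, runtime_checkable


@runtime_checkable
class LineageServiceProtocol(Protocol):
    """Assumed shape: the real protocol's methods may differ."""

    def get_column_consumers(self, model: str, column: str) -> Set[str]:
        ...


class InMemoryLineageService:
    """Test double that satisfies the protocol without inheriting from it."""

    def __init__(self, edges: Dict[Tuple[str, str], Set[str]]) -> None:
        self._edges = edges

    def get_column_consumers(self, model: str, column: str) -> Set[str]:
        return set(self._edges.get((model, column), set()))


def impacted_models(
    service: LineageServiceProtocol, model: str, columns: Iterable[str]
) -> Set[str]:
    """Union of downstream models that read any of the given columns."""
    impacted: Set[str] = set()
    for column in columns:
        impacted |= service.get_column_consumers(model, column)
    return impacted
```

Because the orchestrating code depends only on the protocol, a Discovery API backed service and an in-memory fake are interchangeable in tests.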
- Contains helper functions for:
- Creating dbt Cloud profiles
- Triggering dbt Cloud jobs
- Managing job run statuses
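A helper for triggering a job and interpreting run statuses might look like the sketch below. It reflects my understanding of the dbt Cloud Administrative API v2 (the `/run/` endpoint, the `cause` and `steps_override` fields, and the numeric status codes), so verify these against the official API reference before relying on them. `build_trigger_request`, `RunStatus`, and `is_terminal` are hypothetical names, and the request is only constructed here, not sent.

```python
import json
import urllib.request
from enum import IntEnum
from typing import List, Optional


class RunStatus(IntEnum):
    # Status codes as I understand the v2 API; confirm before relying on them.
    QUEUED = 1
    STARTING = 2
    RUNNING = 3
    SUCCESS = 10
    ERROR = 20
    CANCELLED = 30


TERMINAL_STATUSES = {RunStatus.SUCCESS, RunStatus.ERROR, RunStatus.CANCELLED}


def is_terminal(status: int) -> bool:
    """True once a run has finished, whether it succeeded or not."""
    return status in TERMINAL_STATUSES


def build_trigger_request(
    host: str,
    account_id: str,
    job_id: str,
    token: str,
    cause: str,
    steps_override: Optional[List[str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a job run."""
    url = f"https://{host}/api/v2/accounts/{account_id}/jobs/{job_id}/run/"
    payload = {"cause": cause}
    if steps_override:
        # Replaces the job's configured steps for this run only.
        payload["steps_override"] = steps_override
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A caller would pass the returned request to `urllib.request.urlopen` and then poll the run until `is_terminal` returns true.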
- Defines GraphQL queries for the Discovery API
- Includes queries for:
- Column lineage
- Compiled code
- Node lineage
- Sets up standardized logging across the application
- Configures console output formatting
- Defines log levels and handlers
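A standardized logging setup along these lines can be quite small. The sketch below is an assumption about its shape rather than the project's actual module: the logger name `column_aware_ci`, the `setup_logging` function, and the format string are all hypothetical.

```python
import logging
import sys


def setup_logging(level: str = "INFO") -> logging.Logger:
    """Configure a console logger once and return it."""
    logger = logging.getLogger("column_aware_ci")  # hypothetical logger name
    logger.setLevel(level.upper())
    # Guard against adding duplicate handlers when called more than once.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    # Avoid double output if the root logger is also configured.
    logger.propagate = False
    return logger
```

The `level` argument maps directly onto the `log_level` input, which accepts DEBUG, INFO, WARNING, ERROR, or CRITICAL.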
- This action has only been tested with Snowflake.
- Assumes that your column names are not case sensitive.
- The dbt Cloud CLI is used to run the dbt commands `compile` and `ls`, which means that it needs a personal access token and is, at the moment, scoped to a particular user. The job that is triggered at the end of the workflow still uses the credentials configured for the environment it runs in.
- The `favor-state` flag is used when compiling the target SQL. This minimizes changes that are picked up solely because of environment separation (e.g. db.my_dev_schema.dim_customers vs. db.my_prod_schema.dim_customers). However, this doesn't apply if the node is also part of the selected nodes. See example below when running `dbt compile -s state:modified --favor-state`:
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the terms of the MIT license.