Commit

Merge branch 'main' into fix-static-copy-partitions
colin-rogers-dbt authored Oct 10, 2024
2 parents 8ebbbde + 455c768 commit 32960cc
Showing 35 changed files with 263 additions and 9 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 1.9.0a1
+current_version = 1.9.0b1
 parse = (?P<major>[\d]+) # major version number
     \.(?P<minor>[\d]+) # minor version number
     \.(?P<patch>[\d]+) # patch version number
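As a rough illustration of how the `parse` pattern in this hunk splits a version string, here is a minimal Python sketch. It uses only the three named groups visible above; the rest of the pattern is collapsed in the diff and is not reproduced, so the pre-release suffix (`a1`, `b1`) is simply left unmatched. Compiling with `re.VERBOSE` is an assumption about how the inline `#` comments are meant to be read, and `parse_visible_parts` is a hypothetical helper name.

```python
import re

# Only the three groups visible in the hunk; the remainder of the real
# pattern is collapsed in the diff and deliberately not guessed at here.
VISIBLE_PARSE = r"""
(?P<major>[\d]+) # major version number
\.(?P<minor>[\d]+) # minor version number
\.(?P<patch>[\d]+) # patch version number
"""


def parse_visible_parts(version: str) -> dict:
    """Return major/minor/patch as ints, ignoring any trailing suffix."""
    # re.VERBOSE makes the regex engine skip whitespace and '#' comments.
    match = re.match(VISIBLE_PARSE, version, re.VERBOSE)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    # Both the old and the new version share the same numeric prefix.
    print(parse_visible_parts("1.9.0a1"))
    print(parse_visible_parts("1.9.0b1"))
```

Because the sketch matches only a prefix, it cannot distinguish `1.9.0a1` from `1.9.0b1`; that distinction lives in the part of the pattern not shown in this hunk.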
44 changes: 44 additions & 0 deletions .changes/1.9.0-b1.md
@@ -0,0 +1,44 @@
## dbt-bigquery 1.9.0-b1 - October 02, 2024

### Features

- Add configuration options `enable_list_inference` and `intermediate_format` for python models ([#1047](https://github.com/dbt-labs/dbt-bigquery/issues/1047), [#1114](https://github.com/dbt-labs/dbt-bigquery/issues/1114))
- Add tests for cross-database `cast` macro ([#1214](https://github.com/dbt-labs/dbt-bigquery/issues/1214))
- Cross-database `date` macro ([#1221](https://github.com/dbt-labs/dbt-bigquery/issues/1221))
- Add support for base 64 encoded json keyfile credentials ([#923](https://github.com/dbt-labs/dbt-bigquery/issues/923))
- Add support for cancelling queries on keyboard interrupt ([#917](https://github.com/dbt-labs/dbt-bigquery/issues/917))
- Add Microbatch Strategy to dbt-spark ([#1354](https://github.com/dbt-labs/dbt-bigquery/issues/1354))

### Fixes

- Drop intermediate objects created in BigQuery for incremental models ([#1036](https://github.com/dbt-labs/dbt-bigquery/issues/1036))
- Fix null column index issue during `dbt docs generate` for external tables ([#1079](https://github.com/dbt-labs/dbt-bigquery/issues/1079))
- make seed delimiter configurable via `field_delimeter` in model config ([#1119](https://github.com/dbt-labs/dbt-bigquery/issues/1119))
- Default `enableListInference` to `True` for python models to support nested lists ([#1047](https://github.com/dbt-labs/dbt-bigquery/issues/1047), [#1114](https://github.com/dbt-labs/dbt-bigquery/issues/1114))
- Catch additional database error exception, NotFound, as a DbtDatabaseError instead of defaulting to a DbtRuntimeError ([#1360](https://github.com/dbt-labs/dbt-bigquery/issues/1360))

### Under the Hood

- Lazy load `agate` ([#1162](https://github.com/dbt-labs/dbt-bigquery/issues/1162))
- Simplify linting environment and dev dependencies ([#1291](https://github.com/dbt-labs/dbt-bigquery/issues/1291))

### Dependencies

- Update pre-commit requirement from ~=3.5 to ~=3.7 ([#1052](https://github.com/dbt-labs/dbt-bigquery/pull/1052))
- Update freezegun requirement from ~=1.3 to ~=1.4 ([#1062](https://github.com/dbt-labs/dbt-bigquery/pull/1062))
- Bump mypy from 1.7.1 to 1.8.0 ([#1064](https://github.com/dbt-labs/dbt-bigquery/pull/1064))
- Update flake8 requirement from ~=6.1 to ~=7.0 ([#1069](https://github.com/dbt-labs/dbt-bigquery/pull/1069))
- Bump actions/download-artifact from 3 to 4 ([#1209](https://github.com/dbt-labs/dbt-bigquery/pull/1209))
- Bump actions/upload-artifact from 3 to 4 ([#1210](https://github.com/dbt-labs/dbt-bigquery/pull/1210))
- Bump ubuntu from 22.04 to 24.04 in /docker ([#1247](https://github.com/dbt-labs/dbt-bigquery/pull/1247))
- Update pre-commit-hooks requirement from ~=4.5 to ~=4.6 ([#1281](https://github.com/dbt-labs/dbt-bigquery/pull/1281))
- Update pytest-xdist requirement from ~=3.5 to ~=3.6 ([#1282](https://github.com/dbt-labs/dbt-bigquery/pull/1282))
- Update flaky requirement from ~=3.7 to ~=3.8 ([#1283](https://github.com/dbt-labs/dbt-bigquery/pull/1283))
- Update twine requirement from ~=4.0 to ~=5.1 ([#1293](https://github.com/dbt-labs/dbt-bigquery/pull/1293))

### Contributors
- [@d-cole](https://github.com/d-cole) ([#917](https://github.com/dbt-labs/dbt-bigquery/issues/917))
- [@dwreeves](https://github.com/dwreeves) ([#1162](https://github.com/dbt-labs/dbt-bigquery/issues/1162))
- [@robeleb1](https://github.com/robeleb1) ([#923](https://github.com/dbt-labs/dbt-bigquery/issues/923))
- [@salimmoulouel](https://github.com/salimmoulouel) ([#1119](https://github.com/dbt-labs/dbt-bigquery/issues/1119))
- [@vinit2107](https://github.com/vinit2107) ([#1036](https://github.com/dbt-labs/dbt-bigquery/issues/1036))
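
The base64 keyfile entry listed under Features (#923) can be pictured with a short round trip. This is only a sketch of the general idea, assuming the keyfile is a JSON object that is serialized and then base64-encoded; the function names and the placeholder keyfile contents are hypothetical and are not taken from the adapter.

```python
import base64
import json


def encode_keyfile(keyfile: dict) -> str:
    """Serialize a keyfile dict to JSON and base64-encode it as ASCII text."""
    raw = json.dumps(keyfile).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_keyfile(encoded: str) -> dict:
    """Reverse encode_keyfile; validate=True rejects non-base64 characters."""
    raw = base64.b64decode(encoded, validate=True)
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values for illustration only, not real credentials.
    placeholder = {"type": "service_account", "project_id": "example-project"}
    encoded = encode_keyfile(placeholder)
    print(encoded)
    print(decode_keyfile(encoded) == placeholder)
```

Encoding the whole JSON document as one string avoids quoting and newline problems when the credentials are passed through an environment variable.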
6 changes: 6 additions & 0 deletions .changes/1.9.0/Features-20240925-232238.yaml
@@ -0,0 +1,6 @@
kind: Features
body: Add Microbatch Strategy to dbt-spark
time: 2024-09-25T23:22:38.216277+01:00
custom:
Author: michelleark
Issue: "1354"
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
7 changes: 7 additions & 0 deletions .changes/1.9.0/Fixes-20241001-193207.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
kind: Fixes
body: Catch additional database error exception, NotFound, as a DbtDatabaseError instead
of defaulting to a DbtRuntimeError
time: 2024-10-01T19:32:07.304353-04:00
custom:
Author: mikealfare
Issue: "1360"
1 change: 1 addition & 0 deletions .github/workflows/integration.yml
@@ -128,6 +128,7 @@ jobs:
- 'tests/**'
- 'dev-requirements.txt'
- '.github/**'
- '*.py'
- name: Generate integration test matrix
id: generate-matrix
48 changes: 46 additions & 2 deletions CHANGELOG.md
@@ -5,10 +5,54 @@
- "Breaking changes" listed under a version may require action from end users or external maintainers when upgrading to that version.
- Do not edit this file directly. This file is auto-generated using [changie](https://github.com/miniscruff/changie). For details on how to document a change, see [the contributing guide](https://github.com/dbt-labs/dbt-bigquery/blob/main/CONTRIBUTING.md#adding-changelog-entry)

## dbt-bigquery 1.9.0-b1 - October 02, 2024

### Features

- Add configuration options `enable_list_inference` and `intermediate_format` for python models ([#1047](https://github.com/dbt-labs/dbt-bigquery/issues/1047), [#1114](https://github.com/dbt-labs/dbt-bigquery/issues/1114))
- Add tests for cross-database `cast` macro ([#1214](https://github.com/dbt-labs/dbt-bigquery/issues/1214))
- Cross-database `date` macro ([#1221](https://github.com/dbt-labs/dbt-bigquery/issues/1221))
- Add support for base 64 encoded json keyfile credentials ([#923](https://github.com/dbt-labs/dbt-bigquery/issues/923))
- Add support for cancelling queries on keyboard interrupt ([#917](https://github.com/dbt-labs/dbt-bigquery/issues/917))
- Add Microbatch Strategy to dbt-spark ([#1354](https://github.com/dbt-labs/dbt-bigquery/issues/1354))

### Fixes

- Drop intermediate objects created in BigQuery for incremental models ([#1036](https://github.com/dbt-labs/dbt-bigquery/issues/1036))
- Fix null column index issue during `dbt docs generate` for external tables ([#1079](https://github.com/dbt-labs/dbt-bigquery/issues/1079))
- make seed delimiter configurable via `field_delimeter` in model config ([#1119](https://github.com/dbt-labs/dbt-bigquery/issues/1119))
- Default `enableListInference` to `True` for python models to support nested lists ([#1047](https://github.com/dbt-labs/dbt-bigquery/issues/1047), [#1114](https://github.com/dbt-labs/dbt-bigquery/issues/1114))
- Catch additional database error exception, NotFound, as a DbtDatabaseError instead of defaulting to a DbtRuntimeError ([#1360](https://github.com/dbt-labs/dbt-bigquery/issues/1360))

### Under the Hood

- Lazy load `agate` ([#1162](https://github.com/dbt-labs/dbt-bigquery/issues/1162))
- Simplify linting environment and dev dependencies ([#1291](https://github.com/dbt-labs/dbt-bigquery/issues/1291))

### Dependencies

- Update pre-commit requirement from ~=3.5 to ~=3.7 ([#1052](https://github.com/dbt-labs/dbt-bigquery/pull/1052))
- Update freezegun requirement from ~=1.3 to ~=1.4 ([#1062](https://github.com/dbt-labs/dbt-bigquery/pull/1062))
- Bump mypy from 1.7.1 to 1.8.0 ([#1064](https://github.com/dbt-labs/dbt-bigquery/pull/1064))
- Update flake8 requirement from ~=6.1 to ~=7.0 ([#1069](https://github.com/dbt-labs/dbt-bigquery/pull/1069))
- Bump actions/download-artifact from 3 to 4 ([#1209](https://github.com/dbt-labs/dbt-bigquery/pull/1209))
- Bump actions/upload-artifact from 3 to 4 ([#1210](https://github.com/dbt-labs/dbt-bigquery/pull/1210))
- Bump ubuntu from 22.04 to 24.04 in /docker ([#1247](https://github.com/dbt-labs/dbt-bigquery/pull/1247))
- Update pre-commit-hooks requirement from ~=4.5 to ~=4.6 ([#1281](https://github.com/dbt-labs/dbt-bigquery/pull/1281))
- Update pytest-xdist requirement from ~=3.5 to ~=3.6 ([#1282](https://github.com/dbt-labs/dbt-bigquery/pull/1282))
- Update flaky requirement from ~=3.7 to ~=3.8 ([#1283](https://github.com/dbt-labs/dbt-bigquery/pull/1283))
- Update twine requirement from ~=4.0 to ~=5.1 ([#1293](https://github.com/dbt-labs/dbt-bigquery/pull/1293))

### Contributors
- [@d-cole](https://github.com/d-cole) ([#917](https://github.com/dbt-labs/dbt-bigquery/issues/917))
- [@dwreeves](https://github.com/dwreeves) ([#1162](https://github.com/dbt-labs/dbt-bigquery/issues/1162))
- [@robeleb1](https://github.com/robeleb1) ([#923](https://github.com/dbt-labs/dbt-bigquery/issues/923))
- [@salimmoulouel](https://github.com/salimmoulouel) ([#1119](https://github.com/dbt-labs/dbt-bigquery/issues/1119))
- [@vinit2107](https://github.com/vinit2107) ([#1036](https://github.com/dbt-labs/dbt-bigquery/issues/1036))


## Previous Releases
For information on prior major and minor releases, see their changelogs:
- [1.8](https://github.com/dbt-labs/dbt-bigquery/blob/1.8.latest/CHANGELOG.md)
- [1.7](https://github.com/dbt-labs/dbt-bigquery/blob/1.7.latest/CHANGELOG.md)
- [1.6](https://github.com/dbt-labs/dbt-bigquery/blob/1.6.latest/CHANGELOG.md)
- [1.5](https://github.com/dbt-labs/dbt-bigquery/blob/1.5.latest/CHANGELOG.md)
- [1.4](https://github.com/dbt-labs/dbt-bigquery/blob/1.4.latest/CHANGELOG.md)
2 changes: 1 addition & 1 deletion dbt/adapters/bigquery/__version__.py
@@ -1 +1 @@
-version = "1.9.0a1"
+version = "1.9.0b1"
4 changes: 4 additions & 0 deletions dbt/adapters/bigquery/connections.py
@@ -268,6 +268,10 @@ def exception_handler(self, sql):
            message = "Access denied while running query"
            self.handle_error(e, message)

        except google.cloud.exceptions.NotFound as e:
            message = "Not found while running query"
            self.handle_error(e, message)

        except google.auth.exceptions.RefreshError as e:
            message = (
                "Unable to generate access token, if you're using "
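
The effect of the new `except` branch, as described in the changelog entry for #1360, is that a "not found" failure is reported as a database error instead of falling through to a generic runtime error. The sketch below mimics that ordering with stand-in exception classes so that it runs without the Google or dbt packages; every class and function name in it is hypothetical, and it is not the adapter's actual `exception_handler`.

```python
from contextlib import contextmanager


# Stand-ins for illustration only. The adapter itself catches
# google.cloud.exceptions.NotFound and raises dbt's own error types.
class Forbidden(Exception):
    pass


class NotFound(Exception):
    pass


class DatabaseError(Exception):
    pass


class GenericRuntimeError(Exception):
    pass


@contextmanager
def exception_handler(sql: str):
    """Translate low-level failures into two coarse error categories."""
    try:
        yield
    except Forbidden as exc:
        raise DatabaseError(f"Access denied while running query: {exc}") from exc
    except NotFound as exc:
        # Without this branch, NotFound would reach the catch-all below.
        raise DatabaseError(f"Not found while running query: {exc}") from exc
    except Exception as exc:
        raise GenericRuntimeError(str(exc)) from exc


def classify(failure: Exception) -> str:
    """Run a failing 'query' and report which category it ended up in."""
    try:
        with exception_handler("select 1"):
            raise failure
    except (DatabaseError, GenericRuntimeError) as exc:
        return type(exc).__name__


if __name__ == "__main__":
    print(classify(NotFound("table missing")))
    print(classify(ValueError("something else")))
```

The order of the `except` clauses matters: the specific branches must appear before the catch-all, otherwise they are never reached.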
15 changes: 12 additions & 3 deletions dbt/include/bigquery/macros/materializations/incremental.sql
@@ -4,12 +4,16 @@

   {% set invalid_strategy_msg -%}
     Invalid incremental strategy provided: {{ strategy }}
-    Expected one of: 'merge', 'insert_overwrite'
+    Expected one of: 'merge', 'insert_overwrite', 'microbatch'
   {%- endset %}
-  {% if strategy not in ['merge', 'insert_overwrite'] %}
+  {% if strategy not in ['merge', 'insert_overwrite', 'microbatch'] %}
     {% do exceptions.raise_compiler_error(invalid_strategy_msg) %}
   {% endif %}

+  {% if strategy == 'microbatch' %}
+    {% do bq_validate_microbatch_config(config) %}
+  {% endif %}
+
   {% do return(strategy) %}
 {% endmacro %}

@@ -48,8 +52,13 @@
         tmp_relation, target_relation, sql, unique_key, partition_by, partitions, dest_columns, tmp_relation_exists, copy_partitions
     ) %}

-  {% else %} {# strategy == 'merge' #}
+  {% elif strategy == 'microbatch' %}

+    {% set build_sql = bq_generate_microbatch_build_sql(
+        tmp_relation, target_relation, sql, unique_key, partition_by, partitions, dest_columns, tmp_relation_exists, copy_partitions
+    ) %}
+
+  {% else %} {# strategy == 'merge' #}
     {% set build_sql = bq_generate_incremental_merge_build_sql(
         tmp_relation, target_relation, sql, unique_key, partition_by, dest_columns, tmp_relation_exists, incremental_predicates
     ) %}
@@ -0,0 +1,28 @@
{% macro bq_validate_microbatch_config(config) %}
  {% if config.get("partition_by") is none %}
    {% set missing_partition_msg -%}
    The 'microbatch' strategy requires a `partition_by` config.
    {%- endset %}
    {% do exceptions.raise_compiler_error(missing_partition_msg) %}
  {% endif %}

  {% if config.get("partition_by").granularity != config.get('batch_size') %}
    {% set invalid_partition_by_granularity_msg -%}
    The 'microbatch' strategy requires a `partition_by` config with the same granularity as its configured `batch_size`.
    Got:
      `batch_size`: {{ config.get('batch_size') }}
      `partition_by.granularity`: {{ config.get("partition_by").granularity }}
    {%- endset %}
    {% do exceptions.raise_compiler_error(invalid_partition_by_granularity_msg) %}
  {% endif %}
{% endmacro %}

{% macro bq_generate_microbatch_build_sql(
    tmp_relation, target_relation, sql, unique_key, partition_by, partitions, dest_columns, tmp_relation_exists, copy_partitions
) %}
  {% set build_sql = bq_insert_overwrite_sql(
      tmp_relation, target_relation, sql, unique_key, partition_by, partitions, dest_columns, tmp_relation_exists, copy_partitions
  ) %}

  {{ return(build_sql) }}
{% endmacro %}
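
The two checks in `bq_validate_microbatch_config` can be restated in plain Python to make the rule easier to see: a `partition_by` config must be present, and its granularity must equal `batch_size`. This is a sketch only. `CompilerError` and `validate_microbatch_config` are hypothetical names, and `partition_by` is modelled as a plain dict here, whereas the macro reads `.granularity` as an attribute of the parsed config object.

```python
from typing import Optional


class CompilerError(Exception):
    """Stand-in for the compiler error raised by the macro."""


def validate_microbatch_config(
    partition_by: Optional[dict], batch_size: Optional[str]
) -> None:
    """Raise CompilerError unless partition_by exists and matches batch_size."""
    if partition_by is None:
        raise CompilerError(
            "The 'microbatch' strategy requires a `partition_by` config."
        )

    granularity = partition_by.get("granularity")
    if granularity != batch_size:
        raise CompilerError(
            "The 'microbatch' strategy requires a `partition_by` config with "
            "the same granularity as its configured `batch_size`. "
            f"Got `batch_size`: {batch_size}, "
            f"`partition_by.granularity`: {granularity}"
        )


if __name__ == "__main__":
    # Matching granularity passes silently.
    validate_microbatch_config({"field": "event_time", "granularity": "day"}, "day")

    # A mismatch is rejected before any SQL would be built.
    try:
        validate_microbatch_config(
            {"field": "event_time", "granularity": "hour"}, "day"
        )
    except CompilerError as exc:
        print(exc)
```

Since `bq_generate_microbatch_build_sql` delegates to `bq_insert_overwrite_sql`, each batch presumably replaces whole partitions, which is a plausible reason for requiring the partition granularity and the batch size to line up.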
4 changes: 2 additions & 2 deletions setup.py
@@ -50,8 +50,8 @@ def _dbt_bigquery_version() -> str:
     packages=find_namespace_packages(include=["dbt", "dbt.*"]),
     include_package_data=True,
     install_requires=[
-        "dbt-common>=1.0.4,<2.0",
-        "dbt-adapters>=1.1.1,<2.0",
+        "dbt-common>=1.10,<2.0",
+        "dbt-adapters>=1.7,<2.0",
         # 3.20 introduced pyarrow>=3.0 under the `pandas` extra
         "google-cloud-bigquery[pandas]>=3.0,<4.0",
         "google-cloud-storage~=2.4",
@@ -601,3 +601,59 @@
select * from data
""".lstrip()

microbatch_model_no_unique_id_sql = """
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    partition_by={
      'field': 'event_time',
      'data_type': 'timestamp',
      'granularity': 'day'
    },
    event_time='event_time',
    batch_size='day',
    begin=modules.datetime.datetime(2020, 1, 1, 0, 0, 0)
    )
}}
select * from {{ ref('input_model') }}
"""

microbatch_input_sql = """
{{ config(materialized='table', event_time='event_time') }}
select 1 as id, TIMESTAMP '2020-01-01 00:00:00-0' as event_time
union all
select 2 as id, TIMESTAMP '2020-01-02 00:00:00-0' as event_time
union all
select 3 as id, TIMESTAMP '2020-01-03 00:00:00-0' as event_time
"""

microbatch_model_no_partition_by_sql = """
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_time',
    batch_size='day',
    begin=modules.datetime.datetime(2020, 1, 1, 0, 0, 0)
    )
}}
select * from {{ ref('input_model') }}
"""


microbatch_model_invalid_partition_by_sql = """
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_time',
    batch_size='day',
    begin=modules.datetime.datetime(2020, 1, 1, 0, 0, 0),
    partition_by={
      'field': 'event_time',
      'data_type': 'timestamp',
      'granularity': 'hour'
    }
    )
}}
select * from {{ ref('input_model') }}
"""
@@ -0,0 +1,55 @@
import os
import pytest
from unittest import mock

from dbt.tests.util import run_dbt_and_capture
from dbt.tests.adapter.incremental.test_incremental_microbatch import (
    BaseMicrobatch,
    patch_microbatch_end_time,
)

from tests.functional.adapter.incremental.incremental_strategy_fixtures import (
    microbatch_model_no_unique_id_sql,
    microbatch_input_sql,
    microbatch_model_no_partition_by_sql,
    microbatch_model_invalid_partition_by_sql,
)


class TestBigQueryMicrobatch(BaseMicrobatch):
    # Reuse the shared microbatch test suite with a BigQuery-specific model.
    @pytest.fixture(scope="class")
    def microbatch_model_sql(self) -> str:
        return microbatch_model_no_unique_id_sql


class TestBigQueryMicrobatchMissingPartitionBy:
    @pytest.fixture(scope="class")
    def models(self) -> dict:
        return {
            "microbatch.sql": microbatch_model_no_partition_by_sql,
            "input_model.sql": microbatch_input_sql,
        }

    @mock.patch.dict(os.environ, {"DBT_EXPERIMENTAL_MICROBATCH": "True"})
    def test_execution_failure_no_partition_by(self, project):
        # The run must fail because the model has no `partition_by` config.
        with patch_microbatch_end_time("2020-01-03 13:57:00"):
            _, stdout = run_dbt_and_capture(["run"], expect_pass=False)
        assert "The 'microbatch' strategy requires a `partition_by` config" in stdout


class TestBigQueryMicrobatchInvalidPartitionByGranularity:
    @pytest.fixture(scope="class")
    def models(self) -> dict:
        return {
            "microbatch.sql": microbatch_model_invalid_partition_by_sql,
            "input_model.sql": microbatch_input_sql,
        }

    @mock.patch.dict(os.environ, {"DBT_EXPERIMENTAL_MICROBATCH": "True"})
    def test_execution_failure_no_partition_by(self, project):
        # The run must fail because `partition_by` is hourly but `batch_size` is daily.
        with patch_microbatch_end_time("2020-01-03 13:57:00"):
            _, stdout = run_dbt_and_capture(["run"], expect_pass=False)
        assert (
            "The 'microbatch' strategy requires a `partition_by` config with the same granularity as its configured `batch_size`"
            in stdout
        )
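
The tests above enable the experimental flag with `mock.patch.dict(os.environ, ...)`, which sets the variable only for the duration of the decorated call and restores the previous environment afterwards. The following standalone sketch shows that scoping behaviour; `microbatch_enabled` is a hypothetical helper written for this example and is not how dbt reads the flag.

```python
import os
from unittest import mock

FLAG = "DBT_EXPERIMENTAL_MICROBATCH"


def microbatch_enabled() -> bool:
    """Hypothetical helper: treat the flag as on when it is set to 'true'."""
    return os.environ.get(FLAG, "").lower() == "true"


@mock.patch.dict(os.environ, {FLAG: "True"})
def check_inside_patch() -> bool:
    # While this function runs, os.environ contains the patched value.
    return microbatch_enabled()


# Start from a known state so the demonstration is deterministic.
os.environ.pop(FLAG, None)

before = microbatch_enabled()
during = check_inside_patch()
after = microbatch_enabled()

if __name__ == "__main__":
    print(before, during, after)
```

Patching the environment per test keeps the flag from leaking into other tests that run in the same process.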
