
[Regression] Python model build failed when Google bucket retention policy is set (starting from dbt version 1.7) #1318

Closed
sherminsb opened this issue Aug 5, 2024 · 1 comment · Fixed by #1388
Labels: bug (Something isn't working), python_models, regression

Comments

@sherminsb

sherminsb commented Aug 5, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Starting with dbt version 1.7, when a bucket retention policy is set, a Python model build throws the error `Unhandled error while executing target/run/restream_bi/models/python/python_model_test.py Google Cloud Dataproc Agent reports job failure.`

Expected Behavior

In dbt version 1.6, the Python model builds without error even when a bucket retention policy is set.

Steps To Reproduce

  1. Change the dbt version to 1.7 or later in the Dev environment
  2. Set a retention policy on the bucket (Bucket details in the Google Cloud console)
  3. Run `dbt run -s <python model>`
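
For step 2, the retention policy can also be set from the command line. This is a hedged sketch (the bucket name is a placeholder, and the `gsutil retention` commands assume the Cloud SDK is installed and authenticated):

```shell
# Set a 1-day retention policy on the GCS bucket (placeholder name).
gsutil retention set 1d gs://my-dbt-staging-bucket

# Verify the policy is in place.
gsutil retention get gs://my-dbt-staging-bucket
```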

Relevant log output

Google Cloud Dataproc Agent reports job failure.

Using the default container image
Waiting for container log creation
PYSPARK_PYTHON=/opt/dataproc/conda/bin/python
JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64
SPARK_EXTRA_CLASSPATH=
:: loading settings :: file = /etc/spark/conf/ivysettings.xml
/usr/lib/spark/python/lib/pyspark.zip/pyspark/pandas/__init__.py:49: UserWarning: 'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.

Environment

- OS: macOS
- Python/dbt: error in dbt Cloud version 1.7 and "Versionless"; OK in dbt Cloud version 1.6

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

@sherminsb sherminsb added bug Something isn't working triage labels Aug 5, 2024
@sherminsb sherminsb changed the title [Bug] Python models failed when Google bucket retention policy is set (starting from dbt version 1.7) [Bug] Python model build failed when Google bucket retention policy is set (starting from dbt version 1.7) Aug 5, 2024
@dbeatty10 dbeatty10 changed the title [Bug] Python model build failed when Google bucket retention policy is set (starting from dbt version 1.7) [Regression] Python model build failed when Google bucket retention policy is set (starting from dbt version 1.7) Aug 5, 2024
@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core Aug 5, 2024
@amychen1776 amychen1776 added python Pull requests that update Python code python_models and removed triage python Pull requests that update Python code labels Aug 27, 2024
@colin-rogers-dbt colin-rogers-dbt self-assigned this Oct 11, 2024
@colin-rogers-dbt
Contributor

The root cause here is a change we made in 1.7 to use "indirect" writes instead of writing directly to BigQuery. This allows us to support a wider range of BigQuery functionality (namely, writing to partitioned tables).
See the BQ connector docs for more info. Note that this means using a bucket retention policy is incompatible with using partitioned/incremental materialization strategies.

I think the right behavior here is to try to use direct writes for basic table materializations and only use indirect writes when the user sets a partition config.
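
To illustrate the distinction, here is a minimal PySpark sketch of the two write paths, assuming the spark-bigquery-connector's `writeMethod` and `temporaryGcsBucket` options; the table and bucket names are illustrative, and this is not the code dbt generates:

```python
# Sketch only: assumes a SparkSession `spark` with the spark-bigquery-connector
# on the classpath. Table and bucket names are placeholders.
df = spark.table("my_model")

# Direct write: uses the BigQuery Storage Write API with no GCS staging bucket,
# so a bucket retention policy never comes into play.
(df.write.format("bigquery")
   .option("writeMethod", "direct")
   .mode("overwrite")
   .save("my-project.my_dataset.python_model_test"))

# Indirect write (dbt's behavior since 1.7): stages files in a GCS bucket and
# then loads them into BigQuery. Cleanup of the staged files conflicts with a
# bucket retention policy, surfacing as the Dataproc job failure above.
(df.write.format("bigquery")
   .option("writeMethod", "indirect")
   .option("temporaryGcsBucket", "my-dbt-staging-bucket")
   .mode("overwrite")
   .save("my-project.my_dataset.python_model_test"))
```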
