
[Regression] Python model build failed when Google bucket retention policy is set (starting from dbt version 1.7) #1318

Closed
sherminsb opened this issue Aug 5, 2024 · 1 comment · Fixed by #1388
Labels: bug (Something isn't working), python_models, regression

Comments

@sherminsb

sherminsb commented Aug 5, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Starting with dbt version 1.7, when a bucket retention policy is set, a Python model build throws the error `Unhandled error while executing target/run/restream_bi/models/python/python_model_test.py Google Cloud Dataproc Agent reports job failure.`

Expected Behavior

In dbt version 1.6, the Python model builds without error even when a bucket retention policy is set.

Steps To Reproduce

  1. Change the dbt version to 1.7 or later in the Dev environment
  2. Set a retention policy on the bucket (Bucket details in the Google Cloud console)
  3. Run `dbt run -s <python model>`
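
For step 2, the retention policy can also be set from the command line. This is a hedged sketch (the bucket name is a placeholder, and the `gsutil retention` commands assume the Cloud SDK is installed and authenticated):

```shell
# Set a 1-day retention policy on the GCS bucket (placeholder name).
gsutil retention set 1d gs://my-dbt-staging-bucket

# Verify the policy is in place.
gsutil retention get gs://my-dbt-staging-bucket
```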

Relevant log output

Google Cloud Dataproc Agent reports job failure.

Using the default container image
Waiting for container log creation
PYSPARK_PYTHON=/opt/dataproc/conda/bin/python
JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64
SPARK_EXTRA_CLASSPATH=
:: loading settings :: file = /etc/spark/conf/ivysettings.xml
/usr/lib/spark/python/lib/pyspark.zip/pyspark/pandas/__init__.py:49: UserWarning: 'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.

Environment

- OS: macOS
- Python/dbt: error in dbt Cloud version 1.7 and "Versionless"; OK in dbt Cloud version 1.6

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

@sherminsb sherminsb added bug Something isn't working triage labels Aug 5, 2024
@sherminsb sherminsb changed the title [Bug] Python models failed when Google bucket retention policy is set (starting from dbt version 1.7) [Bug] Python model build failed when Google bucket retention policy is set (starting from dbt version 1.7) Aug 5, 2024
@dbeatty10 dbeatty10 changed the title [Bug] Python model build failed when Google bucket retention policy is set (starting from dbt version 1.7) [Regression] Python model build failed when Google bucket retention policy is set (starting from dbt version 1.7) Aug 5, 2024
@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core Aug 5, 2024
@amychen1776 amychen1776 added python Pull requests that update Python code python_models and removed triage python Pull requests that update Python code labels Aug 27, 2024
@colin-rogers-dbt colin-rogers-dbt self-assigned this Oct 11, 2024
@colin-rogers-dbt
Contributor

The root cause here is a change we made in 1.7 to use "indirect" writes instead of writing directly to BigQuery. This allows us to support a wider range of BigQuery functionality (namely, writing to partitioned tables).
See the BQ connector docs for more info. Note that this means using a bucket retention policy is incompatible with using partitioned/incremental materialization strategies.

I think the right behavior here is to try to use direct writes for basic table materializations and only use indirect writes when the user sets a partition config.
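
To illustrate the distinction, here is a minimal PySpark sketch of the two write paths, assuming the spark-bigquery-connector's `writeMethod` and `temporaryGcsBucket` options; the table and bucket names are illustrative, and this is not the code dbt generates:

```python
# Sketch only: assumes a SparkSession `spark` with the spark-bigquery-connector
# on the classpath. Table and bucket names are placeholders.
df = spark.table("my_model")

# Direct write: uses the BigQuery Storage Write API with no GCS staging bucket,
# so a bucket retention policy never comes into play.
(df.write.format("bigquery")
   .option("writeMethod", "direct")
   .mode("overwrite")
   .save("my-project.my_dataset.python_model_test"))

# Indirect write (dbt's behavior since 1.7): stages files in a GCS bucket and
# then loads them into BigQuery. Cleanup of the staged files conflicts with a
# bucket retention policy, surfacing as the Dataproc job failure above.
(df.write.format("bigquery")
   .option("writeMethod", "indirect")
   .option("temporaryGcsBucket", "my-dbt-staging-bucket")
   .mode("overwrite")
   .save("my-project.my_dataset.python_model_test"))
```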
