[Bug] Python incremental models hang on invocation, never submit to dataproc #1157

Open
2 tasks done
nickozilla opened this issue Mar 28, 2024 · 3 comments
Labels
bug (Something isn't working), needs_spike, python_models

Comments

@nickozilla
Contributor

nickozilla commented Mar 28, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

We've noticed lately that when a Python incremental model runs in our CI/CD pipeline, it sometimes hangs indefinitely and never submits a job to Dataproc. We haven't been able to identify why this happens in some invocations but not others. It appears to be unique to incremental Python models.

Expected Behavior

  1. Run the python incremental model
  2. See in logs: BigQuery adapter: Submitting batch job with id: ...
  3. See in logs: created python incremental model ...

Steps To Reproduce

When running dbt run --target=unit-test --exclude tag:unit-test-new with a Python incremental model included in the selection, the job is never submitted; the invocation hangs indefinitely instead.

Relevant log output

No response

Environment

- OS: debian:11-slim
- Python: 3.11
- dbt-core: 1.7.10
- dbt-bigquery: 1.7.6

Additional Context

The failing model also has these config properties

  materialized: incremental
  incremental_strategy: merge
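
For readers less familiar with dbt Python models, the same two properties can also be set from inside the model file. The following is a minimal sketch only, not the failing model from this report: the upstream name "upstream_events" is a hypothetical placeholder, and the incremental filtering logic is intentionally left out because it depends on the model's columns.

```python
def model(dbt, session):
    # Same settings as the config properties quoted above.
    dbt.config(materialized="incremental", incremental_strategy="merge")

    # "upstream_events" is a hypothetical upstream model name (assumption).
    df = dbt.ref("upstream_events")

    if dbt.is_incremental:
        # An incremental run would normally narrow df to new rows here.
        # The filter depends on the model's columns, so it is omitted.
        pass

    return df
```
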
nickozilla added the bug and triage labels on Mar 28, 2024
@dlubawy

dlubawy commented Apr 17, 2024

We have started seeing this too, ever since we got Python models working again with the latest regression fixes (batch ID and nested data structures). However, our problem is not limited to incremental models. As @nickozilla mentioned, dbt reaches the Python model in the run and then hangs without ever submitting a Dataproc job. We tried to debug this further by turning on the debug flag, but it does not show anything useful: dbt reaches the Python model, prints the Python code to the log, and then the process hangs indefinitely until killed.

We have also tried setting the various timeout configuration options so that these specific jobs would at least time out, but this has no effect. The dbt process always hangs at this point, which suggests that there is no mechanism in place to gracefully handle Dataproc failures, and/or that the Dataproc code itself blocks the main dbt process when submitting Dataproc jobs.
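
Since the adapter-level timeouts appear to have no effect, one possible CI-side stopgap is to enforce a deadline from outside the dbt process, so a stuck invocation fails instead of blocking the pipeline. This is a rough sketch and not a fix for the underlying hang. The function name run_with_deadline and the exit code 124 (borrowed from the GNU timeout convention) are assumptions made for illustration.

```python
import subprocess


def run_with_deadline(cmd, deadline_seconds):
    """Run cmd and return its exit code, or 124 if the deadline passes."""
    try:
        completed = subprocess.run(cmd, timeout=deadline_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising,
        # so the hung process is gone by the time we get here.
        return 124


# Example call for a CI step (the one-hour deadline is an arbitrary choice):
# exit_code = run_with_deadline(
#     ["dbt", "run", "--target=unit-test", "--exclude", "tag:unit-test-new"],
#     deadline_seconds=3600,
# )
```

Note that subprocess.run only kills the direct child process. Anything already submitted to Dataproc would keep running and would need to be cancelled separately.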

Additional info:

  • python: 3.10.7
  • dbt-core: 1.7.11
  • dbt-bigquery: 1.7.7
  • Dataproc environment: serverless batch jobs

@HotDiggityDogz

For us, the job submits, but if there is an error in the job (e.g. in my case there was a JSON column, and apparently that's not supported for writing to BQ), then dbt hangs and does not continue afterwards.

Not great for production, where we want it to behave like a normal failure and continue so that the post-run alerting will run.
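
To illustrate the behaviour being asked for, here is a hedged sketch of a polling loop that treats a failed batch as an ordinary error and also gives up after a deadline. It is not dbt-bigquery's actual code. The state names follow the Dataproc batch states as I understand them, while get_state, wait_for_batch and BatchFailed are hypothetical names introduced only for this example.

```python
import time

TERMINAL_OK = {"SUCCEEDED"}
TERMINAL_BAD = {"FAILED", "CANCELLED"}


class BatchFailed(RuntimeError):
    """Raised when the batch ends in a non-successful terminal state."""


def wait_for_batch(get_state, timeout_s, poll_s=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    # get_state is a caller-supplied callable returning the current state
    # as a string (hypothetical; a real one would query the Dataproc API).
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_OK:
            return state
        if state in TERMINAL_BAD:
            # Surface the failure so the run can continue to post-run hooks.
            raise BatchFailed(f"batch ended in state {state}")
        if clock() >= deadline:
            raise TimeoutError(f"batch still {state} after {timeout_s}s")
        sleep(poll_s)
```

With a loop of this shape, a job that errors out raises an exception the caller can handle like any other model failure, which is what would let post-run alerting still fire.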

@amychen1776

Hi folks! When did you start noticing this? I'm curious whether there was actually a change on the Dataproc side rather than in dbt.
