Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Updating partitioning/clustering on a table self referencing itself will fail and delete the table #1270

Closed
2 tasks done
github-christophe-oudar opened this issue Jun 27, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@github-christophe-oudar
Copy link
Contributor

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

In some cases, models can be designed to self reference itself ({{ this }}) as part of a "sliding window" approach.

Running that kind model AFTER adding a change to clustering/partitioning results in triggering:

{% if not adapter.is_replaceable(old_relation, partition_by, cluster_by) %}
{% do log("Hard refreshing " ~ old_relation ~ " because it is not replaceable") %}
{% do adapter.drop_relation(old_relation) %}
{% endif %}

dbt will drop your current "state" and your query will fail (because {{ this }} won't exist).

If you add an is_incremental backup it might backfire as you would recreate the table "from scratch" without previous state making it a "silent" bug.

It would happen both in table and incremental materialization

Expected Behavior

This code is still relevant as trying to reproduce the root cause would still throw something like:

Cannot replace a table with a different partitioning spec. Instead, DROP the table, and then recreate it. New partitioning spec is clustering(id) and existing spec is none

I would suggest to:

  • create an option to produce an error to let the user deal with the incompatibility
  • create a temp table, delete the current table, create a clone with the right name from the temp table, delete the temp table (note: it would be still tricky for the partitioned table we would need to copy first the whole table in the temp table before running the "usual incremental process")

Steps To Reproduce

A simple example would be to use following model:

WITH base AS (
SELECT website, clicks
FROM {{ this }}
UNION ALL
SELECT website, COUNT(*) clicks
FROM `my_project.my_dataset.click_events`
)
SELECT website, SUM(clicks) clicks
FROM base
GROUP BY website

Run the model.

Then you are adding a config block to cluster_by = ['website'].

Rerun the model.

Relevant log output

No response

Environment

- OS: Mac OS
- Python: 3.11
- dbt-core: 1.8.2
- dbt-bigquery: 1.8.1

Additional Context

That bug is some unexpected "chaos engineering" that's not so easy to figure out

@github-christophe-oudar github-christophe-oudar added bug Something isn't working triage labels Jun 27, 2024
@amychen1776
Copy link

@github-christophe-oudar Could you provide some use cases where you want to have a model self referencing itself? This is by default a pretty anti-pattern to dbt's idempotency.

@amychen1776 amychen1776 added enhancement New feature or request and removed triage bug Something isn't working labels Aug 28, 2024
@github-christophe-oudar
Copy link
Contributor Author

Sure, I've few cases in production where I build self referencing models that are, for instance, daily partitioned tables that contains a 30 day sliding window on which I add new elements and expire old ones. Practically, I read my {{ this }} on previous day partition (where day = DATE_SUB({{ var("date") }}, INTERVAL 1 DAY)) along expiration logic (AND event_ts > DATE_SUB(var("date") }}, INTERVAL 29 DAY)) then it's idempotent to process the same partition as I'm inserting TIMESTAMP({{ var("date") }}) day to insert in the right partition. I'm also able to join reference data that match the day using snapshots for instance.
Creating a single table makes it for downstream consumers (data apps) that just need to read the latest partition (or rollback to the previous one if something is fishy or provide a "time travel" experience).

@amychen1776
Copy link

Thank you for the explanation! That use case makes sense to me (I've done something similar before).That said, I will close this issue for now because we do have a broader project that we are currently working on to be announced close to Coalesce that should resolve this use case more delightfully.

@amychen1776 amychen1776 closed this as not planned Won't fix, can't repro, duplicate, stale Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants