Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] TABLE_NOT_FOUND {{tmp_relation}} when there are zero batches to process in incremental model #656

Closed
2 tasks done
antonysouthworth-halter opened this issue May 21, 2024 · 1 comment · May be fixed by #658
Closed
2 tasks done
Labels
bug Something isn't working

Comments

@antonysouthworth-halter
Copy link
Contributor

Is this a new bug in dbt-athena?

  • I believe this is a new bug in dbt-athena
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

https://github.com/dbt-athena/dbt-athena/blob/34633a7f5679344852d1004991fc814ab385dadb/dbt/include/athena/macros/materializations/models/table/create_table_as.sql#L182

If there are zero batches to process, this results in query error due to TABLE_NOT_FOUND later on, here:

https://github.com/dbt-athena/dbt-athena/blob/34633a7f5679344852d1004991fc814ab385dadb/dbt/include/athena/macros/materializations/models/helpers/get_partition_batches.sql#L11

because the relation does not exist.

Expected Behavior

I would expect the model to complete and just load no data.

Steps To Reproduce

You will need:

  • an incremental model, partitioned, with more than 100 partitions, force_batch=True, strategy=append
  • conditions that result in zero data being loaded

e.g. in our case, our model has a clause like

select ...
from ...

where ...
{% if is_incremental() %}
-- don't load data that's already been loaded
and timestamp > (select max(timestamp) from {{this}})

-- don't load data from hours that have not completed, allowing for 30 minutes of lateness
and timestamp < (date_trunc('hour', now() - interval '30' minute))
{% endif %}

So basically, once the model loaded the data for the last completed hour, it will never load data again until at least 30 minutes passed the next hour.

Pretty edge-casey, and we probably could change to insert_overwrite with some effort, but at the same time I don't think the adapter should error out here because there are genuine cases where there might be zero data to load. For example if you are running dbt as part of some other workload and you need to retry the whole thing. Basically it should be idempotent is what I'm saying haha.

Environment

- OS: Darwin 22.6 but we observe the same under Debian running on Fargate in AWS.
- Python: 3.9.19
- dbt: 1.8.0rc2
- dbt-athena-community: 1.8.0rc1

Additional Context

TBH I can probably implement the fix myself, just wanted to ask if there's a specific reason we CTAS on the first batch here rather than CREATE TABLE AS SELECT ... WITH NO DATA before the fore loop and then just INSERT all batches?

@antonysouthworth-halter antonysouthworth-halter added the bug Something isn't working label May 21, 2024
@antonysouthworth-halter antonysouthworth-halter changed the title [Bug] TABLE_NOT_FOUND {{tmp_relation}} when there are zero batches to process [Bug] TABLE_NOT_FOUND {{tmp_relation}} when there are zero batches to process in incremental model May 21, 2024
@antonysouthworth-halter
Copy link
Contributor Author

Closing in favour of #519

@antonysouthworth-halter antonysouthworth-halter closed this as not planned Won't fix, can't repro, duplicate, stale May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant