Is this your first time submitting a feature request?
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt-snowflake functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Currently, the delete+insert incremental strategy requires a "unique_key", but nothing actually enforces that this key is unique. I have run into this in practice: when the key is not unique, the USING clause creates a cartesian join between all the records with matching (non-unique) unique_key values. The new microbatch strategy does something similar, where if you don't have an additional predicate, the USING clause produces the same cartesian join.
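To illustrate the fan-out (a minimal Python sketch with hypothetical data, not dbt code): if a key value appears m times in the target and n times in the source, the join emits m x n matched pairs for that key.

```python
# Sketch: counts the row pairs a DELETE ... USING join would match
# when the "unique_key" is not actually unique on either side.
from itertools import product

target = ["a", "a", "b"]       # hypothetical key column; "a" duplicated
source = ["a", "a", "a", "c"]  # three source rows share key "a"

matched_pairs = [(t, s) for t, s in product(target, source) if t == s]
print(len(matched_pairs))  # 2 target "a" rows x 3 source "a" rows = 6 pairs
```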
Describe alternatives you've considered
For the delete+insert strategy, I would propose renaming the "unique_key" config to "incremental_key" and modifying the query to use a correlated EXISTS subquery:
```sql
delete from {{ target }}
where exists (
    select 1 from {{ source }}
    where
    {% if incremental_key is sequence and incremental_key is not string %}
        {% for key in incremental_key %}
        {{ source }}.{{ key }} = {{ target }}.{{ key }}
        {{ "and" if not loop.last }}
        {% endfor %}
    {% else %}
        {{ source }}.{{ incremental_key }} = {{ target }}.{{ incremental_key }}
    {% endif %}
)
{% if incremental_predicates %}
    {% for predicate in incremental_predicates %}
    and {{ predicate }}
    {% endfor %}
{% endif %};
```
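For illustration, with a hypothetical two-column incremental_key of ['customer_id', 'order_date'] and hypothetical relation names, the template would render to roughly:

```sql
delete from analytics.orders
where exists (
    select 1 from analytics.orders__dbt_tmp
    where
        analytics.orders__dbt_tmp.customer_id = analytics.orders.customer_id
        and analytics.orders__dbt_tmp.order_date = analytics.orders.order_date
);
```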
The using {{ source }} line in the microbatch strategy should simply be deleted.
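The key difference is that EXISTS acts as a semi-join: each target row is checked once and deleted at most once, no matter how many source rows share its key. A small Python sketch (same hypothetical data as above):

```python
# Sketch: EXISTS semantics -- a membership test per target row,
# not a join, so duplicate source keys cause no fan-out.
target = ["a", "a", "b"]
source = ["a", "a", "a", "c"]

source_keys = set(source)                       # deduplicated key lookup
deleted = [t for t in target if t in source_keys]
print(len(deleted))  # 2 -- each matching target row counted exactly once
```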
Who will this benefit?
This would benefit everyone who uses these incremental strategies by significantly lowering warehouse load (and therefore costs) for these types of queries.
Are you interested in contributing this feature?
I think the code above should work. It may take me a while before I have time to do all the requisite testing before creating a PR.
Anything else?
I ran one query like DELETE FROM target USING source with no additional predicates where the target table had 10 million records and the source table a billion. You can see how relatively expensive that cartesian join is.
Most Expensive Nodes
CartesianJoin 47.3%
Delete 27.0%
TableScan 1.3%