[Feature] Microbatch should support individual lookback windows on both sides of the batch #10899

Open
3 tasks done
siljamardla opened this issue Oct 22, 2024 · 0 comments
Labels
enhancement New feature or request triage

Comments

@siljamardla

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Here's a typical use case for a dbt model:

SELECT
    ...
FROM table1
JOIN table2 ON ...
JOIN table3 ON ...
WHERE 1=1
    -- define the batch
    AND table1.event_date >= date('2024-01-01')
    AND table1.event_date < date('2024-01-02')
    -- filter a dependent table to read enough data to cover the batch (events in table2 can occur +/- 1 day from events in table1)
    AND table2.event_date >= date('2024-01-01') - interval 1 day
    AND table2.event_date < date('2024-01-02') + interval 1 day
    -- filter a dependent table to read enough data to cover the batch (events in table3 can occur up to 2 days after events in table1)
    AND table3.event_date >= date('2024-01-01')
    AND table3.event_date < date('2024-01-02') + interval 2 day

If I understand the current implementation correctly, the microbatch lookback parameter lets me define a single lookback value (in units of batches) that filters data from table1, table2 and table3 alike, and only on the "left" side, i.e. before the batch.

In practice, some records will always sit awkwardly on the edges of a batch (e.g. falling into the next day by 1 second), so it's especially important to have a buffer on both sides of the batch. Being able to configure the buffer table by table would also be a performance gain, especially when several tables need only a small buffer and one table needs a very large one.

My example would be processing order events and reading the order table, where most orders are recent, but some orders are scheduled orders, created up to 90 days before any events happen.
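
To make the request concrete, here is a rough Python sketch of the window arithmetic being asked for. It is not dbt's implementation or API: the `Buffer` class, the `input_window` function and the `orders` table name are all hypothetical, and the buffers are expressed in days rather than batches purely for illustration. It reproduces the filters from the query above and adds the 90-day scheduled-orders case.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Buffer:
    """Hypothetical per-table buffer, in days, on each side of the batch."""
    before_days: int = 0
    after_days: int = 0


def input_window(batch_start: date, batch_end: date, buffer: Buffer) -> tuple[date, date]:
    """Return the half-open [start, end) date range to read from one input table."""
    return (
        batch_start - timedelta(days=buffer.before_days),
        batch_end + timedelta(days=buffer.after_days),
    )


# Buffers matching the query above, plus the scheduled-orders case
# ("orders" is an assumed table name, not taken from the query).
buffers = {
    "table1": Buffer(),                              # the batch itself
    "table2": Buffer(before_days=1, after_days=1),   # +/- 1 day
    "table3": Buffer(after_days=2),                  # up to 2 days after
    "orders": Buffer(before_days=90),                # created up to 90 days earlier
}

batch_start, batch_end = date(2024, 1, 1), date(2024, 1, 2)
windows = {
    name: input_window(batch_start, batch_end, buf)
    for name, buf in buffers.items()
}

# Print the per-table filter each window would translate to.
for name, (start, end) in windows.items():
    print(f"{name}.event_date >= '{start}' AND {name}.event_date < '{end}'")
```

With a single shared lookback, every table would have to be read with the widest buffer (90 days here); with per-table, two-sided buffers only the one table that needs the wide window pays for it.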

Describe alternatives you've considered

No response

Who will this benefit?

Users of the microbatch incremental loading logic.

Are you interested in contributing this feature?

No response

Anything else?

#10640 might be describing the same request, but I'm not sure

@siljamardla siljamardla added enhancement New feature or request triage labels Oct 22, 2024

1 participant