-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implements pagination #811
Conversation
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: matt-winkler.
|
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: matt-winkler.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a duplicate changie, can this be removed?
{% do exceptions.raise_compiler_error(msg) %} | ||
{% endif %} | ||
{{ return(result) }} | ||
{% macro snowflake__get_paginated_schemas_array(max_iter, max_results_per_iter, max_total_results, database, watermark) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks familiar, did we use pagination for a different query recently? If so, is there any way to reuse that, or to augment that so that both use cases can use it? The only piece that appears to be use case specific is paginated_sql
and the error message. The former could be an argument to this macro and the latter could probably be made more generic (e.g. swap schemas
for objects
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you are remembering correctly @mikealfare !
The solution is similar to that within #572, but for paginating the listing of schemas instead of paginating the listing of relations.
|
||
def test__snowflake__list_schemas_termination(self, project): | ||
""" | ||
validates that we do NOT trigger pagination logic snowflake__list_relations_without_caching |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on how this validates that the pagination logic is not triggered. I agree that it shouldn't since we're allowing for 200 schemas per result (line 127) but only creating 100 schemas (line 125 and line 15). However, we're only checking for the correct number of schemas at the end, which I would think is the same regardless of whether pagination was used.
} | ||
|
||
def test__snowflake__list_schemas(self, project): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think providing the arguments max_iter:1
and max_results_per_iter:200
results in different behavior than the default (10 and 1000 respectively) since we're only creating 100 test schemas. If that's the case, is this the same test case as TestListSchemasSingle.test__snowflake__list_schemas_termination
?
validates pagination logic terminates and raises a compilation error | ||
when exceeding the limit of how many results to return. | ||
""" | ||
run_dbt(["run"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are no models or tests, does this do anything here? It doesn't appear to be needed in the other tests. Also, if this does something, it potentially makes the test run order-dependent. If the first method in this class runs first, then it's run before running run_dbt(["run"])
. If the second method in this class runs first, then it's run after running run_dbt(["run"])
. This creates some difficult to find test failures. We just resolved something like this prior to releasing 1.7.0rc1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that you considered positive and negative test cases that cover a few scenarios. You structured the test cases very similarly. I think you could take this one step further and make it a single parameterized test case, making it easy to see the three scenarios (e.g. if <scenario 1> then <expected outcome 1>, etc.). If parameterized tests are a new thing, let me know if you want to pair on it (or if you just want to pair on it).
@mikealfare Thanks for the review on this. Haven't forgotten about it but tied up with quarter review items but plan to pick up again soon. |
This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days. |
Although we are closing this PR as stale, it can still be reopened to continue development. Just add a comment to notify the maintainers. |
@matt-winkler we can re-open this any time you want. |
resolves #810
Problem
Currently, dbt snowflake requests ALL schemas in a target database via the
show terse objects
command. This can create a bottleneck for some networks when the return payload forces snowflake to return data via internal stages.Solution
This PR includes functionality that paginates the requests to list schemas in the target db.
The solution is similar to that within #572, but for paginating the listing of schemas instead of paginating the listing of relations.
NOTE: what performance concerns do we have with this?
Checklist