Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implements pagination #811

Closed
wants to merge 4 commits into from
Closed

Conversation

matt-winkler
Copy link
Contributor

@matt-winkler matt-winkler commented Oct 18, 2023

resolves #810

Problem

Currently, dbt snowflake requests ALL schemas in a target database via the show terse objects command. This can create a bottleneck for some networks when the return payload forces snowflake to return data via internal stages.

Solution

This PR includes functionality that paginates the requests to list schemas in the target db.

The solution is similar to that within #572, but for paginating the listing of schemas instead of paginating the listing of relations.

NOTE: what performance concerns do we have with this?

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@cla-bot
Copy link

cla-bot bot commented Oct 18, 2023

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: matt-winkler.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@dbeatty10 dbeatty10 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Oct 19, 2023
@cla-bot
Copy link

cla-bot bot commented Oct 19, 2023

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: matt-winkler.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@cla-bot cla-bot bot added the cla:yes label Oct 19, 2023
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a duplicate changie, can this be removed?

{% do exceptions.raise_compiler_error(msg) %}
{% endif %}
{{ return(result) }}
{% macro snowflake__get_paginated_schemas_array(max_iter, max_results_per_iter, max_total_results, database, watermark) %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks familiar, did we use pagination for a different query recently? If so, is there any way to reuse that, or to augment that so that both use cases can use it? The only piece that appears to be use case specific is paginated_sql and the error message. The former could be an argument to this macro and the latter could probably be made more generic (e.g. swap schemas for objects).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you are remembering correctly @mikealfare !

The solution is similar to that within #572, but for paginating the listing of schemas instead of paginating the listing of relations.


def test__snowflake__list_schemas_termination(self, project):
"""
validates that we do NOT trigger pagination logic snowflake__list_relations_without_caching
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on how this validates that the pagination logic is not triggered. I agree that it shouldn't since we're allowing for 200 schemas per result (line 127) but only creating 100 schemas (line 125 and line 15). However, we're only checking for the correct number of schemas at the end, which I would think is the same regardless of whether pagination was used.

}

def test__snowflake__list_schemas(self, project):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think providing the arguments max_iter:1 and max_results_per_iter:200 results in different behavior than the default (10 and 1000 respectively) since we're only creating 100 test schemas. If that's the case, is this the same test case as TestListSchemasSingle.test__snowflake__list_schemas_termination?

validates pagination logic terminates and raises a compilation error
when exceeding the limit of how many results to return.
"""
run_dbt(["run"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are no models or tests, does this do anything here? It doesn't appear to be needed in the other tests. Also, if this does something, it potentially makes the test run order-dependent. If the first method in this class runs first, then it's run before running run_dbt(["run"]). If the second method in this class runs first, then it's run after running run_dbt(["run"]). This creates some difficult to find test failures. We just resolved something like this prior to releasing 1.7.0rc1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that you considered positive and negative test cases that cover a few scenarios. You structured the test cases very similarly. I think you could take this one step further and make it a single parameterized test case, making it easy to see the three scenarios (e.g. if <scenario 1> then <expected outcome 1>, etc.). If parameterized tests are a new thing, let me know if you want to pair on it (or if you just want to pair on it).

@matt-winkler
Copy link
Contributor Author

@mikealfare Thanks for the review on this. Haven't forgotten about it but tied up with quarter review items but plan to pick up again soon.

Copy link
Contributor

github-actions bot commented May 5, 2024

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 5, 2024
Copy link
Contributor

Although we are closing this PR as stale, it can still be reopened to continue development. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this May 12, 2024
@dbeatty10
Copy link
Contributor

@matt-winkler we can re-open this any time you want.

@mikealfare mikealfare deleted the feature/paginate-schemas-in-database branch July 17, 2024 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes ok to test ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ADAP-951] [Feature] dbt-snowflake should handle paginating on schemas and tables
4 participants