Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry factory to consolidate retry strategies across dbt-bigquery #1395

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

mikealfare
Copy link
Contributor

Problem

We handle retries in different ways across different classes and layers of the application. We regularly get bug reports related to retries not being properly handled.

Solution

Dedicate a class specifically to building retry strategies and remove this logic from the execution code.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@mikealfare mikealfare self-assigned this Nov 5, 2024
@cla-bot cla-bot bot added the cla:yes label Nov 5, 2024
Copy link
Contributor

github-actions bot commented Nov 5, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-bigquery contributing guide.

Copy link
Contributor Author

@mikealfare mikealfare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left comments on each file to summarize my changes and provide some context for why I made the changes. Hopefully this helps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These client methods used to live in BigQueryConnectionsManager and python_submissions. Centralizing them here reduced the interface for credentials and removed noise from those other classes, making troubleshooting easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods used to live in BigQueryConnectionManager. The were moved into the credentials module as part of creating the clients module. This reduces the credentials interface to:

  • DataprocBatchConfig: an input type on BigQueryCredentials
  • BigQueryCredentials: the primary credentials object
  • setup_default_credentials: what it sounds like
  • google_credentials: a factory that turns BigQueryCredentials into a google Credentials object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was split up and either moved into the clients module or the python_submissions module. After simplifying retries and consolidating clients, there wasn't much left here. It was indirection with no real benefit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two methods (load_dataframe and upload_file) here that really belong at a lower level. I left the methods for backwards compatibility, but the majority of the logic was moved into BigQueryConnectionsManager. poll_until_job_completes is replaced by calling .result on the job that load_table_from_file returns; it is no longer needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the crux of the PR. RetryFactory is generated from the same input as BigQueryConnectionsManager, but it represents a separate of concerns. The attributes that are used are really profile attributes, and not credential attributes. This factory collects settings that were previously scattered around dbt-bigquery and exposes them either as timeouts or google Retry objects. In many cases we were mimicking built-in features of the google SDK. This is evident by the fact that we can slightly modify DEFAULT_RETRY and DEFAULT_POLLING to get analogous retry strategies. In the case of BigQueryConnectionsManager._retry_and_handle, we can use a custom Retry class instead of calling retry_target directly. This is now reopen_with_deadline. It's also worth noting that RetryFactory is the only public member of this module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes reflect moving things around in the functional code as well as properly mocking objects that appeared to be incorrectly mocked. This wasn't identified until imports were fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes reflect objects moving around in the functional code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_retry_and_handle is no longer a method. It was replaced by the retry factory. The way that we reopen on error changed slightly, so the test had to as well. Some mocks needed to be updated, and again it seemed like some were just not mocking the correct object. The majority of the changes here reflect moving functionality from BigQueryConnectionManager into RetryFactory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are groups of changes here:

  • credentials-related things moved to the credentials module
  • retry-related things moved to the retry module
  • _retry_and_handle was replaced with the new RetryFactory.reopen_with_deadline
  • exception handling for the bigquery client factory method was moved into the factory method
  • the four methods getting timeouts were just removed, they're no longer needed
  • load_dataframe and upload_file were moved here from the adapter, and then called with wrapper methods in the adapter
  • imports were fixed and made explicit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • _update_batch_from_config was moved from dataproc.batch to here
  • client factories were moved to the clients module
  • polling methods were replaced by calling .result with a polling retry strategy
  • instance attributes set during instantiation were rationalized and minimized to make troubleshooting easier
  • the abstract methods were combined into submit to remove noise

It's worth noting this no longer depends on the connection manager, and instead on credentials, clients, and retry.

@mikealfare mikealfare marked this pull request as ready for review November 7, 2024 20:33
@mikealfare mikealfare requested a review from a team as a code owner November 7, 2024 20:33
@mikealfare mikealfare mentioned this pull request Nov 9, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant