Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Exclude password-like fields for considering reparse #9844

Merged
merged 8 commits into from
Apr 4, 2024

Conversation

ChenyuLInx
Copy link
Contributor

@ChenyuLInx ChenyuLInx commented Apr 2, 2024

resolves #9795
related PR dbt-labs/dbt-snowflake#950 has been merged and backported to dbt-snowflake 1.6.latest and 1.7.latest.

Problem

Currently any change in profiles.yml would lead to full reparse.

Solution

Generate the hash for determining rereparse with contents that are accessible in Jinja context instead of the whole profile.yml.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@ChenyuLInx ChenyuLInx requested a review from a team as a code owner April 2, 2024 20:56
@cla-bot cla-bot bot added the cla:yes label Apr 2, 2024
Copy link

codecov bot commented Apr 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.12%. Comparing base (71f3519) to head (f07a86b).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9844      +/-   ##
==========================================
- Coverage   88.12%   88.12%   -0.01%     
==========================================
  Files         178      178              
  Lines       22458    22449       -9     
==========================================
- Hits        19792    19783       -9     
  Misses       2666     2666              
Flag Coverage Δ
integration 85.55% <100.00%> (-0.03%) ⬇️
unit 61.89% <100.00%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

# change to make sure the previous manifest can be loaded correctly.
# This is an example of naming should be chosen based on the functionality
# rather than the implementation details.
connection_keys = config.credentials._connection_keys()
Copy link
Contributor

@MichelleArk MichelleArk Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like credentials._connection_keys() is just the set of names of credential keys, e.g:

('account', 'user', 'database', 'warehouse', 'role', 'schema', 'authenticator', 'oauth_client_id', 'query_tag', 'client_session_keep_alive', 'host', 'port', 'proxy_host', 'proxy_port', 'protocol', 'connect_retries', 'connect_timeout', 'retry_on_database_errors', 'retry_all', 'insecure_mode', 'reuse_connections')

If we hash just the key names without the values, then partial parsing won't retrigger even if one of the values changes, as it should. We should use list(config.credentials.credential_info()) instead. with_aliases=False looks appropriate here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This also mean I probably should add an integration test to capture the behavior

Copy link
Contributor

@QMalcolm QMalcolm Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You beat me to my suggestion on tests 😂

Copy link
Contributor

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good! I think we can be doing more testing wise though 🤔 We don't actually seem to be testing what style of parsing is occurring based on change happened in profiles.yml. I think we need two more tests:

  • change something in profiles.yml that isn't in _connection_keys, assert that only a partial parse occurred
  • change something n profiles.yml that is in _connection_keys, assert that a full parse occurred

@@ -122,6 +134,17 @@ def test_partial_parse_file_path(self, patched_open, patched_os_exist, patched_s
# if specified in flags, we use the specified path
patched_open.assert_called_with("specified_partial_parse_path", "rb")

def test_partial_parse_profile_change(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this test seems to imply that we're asserting that something with partial parsing did or didn't happen. However looking at the test it seems to only assert that the file hash has changed, which is not indicative itself of whether a full or partial parse happened.

Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to include the values of the connection_info dictionary in the hash.

# change to make sure the previous manifest can be loaded correctly.
# This is an example of naming should be chosen based on the functionality
# rather than the implementation details.
connection_keys = list(config.credentials.connection_info())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to convert connection_info to a list, that removes the values from the connection_info dictionary. We need to preserve them, because it's changes in the values that force a re-parse. In other places in the code we hash a dictionary for profiles_env_var_hash and and project_env_vars_hash, but any method that turns a dictionary into something hashable would work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we don't really need profile_env_vars_hash anymore after doing this since the env_vars would be resolved. So we might want to remove that part, since it will cause extra churn in parts of the profile that may not be being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So connection_info returns a iterator, after converting to a list it will look like [(key1, value1), (key2, value2)].
I will get the profile env_var_hash part adjusted

Copy link
Contributor

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming @gshank's comments are addressed, this looks good to me 🙂 Thank for adding the extra test cases!

@ChenyuLInx ChenyuLInx requested a review from gshank April 3, 2024 22:59
return {
"type": "postgres",
"threads": 4,
"host": "localhost",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gshank I think the adjust to the test here makes sense since user and pass are not in connection_info

Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@ChenyuLInx
Copy link
Contributor Author

ChenyuLInx commented Apr 4, 2024

@MichelleArk @jtcohen6 @graciegoheen I think we mentioned we will be using config to FF when a behavior change happens in order to maintain backward compatibility. Do you think this is considered a behavior change that should actually have a config for?
My guess is not, but it is a behavior change(small). We should track it so we have a deciding criteria for later.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Apr 4, 2024

@ChenyuLInx Thanks for thinking of it! I had a similar question. Here's my rationale:

  • The behavior we're deprecating (you can avoid a full re-parse by setting a non-credential profile value to a "secret" env var) has never been documented (Document intersection of "secret" env vars + partial parsing docs.getdbt.com#1066).
  • It's not a behavior change that risks correctness, just means that there will be full parses instead of partial parses
  • I'm aware of other workarounds... and we agree that the real answer is to be smarter with our detection of where target values, --vars, etc, are being used in the project.

So no, I don't think we need a behavior change flag here. This is our implementation detail, not a behavior that we've documented or guaranteed.

@ChenyuLInx ChenyuLInx merged commit ebc22fa into main Apr 4, 2024
62 checks passed
@ChenyuLInx ChenyuLInx deleted the cl/hash_profile branch April 4, 2024 21:34
Copy link
Contributor

github-actions bot commented Apr 4, 2024

The backport to 1.7.latest failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.7.latest 1.7.latest
# Navigate to the new working tree
cd .worktrees/backport-1.7.latest
# Create a new branch
git switch --create backport-9844-to-1.7.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ebc22fa26c9cb26b6deef86cc0a7ceb1ee3fb642
# Push it to GitHub
git push --set-upstream origin backport-9844-to-1.7.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.7.latest

Then, create a pull request where the base branch is 1.7.latest and the compare/head branch is backport-9844-to-1.7.latest.

ChenyuLInx added a commit that referenced this pull request Apr 4, 2024
@will-sargent-dbtlabs
Copy link

@ChenyuLInx For single tenants on 1.8, do you think the updated image is available nowish and will deploy during next weeks release?

@ChenyuLInx
Copy link
Contributor Author

@will-sargent-dbtlabs maybe not, I will reach out in slack about more detailed timeline

Copy link
Contributor

github-actions bot commented Apr 8, 2024

The backport to 1.6.latest failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.6.latest 1.6.latest
# Navigate to the new working tree
cd .worktrees/backport-1.6.latest
# Create a new branch
git switch --create backport-9844-to-1.6.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ebc22fa26c9cb26b6deef86cc0a7ceb1ee3fb642
# Push it to GitHub
git push --set-upstream origin backport-9844-to-1.6.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.6.latest

Then, create a pull request where the base branch is 1.6.latest and the compare/head branch is backport-9844-to-1.6.latest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Exclude password-like fields (missing from _connection_keys) in partial parsing state check
6 participants