[BUG] Exclude password-like fields for considering reparse #9844
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #9844 +/- ##
==========================================
- Coverage 88.12% 88.12% -0.01%
==========================================
Files 178 178
Lines 22458 22449 -9
==========================================
- Hits 19792 19783 -9
Misses 2666 2666
core/dbt/parser/manifest.py
Outdated
# change to make sure the previous manifest can be loaded correctly.
# This is an example of naming should be chosen based on the functionality
# rather than the implementation details.
connection_keys = config.credentials._connection_keys()
It looks like credentials._connection_keys() is just the set of names of credential keys, e.g.:
('account', 'user', 'database', 'warehouse', 'role', 'schema', 'authenticator', 'oauth_client_id', 'query_tag', 'client_session_keep_alive', 'host', 'port', 'proxy_host', 'proxy_port', 'protocol', 'connect_retries', 'connect_timeout', 'retry_on_database_errors', 'retry_all', 'insecure_mode', 'reuse_connections')
If we hash just the key names without the values, then partial parsing won't retrigger when one of the values changes, as it should. We should use list(config.credentials.connection_info()) instead. with_aliases=False looks appropriate here as well.
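To illustrate the point above, here is a minimal sketch, not the dbt implementation. The `digest` helper and the two profile dictionaries are hypothetical; the only claim is that a hash over key names alone cannot see a changed value, while a hash over (key, value) pairs can.

```python
import hashlib


def digest(parts):
    """Hypothetical helper: SHA-256 hex digest of the repr of `parts`."""
    return hashlib.sha256(repr(parts).encode("utf-8")).hexdigest()


# Two made-up profiles that differ only in the value of "host".
before = {"host": "localhost", "port": 5432, "schema": "analytics"}
after = {"host": "db.internal", "port": 5432, "schema": "analytics"}

# Hashing only the key names: the value change goes unnoticed.
keys_only_before = digest(tuple(before))
keys_only_after = digest(tuple(after))

# Hashing (key, value) pairs: the value change is detected.
pairs_before = digest(list(before.items()))
pairs_after = digest(list(after.items()))

print(keys_only_before == keys_only_after)
print(pairs_before == pairs_after)
```

The first comparison is equal and the second is not, which is the behavior difference the review comment describes.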
Good catch! This also means I should probably add an integration test to capture the behavior.
You beat me to my suggestion on tests 😂
The change looks good! I think we could be doing more testing-wise though 🤔 We don't actually seem to be testing which style of parsing occurs when something changes in profiles.yml. I think we need two more tests:
- change something in profiles.yml that isn't in _connection_keys, assert that only a partial parse occurred
- change something in profiles.yml that is in _connection_keys, assert that a full parse occurred
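The two cases requested above can be sketched with a toy model. Everything here is an assumption made for illustration: `CONNECTION_KEYS`, `profile_hash`, and `parse_mode` are hypothetical names, and the real tests would run dbt and inspect which kind of parse actually happened.

```python
import hashlib

# Hypothetical stand-in for the keys reported by _connection_keys().
CONNECTION_KEYS = ("host", "port", "schema")


def profile_hash(profile):
    """Hash only the (key, value) pairs for keys in CONNECTION_KEYS."""
    pairs = [(key, profile.get(key)) for key in CONNECTION_KEYS]
    return hashlib.sha256(repr(pairs).encode("utf-8")).hexdigest()


def parse_mode(saved_hash, profile):
    """Return "partial" when the hash is unchanged, otherwise "full"."""
    return "partial" if saved_hash == profile_hash(profile) else "full"


base = {"host": "localhost", "port": 5432, "schema": "analytics", "password": "old"}
saved = profile_hash(base)

# Case 1: a field outside the connection keys changes.
mode_after_password_change = parse_mode(saved, dict(base, password="new"))

# Case 2: a field inside the connection keys changes.
mode_after_host_change = parse_mode(saved, dict(base, host="db.internal"))

print(mode_after_password_change, mode_after_host_change)
```

In this model the first case stays a partial parse and the second forces a full parse, mirroring the two assertions the reviewer asks for.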
tests/unit/test_parse_manifest.py
Outdated
@@ -122,6 +134,17 @@ def test_partial_parse_file_path(self, patched_open, patched_os_exist, patched_s
# if specified in flags, we use the specified path
patched_open.assert_called_with("specified_partial_parse_path", "rb")

def test_partial_parse_profile_change(self):
The name of this test seems to imply that we're asserting that something with partial parsing did or didn't happen. However, looking at the test, it seems to only assert that the file hash has changed, which by itself does not indicate whether a full or partial parse happened.
We need to include the values of the connection_info dictionary in the hash.
# change to make sure the previous manifest can be loaded correctly.
# This is an example of naming should be chosen based on the functionality
# rather than the implementation details.
connection_keys = list(config.credentials.connection_info())
We don't want to convert connection_info to a list; that removes the values from the connection_info dictionary. We need to preserve them, because it's changes in the values that force a re-parse. In other places in the code we hash a dictionary for profiles_env_var_hash and project_env_vars_hash, but any method that turns a dictionary into something hashable would work.
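One possible way to turn a dictionary into something hashable, shown only as a sketch: serialize it with sorted keys and hash the result. `stable_dict_hash` is a hypothetical helper, not a function from dbt, and it assumes the values are JSON-serializable or can reasonably be converted with `str`.

```python
import hashlib
import json


def stable_dict_hash(mapping):
    """Hash a dict by its sorted JSON form so key order does not matter."""
    payload = json.dumps(mapping, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


same_a = {"host": "localhost", "port": 5432}
same_b = {"port": 5432, "host": "localhost"}  # same content, different order
changed = {"host": "localhost", "port": 5433}  # one value differs

print(stable_dict_hash(same_a) == stable_dict_hash(same_b))
print(stable_dict_hash(same_a) == stable_dict_hash(changed))
```

Sorting the keys keeps the hash stable across insertion orders, so only a real change in a key or value alters it.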
Also, we don't really need profile_env_vars_hash anymore after doing this since the env_vars would be resolved. So we might want to remove that part, since it will cause extra churn in parts of the profile that may not be being used.
So connection_info returns an iterator; after converting it to a list it will look like [(key1, value1), (key2, value2)].
I will get the profile env_var_hash part adjusted.
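A small sketch of that shape, using a hypothetical generator as a stand-in for the real method (the actual signature and key selection in dbt may differ): converting the iterator to a list keeps both keys and values, and the iterator can only be consumed once.

```python
def connection_info(credentials, keys):
    """Hypothetical stand-in: yield (key, value) pairs for the given keys."""
    for key in keys:
        yield key, credentials[key]


credentials = {"host": "localhost", "port": 5432, "password": "secret"}

info = connection_info(credentials, ("host", "port"))
pairs = list(info)     # keys and values are both preserved
leftover = list(info)  # the iterator is already exhausted

print(pairs)
print(leftover)
```

Because "password" is not among the requested keys here, it never appears in the pairs that would be hashed.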
Assuming @gshank's comments are addressed, this looks good to me 🙂 Thanks for adding the extra test cases!
return {
"type": "postgres",
"threads": 4,
"host": "localhost",
@gshank I think the adjustment to the test here makes sense, since user and pass are not in connection_info.
Looks good.
@MichelleArk @jtcohen6 @graciegoheen I think we mentioned that we will use a config to feature-flag (FF) behavior changes in order to maintain backward compatibility. Do you think this counts as a behavior change that should have a config?
@ChenyuLInx Thanks for thinking of it! I had a similar question. Here's my rationale:
So no, I don't think we need a behavior change flag here. This is our implementation detail, not a behavior that we've documented or guaranteed.
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.7.latest 1.7.latest
# Navigate to the new working tree
cd .worktrees/backport-1.7.latest
# Create a new branch
git switch --create backport-9844-to-1.7.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ebc22fa26c9cb26b6deef86cc0a7ceb1ee3fb642
# Push it to GitHub
git push --set-upstream origin backport-9844-to-1.7.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.7.latest
Then, create a pull request where the
(cherry picked from commit ebc22fa)
@ChenyuLInx For single tenants on 1.8, do you think the updated image is available nowish and will deploy during next week's release?
@will-sargent-dbtlabs Maybe not. I will reach out in Slack about a more detailed timeline.
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.6.latest 1.6.latest
# Navigate to the new working tree
cd .worktrees/backport-1.6.latest
# Create a new branch
git switch --create backport-9844-to-1.6.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ebc22fa26c9cb26b6deef86cc0a7ceb1ee3fb642
# Push it to GitHub
git push --set-upstream origin backport-9844-to-1.6.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.6.latest
Then, create a pull request where the
resolves #9795
related PR dbt-labs/dbt-snowflake#950 has been merged and backported to dbt-snowflake 1.6.latest and 1.7.latest.
Problem
Currently, any change in profiles.yml leads to a full reparse.
Solution
Generate the hash that determines whether to reparse from the contents that are accessible in the Jinja context, instead of from the whole profiles.yml.