[BUG] Exclude password-like fields for considering reparse #9844

ChenyuLInx · 2024-04-02T20:56:18Z

resolves #9795
related PR dbt-labs/dbt-snowflake#950 has been merged and backported to dbt-snowflake 1.6.latest and 1.7.latest.

Problem

Currently, any change in profiles.yml triggers a full reparse.

Solution

Generate the hash that determines whether to reparse from the contents that are accessible in the Jinja context, instead of from the whole profiles.yml.

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
This PR includes type annotations for new and modified functions

codecov · 2024-04-02T21:04:23Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.12%. Comparing base (71f3519) to head (f07a86b).
Report is 5 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9844      +/-   ##
==========================================
- Coverage   88.12%   88.12%   -0.01%     
==========================================
  Files         178      178              
  Lines       22458    22449       -9     
==========================================
- Hits        19792    19783       -9     
  Misses       2666     2666

Flag	Coverage Δ
integration	`85.55% <100.00%> (-0.03%)`	⬇️
unit	`61.89% <100.00%> (+0.10%)`	⬆️

MichelleArk · 2024-04-02T22:41:26Z

core/dbt/parser/manifest.py

+        # change to make sure the previous manifest can be loaded correctly.
+        # This is an example of naming should be chosen based on the functionality
+        # rather than the implementation details.
+        connection_keys = config.credentials._connection_keys()


It looks like credentials._connection_keys() is just the set of names of credential keys, e.g:

('account', 'user', 'database', 'warehouse', 'role', 'schema', 'authenticator', 'oauth_client_id', 'query_tag', 'client_session_keep_alive', 'host', 'port', 'proxy_host', 'proxy_port', 'protocol', 'connect_retries', 'connect_timeout', 'retry_on_database_errors', 'retry_all', 'insecure_mode', 'reuse_connections')

If we hash just the key names without the values, then a reparse won't be triggered when one of the values changes, even though it should be. We should use list(config.credentials.connection_info()) instead. with_aliases=False looks appropriate here as well.

Good catch! This also means I should probably add an integration test to capture the behavior.

You beat me to my suggestion on tests 😂
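
To illustrate the point about hashing key names versus values, here is a minimal sketch. It uses plain dictionaries and hashlib rather than dbt's own classes; the helper names hash_keys_only and hash_key_value_pairs and the sample profile values are hypothetical and are not part of this PR.

```python
import hashlib


def hash_keys_only(connection: dict) -> str:
    """Hash only the key names (the approach flagged in the review)."""
    joined = "|".join(sorted(connection))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def hash_key_value_pairs(connection: dict) -> str:
    """Hash key/value pairs so that a changed value changes the digest."""
    joined = "|".join(f"{key}:{connection[key]}" for key in sorted(connection))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical connection settings: only the value of "schema" differs.
before = {"host": "localhost", "port": 5432, "schema": "analytics"}
after = {**before, "schema": "staging"}

# The keys-only digest cannot see the change; the key/value digest can.
keys_unchanged = hash_keys_only(before) == hash_keys_only(after)
pairs_changed = hash_key_value_pairs(before) != hash_key_value_pairs(after)
```

With the keys-only digest, the two profiles look identical, so nothing would signal that a reparse is needed.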

QMalcolm

The change looks good! I think we could be doing more testing-wise, though 🤔 We don't actually seem to be testing which style of parsing occurs based on what changed in profiles.yml. I think we need two more tests:

change something in profiles.yml that isn't in _connection_keys, assert that only a partial parse occurred
change something in profiles.yml that is in _connection_keys, assert that a full parse occurred
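
The two cases listed above can be sketched in isolation, without running dbt. This is only a toy model of the decision: CONNECTION_KEYS, profile_hash, and requires_full_parse are hypothetical names made up for illustration, and a real functional test would run a parse and inspect what dbt actually did.

```python
import hashlib

# Hypothetical stand-in for the key names an adapter reports as connection keys.
CONNECTION_KEYS = ("host", "port", "schema")


def profile_hash(profile: dict) -> str:
    """Hash only the key/value pairs that are listed in CONNECTION_KEYS."""
    visible = [(key, profile[key]) for key in CONNECTION_KEYS if key in profile]
    return hashlib.sha256(repr(visible).encode("utf-8")).hexdigest()


def requires_full_parse(saved: dict, current: dict) -> bool:
    """A full parse is needed when the hashed portion of the profile differs."""
    return profile_hash(saved) != profile_hash(current)


base = {"host": "localhost", "port": 5432, "schema": "dev", "password": "old"}

# Case 1: a field outside CONNECTION_KEYS changes, so a partial parse suffices.
partial_ok = not requires_full_parse(base, {**base, "password": "new"})

# Case 2: a field inside CONNECTION_KEYS changes, so a full parse is required.
full_needed = requires_full_parse(base, {**base, "schema": "prod"})
```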

QMalcolm · 2024-04-02T22:43:19Z

tests/unit/test_parse_manifest.py

@@ -122,6 +134,17 @@ def test_partial_parse_file_path(self, patched_open, patched_os_exist, patched_s
        # if specified in flags, we use the specified path
        patched_open.assert_called_with("specified_partial_parse_path", "rb")

+    def test_partial_parse_profile_change(self):


The name of this test seems to imply that we're asserting that something with partial parsing did or didn't happen. However, looking at the test, it seems to only assert that the file hash has changed, which by itself does not indicate whether a full or partial parse happened.

gshank

We need to include the values of the connection_info dictionary in the hash.

gshank · 2024-04-03T12:56:53Z

core/dbt/parser/manifest.py

+        # change to make sure the previous manifest can be loaded correctly.
+        # This is an example of naming should be chosen based on the functionality
+        # rather than the implementation details.
+        connection_keys = list(config.credentials.connection_info())


We don't want to convert connection_info to a list, because that removes the values from the connection_info dictionary. We need to preserve them, because it's changes in the values that force a re-parse. In other places in the code we hash a dictionary for profile_env_vars_hash and project_env_vars_hash, but any method that turns a dictionary into something hashable would work.

Also, we don't really need profile_env_vars_hash anymore after doing this since the env_vars would be resolved. So we might want to remove that part, since it will cause extra churn in parts of the profile that may not be being used.

So connection_info returns an iterator; after converting it to a list, it will look like [(key1, value1), (key2, value2)].
I will adjust the profile_env_vars_hash part.
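
A small sketch of the distinction discussed in this thread: calling list() on an iterator of (key, value) tuples keeps the values, while calling list() on a dictionary keeps only the keys. The generator connection_info_sketch below is a hypothetical stand-in and is not dbt's implementation of connection_info.

```python
from typing import Any, Iterator, Tuple


def connection_info_sketch(creds: dict, keys: tuple) -> Iterator[Tuple[str, Any]]:
    """Yield (key, value) pairs for the selected keys, in the order given."""
    for key in keys:
        if key in creds:
            yield key, creds[key]


# Hypothetical credentials; "password" is deliberately not a selected key.
creds = {"host": "localhost", "port": 5432, "password": "secret"}

# list() over an iterator of tuples preserves both keys and values.
pairs = list(connection_info_sketch(creds, ("host", "port")))

# list() over a dictionary, by contrast, returns only its keys.
keys_only = list(dict(pairs))
```

Here pairs still carries the values, so hashing it reacts to a changed host or port, while the password never enters the hashed data.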

QMalcolm

Assuming @gshank's comments are addressed, this looks good to me 🙂 Thanks for adding the extra test cases!

ChenyuLInx · 2024-04-03T23:00:20Z

tests/functional/partial_parsing/test_pp_vars.py

        return {
            "type": "postgres",
            "threads": 4,
-            "host": "localhost",


@gshank I think the adjustment to the test here makes sense, since user and pass are not in connection_info.

gshank

Looks good.

ChenyuLInx · 2024-04-04T19:41:49Z

@MichelleArk @jtcohen6 @graciegoheen I think we mentioned we will be using config to FF when a behavior change happens, in order to maintain backward compatibility. Do you think this is considered a behavior change that should actually have a config?
My guess is not, but it is a (small) behavior change. We should track it so we have a deciding criterion for later.

jtcohen6 · 2024-04-04T20:17:52Z

@ChenyuLInx Thanks for thinking of it! I had a similar question. Here's my rationale:

The behavior we're deprecating (you can avoid a full re-parse by setting a non-credential profile value to a "secret" env var) has never been documented (Document intersection of "secret" env vars + partial parsing docs.getdbt.com#1066).
It's not a behavior change that risks correctness; it just means that there will be full parses instead of partial parses.
I'm aware of other workarounds... and we agree that the real answer is to be smarter with our detection of where target values, --vars, etc, are being used in the project.

So no, I don't think we need a behavior change flag here. This is our implementation detail, not a behavior that we've documented or guaranteed.

github-actions · 2024-04-04T22:41:26Z

The backport to 1.7.latest failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.7.latest 1.7.latest
# Navigate to the new working tree
cd .worktrees/backport-1.7.latest
# Create a new branch
git switch --create backport-9844-to-1.7.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ebc22fa26c9cb26b6deef86cc0a7ceb1ee3fb642
# Push it to GitHub
git push --set-upstream origin backport-9844-to-1.7.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.7.latest

Then, create a pull request where the base branch is 1.7.latest and the compare/head branch is backport-9844-to-1.7.latest.

will-sargent-dbtlabs · 2024-04-04T23:41:10Z

@ChenyuLInx For single tenants on 1.8, do you think the updated image is available nowish and will deploy during next week's release?

ChenyuLInx · 2024-04-05T00:49:48Z

@will-sargent-dbtlabs Maybe not. I will reach out on Slack about a more detailed timeline.

github-actions · 2024-04-08T18:12:32Z

The backport to 1.6.latest failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.6.latest 1.6.latest
# Navigate to the new working tree
cd .worktrees/backport-1.6.latest
# Create a new branch
git switch --create backport-9844-to-1.6.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ebc22fa26c9cb26b6deef86cc0a7ceb1ee3fb642
# Push it to GitHub
git push --set-upstream origin backport-9844-to-1.6.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.6.latest

Then, create a pull request where the base branch is 1.6.latest and the compare/head branch is backport-9844-to-1.6.latest.

ChenyuLInx added 2 commits April 2, 2024 13:51

hash connection_keys as profile

2f78af0

changlog

11fc733

ChenyuLInx requested a review from a team as a code owner April 2, 2024 20:56

cla-bot bot added the cla:yes label Apr 2, 2024

ChenyuLInx added 2 commits April 2, 2024 13:59

nits

42bbe7a

nits

e44b328

ChenyuLInx requested review from gshank and QMalcolm April 2, 2024 22:09

MichelleArk reviewed Apr 2, 2024

QMalcolm reviewed Apr 2, 2024

ChenyuLInx added 2 commits April 2, 2024 17:08

adjust

3ce11ef

adjust

c3f4389

gshank requested changes Apr 3, 2024

ChenyuLInx requested review from MichelleArk and QMalcolm April 3, 2024 15:23

QMalcolm approved these changes Apr 3, 2024

adjust_vars

1055202

ChenyuLInx requested a review from gshank April 3, 2024 22:59

ChenyuLInx commented Apr 3, 2024

nits

f07a86b

gshank approved these changes Apr 4, 2024

ChenyuLInx merged commit ebc22fa into main Apr 4, 2024
62 checks passed

ChenyuLInx deleted the cl/hash_profile branch April 4, 2024 21:34

ChenyuLInx added the backport 1.7.latest label Apr 4, 2024

ChenyuLInx added a commit that referenced this pull request Apr 4, 2024

[BUG] Exclude password-like fields for considering reparse (#9844)

edac4db

(cherry picked from commit ebc22fa)

ChenyuLInx added a commit that referenced this pull request Apr 4, 2024

[BUG] Exclude password-like fields for considering reparse (#9844)

9657a75

ChenyuLInx mentioned this pull request Apr 4, 2024

[BACKPORT 1.7] Exclude password-like fields for considering reparse (#9844) #9857

Merged

5 tasks

ChenyuLInx added the backport 1.6.latest label Apr 8, 2024

ChenyuLInx mentioned this pull request Apr 8, 2024

[BACKPORT 1.6] Exclude password-like fields for considering reparse #9879

Merged

5 tasks

ChenyuLInx added a commit that referenced this pull request Apr 8, 2024

[BACKPORT 1.7] Exclude password-like fields for considering reparse (#9844) (#9857)

947f397

fredriv mentioned this pull request Aug 15, 2024

[Regression] Partial parsing is disabled by "secret" env vars in profile connection info #10571

Closed

2 tasks

This was referenced Aug 20, 2024

[Feature] Extend partial parsing behavior for env vars to {{ target }} also #10579

Open

[Feature] Extend partial parsing behavior for env vars to project vars also #10578

Open
