Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Defaulting reuse_connections to True is breaking some users jobs #1167

Open
2 tasks done
jeremyyeo opened this issue Aug 13, 2024 · 1 comment
Open
2 tasks done
Labels
awaiting_response bug Something isn't working

Comments

@jeremyyeo
Copy link
Contributor

jeremyyeo commented Aug 13, 2024

Is this a new bug in dbt-snowflake?

  • I believe this is a new bug in dbt-snowflake
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

We made connection reuse the default in #1109 - this is breaking some users jobs when they depend on thread session isolation. We should make reuse_connections opt-in instead of opt-out.

Expected Behavior

Make this behave the way they were prior to making reuse_connections: True by default.

Steps To Reproduce

First, let's create a UDF to return the current session timezone - we will be using this in our model code later:

CREATE OR REPLACE FUNCTION development_jyeo.dbt_jyeo.get_current_timezone()
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS 
$$
  const timezone = Intl.DateTimeFormat().resolvedOptions().timeZone;
  return timezone;
$$
;

select development_jyeo.dbt_jyeo.get_current_timezone();
-- America/Los_Angeles

Make sure the account level timezone is America/Los_Angeles (Snowflakes default):

image

Let's setup our profiles.yml:

# .dbt/profiles.yml
sf:
  target: dev
  outputs:
    dev: 
      type: snowflake
      account: ...
      user: ...
      password: ...
      role: transformer
      database: development_jyeo
      schema: dbt_jyeo
      warehouse: analytics

^ Key here being that neither of these keys client_session_keep_alive, reuse_connections are set by the end user (dbt Cloud default profile).

Now, let's setup our dbt project.

# dbt_project.yml
name: my_dbt_project
profile: sf
config-version: 2
version: "1.0.0"

models:
   my_dbt_project:
      +materialized: table
      +pre-hook: "{{ check_tz() }}"
      +post-hook: "{{ check_tz() }}"

on-run-start: "{{ check_tz() }}"
on-run-end: "{{ check_tz() }}"

-- models/schema.yml
models:
  - name: a
    columns:
      - name: ctz
        data_tests:
          - accepted_values:
              values: ['Pacific/Auckland']
  - name: b
    columns:
      - name: ctz
        data_tests:
          - accepted_values:
              values: ['America/Los_Angeles']
-- macros/check_tz.sql
{% macro check_tz() %}
    {% set query %}
        SHOW PARAMETERS LIKE 'TIMEZONE' IN SESSION;
    {% endset %}
    {% if execute %}
        {% set results = run_query(query).columns[1].values()[0] %}
        {% do print('=========================================') %}
        {% do print('SESSION TIMEZONE IS: ' ~ results) %}
        {% do print('=========================================') %}
    {% endif %}
{% endmacro %}

-- models/a.sql
{{ config(sql_header="ALTER SESSION SET TIMEZONE = 'Pacific/Auckland';") }}
select current_timestamp()::text as c, development_jyeo.dbt_jyeo.get_current_timezone() as ctz

-- models/b.sql
select current_timestamp()::text as c, development_jyeo.dbt_jyeo.get_current_timezone() as ctz

The idea here is that only model a.sql should operate on timezone Pacific/Auckland - and model b.sql should follow the account timezone America/Los_Angeles.

First we build using 1.8.latest:

$ dbt build --threads 1
00:36:05  Running with dbt=1.8.5
00:36:05  Registered adapter: snowflake=1.8.3
00:36:06  Found 2 operations, 2 models, 2 data tests, 449 macros
00:36:06  
00:36:11  
00:36:11  Running 1 on-run-start hook
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
00:36:13  1 of 1 START hook: my_dbt_project.on-run-start.0 ............................... [RUN]
00:36:13  1 of 1 OK hook: my_dbt_project.on-run-start.0 .................................. [OK in 0.00s]
00:36:13  
00:36:14  Concurrency: 1 threads (target='sf')
00:36:14  
00:36:14  1 of 4 START sql table model dbt_jyeo.a ........................................ [RUN]
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
=========================================
SESSION TIMEZONE IS: Pacific/Auckland
=========================================
00:36:18  1 of 4 OK created sql table model dbt_jyeo.a ................................... [SUCCESS 1 in 4.24s]
00:36:18  2 of 4 START sql table model dbt_jyeo.b ........................................ [RUN]
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
00:36:22  2 of 4 OK created sql table model dbt_jyeo.b ................................... [SUCCESS 1 in 3.66s]
00:36:22  3 of 4 START test accepted_values_a_ctz__Pacific_Auckland ...................... [RUN]
00:36:24  3 of 4 PASS accepted_values_a_ctz__Pacific_Auckland ............................ [PASS in 2.61s]
00:36:24  4 of 4 START test accepted_values_b_ctz__America_Los_Angeles ................... [RUN]
00:36:27  4 of 4 PASS accepted_values_b_ctz__America_Los_Angeles ......................... [PASS in 2.48s]
00:36:27  
00:36:27  Running 1 on-run-end hook
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
00:36:28  1 of 1 START hook: my_dbt_project.on-run-end.0 ................................. [RUN]
00:36:28  1 of 1 OK hook: my_dbt_project.on-run-end.0 .................................... [OK in 0.00s]
00:36:28  
00:36:29  
00:36:29  Finished running 2 table models, 2 data tests, 2 project hooks in 0 hours 0 minutes and 23.22 seconds (23.22s).
00:36:29  
00:36:29  Completed successfully
00:36:29  
00:36:29  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4

^ So far so good. The session that built model b was independent of the session that built model a - therefor changing the timezone on the session that built model a did not affect model b.

image

Now, we do a local install of dbt-snowflake from the main branch:

$ git rev-parse HEAD
3fbc0749491f40d34014336457753b140e1fb1ee

^ 3fbc0749491f40d34014336457753b140e1fb1ee is ahead of commit 6857e6b800329a6490c3339824249c18b1af5eda - which contains the commit that made reuse_connections True by default.

$ dbt build --threads 1
00:45:21  Running with dbt=1.9.0-a1
00:45:22  Registered adapter: snowflake=1.9.0-a1
00:45:22  Unable to do partial parsing because of a version mismatch
00:45:22  Found 2 models, 2 operations, 2 data tests, 452 macros
00:45:22  
00:45:25  
00:45:25  Running 1 on-run-start hook
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
00:45:27  1 of 1 START hook: my_dbt_project.on-run-start.0 ............................... [RUN]
00:45:27  1 of 1 OK hook: my_dbt_project.on-run-start.0 .................................. [OK in 0.00s]
00:45:27  
00:45:27  Concurrency: 1 threads (target='sf')
00:45:27  
00:45:27  1 of 4 START sql table model dbt_jyeo.a ........................................ [RUN]
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
=========================================
SESSION TIMEZONE IS: Pacific/Auckland
=========================================
00:45:30  1 of 4 OK created sql table model dbt_jyeo.a ................................... [SUCCESS 1 in 2.20s]
00:45:30  2 of 4 START sql table model dbt_jyeo.b ........................................ [RUN]
=========================================
SESSION TIMEZONE IS: Pacific/Auckland
=========================================
=========================================
SESSION TIMEZONE IS: Pacific/Auckland
=========================================
00:45:31  2 of 4 OK created sql table model dbt_jyeo.b ................................... [SUCCESS 1 in 1.74s]
00:45:31  3 of 4 START test accepted_values_a_ctz__Pacific_Auckland ...................... [RUN]
00:45:32  3 of 4 PASS accepted_values_a_ctz__Pacific_Auckland ............................ [PASS in 0.56s]
00:45:32  4 of 4 START test accepted_values_b_ctz__America_Los_Angeles ................... [RUN]
00:45:32  4 of 4 FAIL 1 accepted_values_b_ctz__America_Los_Angeles ....................... [FAIL 1 in 0.60s]
00:45:32  
00:45:32  Running 1 on-run-end hook
=========================================
SESSION TIMEZONE IS: America/Los_Angeles
=========================================
00:45:33  1 of 1 START hook: my_dbt_project.on-run-end.0 ................................. [RUN]
00:45:33  1 of 1 OK hook: my_dbt_project.on-run-end.0 .................................... [OK in 0.00s]
00:45:33  
00:45:34  
00:45:34  Finished running 2 table models, 2 data tests, 2 project hooks in 0 hours 0 minutes and 11.68 seconds (11.68s).
00:45:34  
00:45:34  Completed with 1 error and 0 warnings:
00:45:34  
00:45:34  Failure in test accepted_values_b_ctz__America_Los_Angeles (models/schema.yml)
00:45:34    Got 1 result, configured to fail if != 0
00:45:34  
00:45:34    compiled code at target/compiled/my_dbt_project/models/schema.yml/accepted_values_b_ctz__America_Los_Angeles.sql
00:45:34  
00:45:34  Done. PASS=3 WARN=0 ERROR=1 SKIP=0 TOTAL=4

^ Now things don't work so well anymore.

image

image

The session is shared and causes model b to no longer use the account timezone default.

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11
- dbt-core: 1.8.latest / 1.9.alpha
- dbt-snowflake: 1.8.latest / 1.9.alpha

Additional Context

In dbt Cloud, the debug logs show which commit of the adapter we are on. We have rolled forward and backward a few times this week.

As of this writing...

2024-08-13 00:04:11.218553 (MainThread): 00:04:11  Registered adapter: snowflake=1.9.0-post1+e5b28d67cf4b870da81674de3e1b84c4474a00a6

Snippet of debug logs from a dbt Cloud job run.

^ We are back on e5b28d - which is prior to commit 6857e6 (the one that introduced this default).

However, towards the end of last week, all throughout the weekend, we had been in a rolled forward state:

2024-08-11 07:31:34.545083 (MainThread): 07:31:34  Registered adapter: snowflake=1.9.0-post5+7fb4549abc52d53dfdbf02da423ee58cc8696ccc

7fb454 being ahead of 6857e6 - therefor some customers ran into this.

dbt Cloud users probably want to do:

image

In their extended attributes so that this doesn't automatically get turned on for them by default (whenever we roll forward again but have not yet fixed this issue of reuse_connections: True by default).

@jeremyyeo jeremyyeo added bug Something isn't working triage labels Aug 13, 2024
@amychen1776
Copy link

amychen1776 commented Aug 28, 2024

@jeremyyeo is this annoying for Cloud users or a true bug? This seems like an enhancement request

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting_response bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants