
[Bug] Error when loading NULL result from JSON column #1101

Closed
2 tasks done
Gilby216 opened this issue Feb 10, 2024 · 3 comments
Labels
bug Something isn't working Stale

Comments

@Gilby216

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I run the following query in dbt Cloud I get an error:

```sql
Select parse_json(NULL)
```

Error encountered: Runtime Error the JSON object must be str, bytes or bytearray, not NoneType

However that same query runs successfully when I run it directly in the BigQuery console. It outputs a SQL NULL value.
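For context (my addition, not part of the original report): the error text matches what Python's `json.loads` raises when it is handed `None`, since it only accepts `str`, `bytes`, or `bytearray` — which hints the NULL cell is reaching a Python-side JSON decode:

```python
import json

# json.loads only accepts str, bytes, or bytearray; a SQL NULL arrives as None
try:
    json.loads(None)
except TypeError as exc:
    print(exc)  # the JSON object must be str, bytes or bytearray, not NoneType
```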

The query above is the simplest reproduction case. If it's useful, the actual situation I am trying to solve for here is querying into a JSON column and extracting certain fields, some of which are sometimes missing. To illustrate this more concretely, the following SQL produces the same error in dbt Cloud:

```sql
Select parsed_json_data.ex1
from ( Select parse_json('{"ex2":"example"}') AS parsed_json_data ) AS t1
```

But again, that query runs successfully directly in the BigQuery console and outputs a NULL value.

If you change the query to extract the field that does exist in the JSON, then everything works fine in both dbt Cloud and the BigQuery console:

```sql
Select parsed_json_data.ex2
from ( Select parse_json('{"ex2":"example"}') AS parsed_json_data ) AS t1
```
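As a Python analogy (my sketch, not from the report): extracting a missing field from parsed JSON should yield a null value rather than an error, which is the behavior the BigQuery console shows:

```python
import json

# Hypothetical Python analogue of the two BigQuery queries above
data = json.loads('{"ex2": "example"}')
print(data.get("ex1"))  # missing field: None, not an error
print(data.get("ex2"))  # present field: 'example'
```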

Expected Behavior

NULL values returned from JSON columns should not error; they should be output as NULL.

Steps To Reproduce

In dbt Cloud, preview the following query/model:

```sql
Select parse_json(NULL)
```

Relevant log output

Log output from dbt Cloud:

Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77c97e4a00>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77a87e18e0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77a87e1a90>]}
Running with dbt=1.6.9
running dbt with arguments {'printer_width': '80', 'indirect_selection': 'empty', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'fail_fast': 'False', 'version_check': 'True', 'log_path': '/usr/src/dbt-server-shared/working_dir/a5144474-69cd-4d3d-86cb-29bc7bb69e24', 'profiles_dir': '/usr/src/dbt-server-shared/.dbt', 'debug': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': 'dbt -A dbt_worker.app worker --loglevel=DEBUG --concurrency=2 --max-memory-per-child=500000', 'introspect': 'True', 'static_parser': 'True', 'target_path': 'None', 'log_format': 'json', 'send_anonymous_usage_stats': 'True'}
Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'f0da6056-b33d-4df6-bf67-b46b86b15ec0', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77a84d6a00>]}
Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'f0da6056-b33d-4df6-bf67-b46b86b15ec0', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77a84d63a0>]}
Registered adapter: bigquery=1.6.9
Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'f0da6056-b33d-4df6-bf67-b46b86b15ec0', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77e1bffac0>]}
Found 64 models, 2 tests, 9 analyses, 9 sources, 0 exposures, 0 metrics, 511 macros, 0 groups, 0 semantic models
Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'f0da6056-b33d-4df6-bf67-b46b86b15ec0', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77e1bff850>]}
Acquiring new bigquery connection 'master'
Concurrency: 4 threads (target='default')
Traceback (most recent call last):
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/task/base.py", line 372, in safe_run
    result = self.compile_and_execute(manifest, ctx)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/task/base.py", line 323, in compile_and_execute
    result = self.run(ctx.node, manifest)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/task/base.py", line 419, in run
    return self.execute(compiled_node, manifest)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/task/show.py", line 38, in execute
    adapter_response, execute_result = self.adapter.execute(
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/adapters/base/impl.py", line 295, in execute
    return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 511, in execute
    table = self.get_table_from_response(iterator)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 439, in get_table_from_response
    return agate_helper.table_from_data_flat(resp, column_names)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/clients/agate_helper.py", line 121, in table_from_data_flat
    for _row in data:
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/google/api_core/page_iterator.py", line 209, in _items_iter
    for item in page:
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/google/api_core/page_iterator.py", line 131, in __next__
    result = self._item_to_value(self._parent, item)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/google/cloud/bigquery/table.py", line 3106, in _item_to_row
    _helpers._row_tuple_from_json(resource, iterator.schema),
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/google/cloud/bigquery/_helpers.py", line 313, in _row_tuple_from_json
    row_data.append(_field_from_json(cell["v"], field))
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/google/cloud/bigquery/_helpers.py", line 289, in _field_from_json
    return converter(resource, field)
  File "/venv/dbt-1.6.0-latest/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 70, in _json_from_json
    return json.loads(value)
  File "/usr/lib/python3.8/json/__init__.py", line 341, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
Connection 'master' was properly closed.
Connection 'sql_operation.playback_data_warehouse.inline_query' was properly closed.
Encountered an error:
Runtime Error
  the JSON object must be str, bytes or bytearray, not NoneType
Command `cli show` failed at 17:55:00.739276 after 1.01 seconds
Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77c97e4a00>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77a87e1970>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f77c8614610>]}
Flushing usage events
Runtime Error
  the JSON object must be str, bytes or bytearray, not NoneType

Environment

- Python: 3.8
- dbt-core: dbt-1.6.9

Additional Context

No response

@Gilby216 Gilby216 added bug Something isn't working triage labels Feb 10, 2024
@dbeatty10 (Contributor)

Thanks for reporting this @Gilby216!

TLDR

I think #1061 will fix this.

Root cause

We've had a couple of issues that look similar to this one, namely #972 and #1055.

It looks like the underlying issue for this one is within our fix added in #974.

When I tried the following monkey patch instead, it worked:

```python
def _json_from_json(value, _):
    """NOOP string -> string coercion"""

    return None if value is None else json.loads(value)
```

Solution

The fix might be covered by #1061, which removes our monkey patch in favor of relying on googleapis/python-bigquery#1587 instead.

We should add the following reproduction case to a test case within #1061 to confirm.

When I removed the monkey patch in my local environment, it worked.

Reprex

Run the following while using dbt-bigquery:

```shell
dbt show --inline 'select parse_json(null)'
```

Get this error:

01:36:38  Unhandled error while executing 
the JSON object must be str, bytes or bytearray, not NoneType
01:36:38  Encountered an error:
Runtime Error
  the JSON object must be str, bytes or bytearray, not NoneType

Workaround

In the meantime, @jeremyyeo mentioned in #1055 that you can coalesce the NULL with to_json(null) to handle this edge case:

```sql
Select ifnull(parse_json(NULL), to_json(null))
```
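As a Python analogy (my sketch, not from the thread), the workaround is equivalent to coalescing a missing value to an explicit JSON `null` before the client decodes it:

```python
import json

value = None  # what a SQL NULL cell looks like to the Python client
coalesced = value if value is not None else json.dumps(None)  # -> "null"
print(json.loads(coalesced))  # None — decodes cleanly instead of raising
```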


This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 15, 2024

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 22, 2024