Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update build process to add "null" as an allowed type for non-required properties #811

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

sean-rose
Copy link
Contributor

Many telemetry properties aren't configured to be required, and if the property is omitted entirely in submitted telemetry then the associated column in BigQuery ends up with a null value. However, if such non-required properties aren't specifically configured to allow null values and telemetry is sent with an explicit null value for the property then validation will fail with an "unexpected type" error and the entire record will be rejected, despite the fact that from the BigQuery perspective it's functionally equivalent to omitting the non-required property entirely. IMO this behavior is detrimental (e.g. DENG-3588).

In order to mitigate the above problem, this PR updates the schema build process to try to automatically identify non-required properties and adjust their configuration to allow null values.


Checklist for reviewer:

For glean changes:

  • Update templates/include/glean/CHANGELOG.md

For modifications to schemas in restricted namespaces (see CODEOWNERS):

@auto-assign auto-assign bot requested a review from akkomar May 2, 2024 20:16
@sean-rose sean-rose force-pushed the allow-nulls-for-nonrequired branch from b6b1af3 to 536c02b Compare May 2, 2024 20:48
@sean-rose sean-rose force-pushed the allow-nulls-for-nonrequired branch from 536c02b to 6becf29 Compare May 2, 2024 20:49
@sean-rose
Copy link
Contributor Author

The assert-telemetry-version test is currently failing as a side effect of four legacy telemetry ping schemas not having the version property marked as required:

  • telemetry/anonymous/anonymous.4.schema.json
  • telemetry/bhr/bhr.4.schema.json
  • telemetry/normandy-login-study/normandy-login-study.4.schema.json
  • telemetry/xfocsp-error-report/xfocsp-error-report.4.schema.json

Should those schemas be changed so the version property is required?

@sean-rose
Copy link
Contributor Author

Should those schemas be changed so the version property is required?

#812

@BenWu
Copy link
Contributor

BenWu commented May 3, 2024

Can you check payload_bytes_error to see how many pings are being dropped because of this and which pings/fields are most frequent? The changes make sense to me but it would be useful to understand the impact

@sean-rose
Copy link
Contributor Author

Can you check payload_bytes_error to see how many pings are being dropped because of this and which pings/fields are most frequent? The changes make sense to me but it would be useful to understand the impact

Beyond the incident with the distro property (which you already addressed manually via #810) there doesn't appear to be any significant number of pings dropped recently due to null values that would be accepted with this PR's changes deployed. I checked the cases with >1k dropped pings due to null values in the last month, and all of the associated properties are marked as required and thus wouldn't have their validation changed.

So this is mostly just guarding against future incidents like what happened with the distro property.

For reference, I used the following query:

select document_namespace, document_type, document_version, error_message, count(*) as error_count
from `moz-fx-data-shared-prod.monitoring.payload_bytes_error_all`
where
  date(submission_timestamp) >= current_date() - 31
  and error_message like '%found: Null'
  and error_message not like '%/distro:%'
group by all
order by error_count desc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants