Cast using Grok's data conversion syntax #4928

Open
philrz opened this issue Dec 7, 2023 · 1 comment

Comments

@philrz
Contributor

philrz commented Dec 7, 2023

tl;dr

The reference Grok implementation has a :type "type conversion" syntax that may appear in patterns. This is not yet supported in Zed's grok() function.

Details

From Elastic's page for the Grok filter plugin for Logstash:

Optionally you can add a data type conversion to your grok pattern. By default all semantics are saved as strings. If you wish to convert a semantic’s data type, for example change a string to an integer then suffix it with the target data type. For example %{NUMBER:num:int} which converts the num semantic from a string to an integer. Currently the only supported conversions are int and float.
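To make the quoted behavior concrete, here is a minimal Python sketch of how a Grok engine might honor the :type suffix. It is not Zed's or Logstash's implementation. The two-entry pattern table and the names compile_grok and grok are hypothetical, and only the int and float conversions named in the Logstash docs are supported.

```python
import re

# Hypothetical, tiny pattern library for illustration only; real Grok
# libraries ship far larger and more precise definitions.
PATTERNS = {
    "NUMBER": r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)",
    "WORD": r"\w+",
}

# Only the conversions the Logstash docs list: int and float.
CONVERTERS = {"int": int, "float": float}

# Matches %{SYNTAX}, %{SYNTAX:SEMANTIC}, or %{SYNTAX:SEMANTIC:TYPE}.
REF = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def compile_grok(pattern):
    """Expand %{...} references into one regex and collect semantic -> type hints."""
    types = {}

    def expand(m):
        syntax, semantic, type_ = m.group(1), m.group(2), m.group(3)
        body = PATTERNS[syntax]  # KeyError for an unknown SYNTAX name
        if semantic is None:
            return f"(?:{body})"  # matched but not captured
        if type_ is not None:
            if type_ not in CONVERTERS:
                raise ValueError(f"unsupported conversion: {type_}")
            types[semantic] = type_
        return f"(?P<{semantic}>{body})"

    return re.compile(REF.sub(expand, pattern)), types


def grok(pattern, text):
    """Match text; captures stay strings unless the pattern gave a :type suffix."""
    regex, types = compile_grok(pattern)
    m = regex.fullmatch(text)
    if m is None:
        return None
    return {
        name: CONVERTERS[types[name]](value)
        if name in types and value is not None
        else value
        for name, value in m.groupdict().items()
    }
```

For example, grok("%{WORD:method} %{NUMBER:bytes:int} %{NUMBER:duration:float}", "GET 512 0.043") yields {'method': 'GET', 'bytes': 512, 'duration': 0.043}, while the same pattern without the suffixes would leave 512 and 0.043 as strings.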

This syntax is used in many of the examples out on the Internet that users may find as they're learning Grok with intent to apply it in Zed (e.g., here).

The initial grok() implementation added to Zed via #4827 accepts the syntax but ignores the type suffix, so all parsed values remain strings. We rationalized this simplification since the user can apply Zed's casting functions downstream in the pipeline to turn these strings into richer Zed types if they wish. However, in a complex log parsing config this could force the user to repeat many field names in those downstream casts, making the Zed code less readable and harder to maintain. Therefore we may want to add support for this syntax at some point.

Note that this could ultimately become a unique differentiator for Zed: other JSON-centric implementations of Grok can only convert to the small set of JSON types, whereas the Zed implementation could support the full set of rich Zed data types.
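As a rough illustration of that idea, the conversion table could be extended beyond int and float. In this Python sketch the type names bool, ip, and time are chosen to echo Zed types, but the table, the convert function, and the accepted input formats are all assumptions, not an existing Zed or Logstash feature.

```python
import ipaddress
from datetime import datetime


def parse_bool(s):
    # Accept only "true"/"false" (case-insensitive); reject anything else.
    lowered = s.lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a bool: {s!r}")
    return lowered == "true"


# Hypothetical conversion table; the names beyond int/float loosely mirror
# Zed's bool, ip, and time types and are illustrative only.
RICH_CONVERTERS = {
    "int": int,
    "float": float,
    "bool": parse_bool,
    "ip": ipaddress.ip_address,       # IPv4 or IPv6 address object
    "time": datetime.fromisoformat,   # assumes ISO 8601 input
}


def convert(value, type_name):
    """Convert a captured string to the requested (hypothetical) type."""
    try:
        conv = RICH_CONVERTERS[type_name]
    except KeyError:
        raise ValueError(f"unsupported conversion: {type_name}") from None
    return conv(value)
```

The design choice here is that each suffix maps to a native typed value rather than to one of JSON's few types, whereas a JSON-only output would have to render the ip and time results back into strings.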

@philrz
Contributor Author

philrz commented Aug 16, 2024

A user found themselves asking about this functionality in a recent community Slack thread. In their own words:

i guess this probably couldn’t easily be dealt with … using NONNEGINT results in a string in the zson … but I get that at a Go code level, it’s just a regex and there’s no real data there that could tell it “hey, this is a number, don’t write it to zson as a string”
