You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
By simply applying collect_map, we get the final price value seen in the input stream for each unique stock ticker.
$ super -version
Version: v1.18.0-222-g55d99d3b
$ super -Z -c 'collect_map(|{stock:price}|)' prices.json
|{
"IBM": 46.67,
"APPL": 150.13,
"GOOG": 89.15
}|
This "last value wins" behavior may be familiar to many users. For instance, ChatGPT suggested the following jq command line to combine the same stream of objects into a single JSON object.
However, @mccanne recently pointed out that silently dropping values may not be ideal, especially since SuperDB provides complex data types that could easily hold all values, such as storing them as a set if the user wants to keep each unique value, or an array if they want to keep every value (including repeats) in the order encountered in the stream.
This can already be achieved using existing building blocks, e.g., first invoking union in a separate step to create sets:
If we make this kind of functionality a change of default behavior and/or new options of collect_map, it seems we'd also want to consider if we want the wrapping in the complex type to happen even for single values (such as shown in these examples with existing building blocks) or only when multiple values are observed for a single key.
The text was updated successfully, but these errors were encountered:
tl;dr
The docs for the
collect_map
aggregate function disclose:This may be fine for many use cases, but we may want to offer options that preserve all values.
Details
At the time this issue is being opened, super is at commit 55d99d3.
To illustrate, we'll use this example input
prices.json
that's similar to the data used in thecollect_map
docs.By simply applying
collect_map
, we get the final price value seen in the input stream for each unique stock ticker.This "last value wins" behavior may be familiar to many users. For instance, ChatGPT suggested the following
jq
command line to combine the same stream of objects into a single JSON object.However, @mccanne recently pointed out that silently dropping values may not be ideal, especially since SuperDB provides complex data types that could easily hold all values, such as storing them as a set if the user wants to keep each unique value, or an array if they want to keep every value (including repeats) in the order encountered in the stream.
This can already be achieved using existing building blocks, e.g., first invoking
union
in a separate step to create sets:Or
collect
to create arrays:If we make this kind of functionality a change of default behavior and/or new options of
collect_map
, it seems we'd also want to consider if we want the wrapping in the complex type to happen even for single values (such as shown in these examples with existing building blocks) or only when multiple values are observed for a single key.The text was updated successfully, but these errors were encountered: