What's Changed
[NEW] 💎 Collection-based windowed aggregations
A new window operation was added to gather all events in the window into batches - collect()
.
You can use it to perform aggregations requiring collections that cannot be expressed via the reduce()
approach, such as calculating medians.
This operation is optimized for collecting values and performs significantly better than using reduce()
to accumulate batches of data.
Example:
### Collect all events over a 10-minute tumbling window into a list. ###
from datetime import timedelta
from quixstreams import Application
app = Application(...)
sdf = app.dataframe(...)
sdf = (
# Define a tumbling window of 10 minutes
sdf.tumbling_window(timedelta(minutes=10))
# Collect events in the window into a list
.collect()
# Emit results only for closed windows
.final()
)
# Output:
# {
# 'start': <window start>,
# 'end': <window end>,
# 'value': [event1, event2, event3, ...] - list of all events in the window
# }
Docs - https://quix.io/docs/quix-streams/windowing.html#collect
By @gwaramadze in #688
Full Changelog: v3.6.1...v3.7.0