Batch optimized translation for Spark Runner #20943

damccorm · 2022-06-04T20:43:05Z

The Spark Runner, and possibly every other runner that handles batch-only data, might benefit from a batch-optimized translation. In that setting most details of the full Beam model matter less: everything is in the global window, no pane info is needed, and all records carry the same (minimum) timestamp. Given that premise, records can be encoded as "value only" WindowedValues, and transforms like GroupByKey can skip the windowing step (GroupAlsoByWindow, GABW) to improve performance.
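To make the GroupByKey point concrete, here is a minimal, self-contained sketch in plain Java. It is not Beam or Spark runner code. The class `BatchGroupByKeySketch`, the `WindowedElement` type and both methods are hypothetical illustrations. A window-aware grouping has to key on (window, key). If every element is known to be in the single global window, that outer level always has exactly one entry, so grouping on the key alone over value-only pairs produces the same groups without carrying window, timestamp or pane metadata per record.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Beam or Spark runner code): contrasts a window-aware
 * group-by-key with the batch-only shortcut that groups on the key alone.
 */
final class BatchGroupByKeySketch {

  /** Simplified element carrying window metadata; names are illustrative only. */
  static final class WindowedElement<K, V> {
    final K key;
    final V value;
    final String window; // stand-in for a real window object

    WindowedElement(K key, V value, String window) {
      this.key = key;
      this.value = value;
      this.window = window;
    }
  }

  /** General path: group by (window, key), needed when windows can differ. */
  static <K, V> Map<String, Map<K, List<V>>> groupByKeyAndWindow(
      List<WindowedElement<K, V>> input) {
    Map<String, Map<K, List<V>>> out = new LinkedHashMap<>();
    for (WindowedElement<K, V> e : input) {
      out.computeIfAbsent(e.window, w -> new LinkedHashMap<>())
          .computeIfAbsent(e.key, k -> new ArrayList<>())
          .add(e.value);
    }
    return out;
  }

  /**
   * Batch shortcut: when all elements share the global window, the same
   * timestamp and no pane info, that metadata carries no information, so
   * elements can be plain key/value pairs grouped by key alone.
   */
  static <K, V> Map<K, List<V>> groupByKeyValueOnly(List<Map.Entry<K, V>> input) {
    Map<K, List<V>> out = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      out.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> input =
        List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
    System.out.println(groupByKeyValueOnly(input)); // prints {a=[1, 3], b=[2]}
  }
}
```

The shortcut is only valid under the batch premise above. As soon as elements can land in different windows, grouping must go back to the (window, key) form.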

Imported from Jira BEAM-12135. Original Jira may contain additional context.
Reported by: iemejia.

twosom · 2024-12-01T08:46:56Z

.take-issue

damccorm added improvement P3 runner-spark labels Jun 4, 2022

damccorm added runners spark and removed runner-spark labels Jun 16, 2022

github-actions bot assigned twosom Dec 1, 2024

twosom mentioned this issue Dec 8, 2024

Batch optimized SparkRunner groupByKey #33322 (merged)

damccorm closed this as completed in #33322 Dec 13, 2024

github-actions bot added this to the 2.62.0 Release milestone Dec 13, 2024
