fix materialization task count when buffering an existing job #400

dsschult · 2024-10-27T22:41:07Z

For the bug to occur, a dataset materialization had to have been interrupted previously, leaving num_tasks % tasks_per_job != 0. When this happens, materialization looks back at the existing jobs and compares the number of tasks in each job to how many there should be.

The bug is that materialization would return the number of tasks in the fully materialized job, not the number of tasks it had just materialized. Adding this to the existing num_tasks left the modulus non-zero, so it would iterate back through all jobs in the dataset. On large datasets this could take an hour or more, and Kubernetes would kill the materialization service for being unresponsive.
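To illustrate the arithmetic, here is a minimal sketch, not the actual service code. The helper `buffer_job` and its `fixed` flag are hypothetical names made up for this example; only num_tasks and tasks_per_job come from the description above. It shows why adding a full job's task count can never clear the modulus, since (n + t) % t == n % t.

```python
def buffer_job(existing_tasks: int, tasks_per_job: int, *, fixed: bool) -> int:
    """Hypothetical helper: finish a partially materialized job and
    return the count the caller adds to num_tasks."""
    missing = tasks_per_job - existing_tasks
    # (the real service would create the `missing` tasks here)
    if fixed:
        return missing        # only the tasks created in this pass
    return tasks_per_job      # buggy: also counts tasks already recorded


tasks_per_job = 4
num_tasks = 10  # interrupted run: two full jobs plus 2 tasks of a third

partial = num_tasks % tasks_per_job  # 2 tasks already in the last job

buggy_total = num_tasks + buffer_job(partial, tasks_per_job, fixed=False)
fixed_total = num_tasks + buffer_job(partial, tasks_per_job, fixed=True)

print(buggy_total % tasks_per_job)  # still non-zero, so the look-back repeats
print(fixed_total % tasks_per_job)  # zero, so the look-back is skipped
```

With the buggy return value the remainder is the same after every pass, so each later run would take the slow path again.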

fix materialization task count when buffering an existing job

3bb59f3

dsschult added the bug label Oct 27, 2024

dsschult self-assigned this Oct 27, 2024

dsschult merged commit 8301751 into master Oct 27, 2024
10 checks passed

dsschult deleted the materialize-fix-task-count branch October 27, 2024 22:42
