fix materialization task count when buffering an existing job #400
For the bug to occur, a dataset materialization had to have been interrupted earlier, leaving `num_tasks % tasks_per_job != 0`. When this happens, materialization looks back at the existing jobs and compares the number of tasks in each job to how many there should be. The bug is that buffering an existing job returned the number of tasks in the fully materialized job, not the number of tasks it had just materialized. When that count was added to the existing `num_tasks`, the modulus stayed non-zero, so materialization kept iterating back through every job in the dataset. On large datasets this could take an hour or more, and Kubernetes would kill the materialization service for being unresponsive.
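A minimal sketch of the failure mode, assuming tasks are grouped into jobs of `tasks_per_job` and that resumption walks jobs from newest to oldest until `num_tasks` is aligned again. `Job`, `buffer_job_buggy`, `buffer_job_fixed`, `resume`, and `make_jobs` are hypothetical names for illustration, not the service's actual code.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical model: how many tasks of this job are already materialized.
    materialized: int


def buffer_job_buggy(job: Job, tasks_per_job: int) -> int:
    """Fill the job up to tasks_per_job, then return its TOTAL task count (the bug)."""
    job.materialized = tasks_per_job
    return job.materialized


def buffer_job_fixed(job: Job, tasks_per_job: int) -> int:
    """Fill the job up to tasks_per_job, then return only the tasks added now."""
    added = tasks_per_job - job.materialized
    job.materialized = tasks_per_job
    return added


def resume(jobs, num_tasks, tasks_per_job, buffer_job):
    """Walk back from the newest job while num_tasks is not a multiple of tasks_per_job.

    Returns (final num_tasks, number of jobs visited).
    """
    visited = 0
    for job in reversed(jobs):
        if num_tasks % tasks_per_job == 0:
            break  # aligned again: nothing left to buffer
        num_tasks += buffer_job(job, tasks_per_job)
        visited += 1
    return num_tasks, visited


def make_jobs():
    # Five full jobs plus one interrupted job holding 3 of 10 tasks (53 tasks total).
    return [Job(10) for _ in range(5)] + [Job(3)]


print(resume(make_jobs(), 53, 10, buffer_job_buggy))  # (113, 6)
print(resume(make_jobs(), 53, 10, buffer_job_fixed))  # (60, 1)
```

With the buggy return value, each step adds a full 10 to a count that started at 53, so the modulus is always 3 and every job gets visited. With the fixed return value, the first step adds 7, reaching 60, and the loop stops after one job. On a real dataset each visited job is actual work, which is why the walk-back could stall the service.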