fix materialization task count when buffering an existing job #400

Merged 1 commit on Oct 27, 2024


dsschult
Collaborator

For the bug to occur, a dataset materialization must have been interrupted previously, leaving num_tasks % tasks_per_job != 0. When that happens, materialization looks back at the existing jobs and compares the number of tasks in each job to how many there should be.

The bug is that materialization returned the total number of tasks in the fully materialized job, not the number of tasks it had actually materialized. When that count was added to the existing num_tasks, the modulus was still non-zero, so it iterated back through all jobs in the dataset. On large datasets this could take an hour or more, and Kubernetes would kill the materialization service for being unresponsive.
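The arithmetic behind the bug can be sketched as follows. This is a minimal illustration, not the project's actual code: `buffer_existing_job`, `resume`, and the `fixed` flag are hypothetical names, and the numbers (6 existing tasks, 4 tasks per job) are made up for the example.

```python
def buffer_existing_job(existing_tasks: int, tasks_per_job: int, *, fixed: bool) -> int:
    """Fill in the missing tasks of a partially materialized job.

    Hypothetical helper: the return value is what the caller adds to num_tasks.
    """
    newly_created = tasks_per_job - existing_tasks
    if fixed:
        # Fixed behavior: report only the tasks created by this call.
        return newly_created
    # Buggy behavior: report the size of the whole, now complete, job.
    return tasks_per_job


def resume(num_tasks: int, tasks_per_job: int, *, fixed: bool) -> int:
    """Resume an interrupted materialization and return the updated task count."""
    existing_in_last_job = num_tasks % tasks_per_job
    if existing_in_last_job:
        num_tasks += buffer_existing_job(
            existing_in_last_job, tasks_per_job, fixed=fixed
        )
    return num_tasks


tasks_per_job = 4
num_tasks = 6  # interrupted: one full job plus 2 of 4 tasks in the next job

buggy_total = resume(num_tasks, tasks_per_job, fixed=False)
fixed_total = resume(num_tasks, tasks_per_job, fixed=True)

print(buggy_total, buggy_total % tasks_per_job)  # 10 2 -> still looks incomplete
print(fixed_total, fixed_total % tasks_per_job)  # 8 0  -> job boundary reached
```

With the buggy count, the modulus stays non-zero after the job has been completed, which is the condition that sent materialization back through every job in the dataset.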

@dsschult dsschult added the bug label Oct 27, 2024
@dsschult dsschult self-assigned this Oct 27, 2024
@dsschult dsschult merged commit 8301751 into master Oct 27, 2024
10 checks passed
@dsschult dsschult deleted the materialize-fix-task-count branch October 27, 2024 22:42