-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vine: efficient resource allocation #4006
vine: efficient resource allocation #4006
Conversation
Before comments from @colinthomas-z80 and @btovar, I didn't fully understand the problem from the perspective of algorithm design, the math was straightforward but didn't identify the underlying problem. What's going wrong in the code is that we are using That said, we should use With these changes, results are very encouraging! Both concurrency and success rate for resource estimate improved significantly, and the policy is very self-consistent! |
Jin, I don't think that I believe that cache_inuse and sandboxes are not important here and are just confusing the main issue, which is that by design, the proportional computation gives conservative allocations. This is true for all the resources, and it is my hunch that this is not an accounting problem. I'd be more comfortable with a solution that includes all the resources. For example, check at the end if an allocation from a computed proportion would not fit in the worker, then modify it in an easy-to-explain way (e.g., divide it by 2), and let the scheduler decide if it would fit, and maybe reject the allocation. I do not think we want to allocate whatever is left, as we want automated allocations for similar tasks to be similar (about the same order of magnitude). For example, we do not want one task to be 1GB and another to be 10MB just because that's what was left. Such correction should be made before we ensure that the allocation goes below limits that were explicitly specified. |
Come on over today and let's talk through a few things that I would like to understand better. |
Per our discussion today, the disk allocation should be:
Where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thing, almost there!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hooray! Good work on a long and complex PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seem to be a lot of changes in this pr that are unrelated.
I suggest to close this pr and resubmit with only the following changes:
- The multiplying factor to the disk_available.
- The corresponding tune parameter.
- Updates to the tune parameter documentation in the manual.
/* Compute the proportion of the worker the task shall have across resource types. */ | ||
double max_proportion = -1; | ||
double min_proportion = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Min proportion is needed when using automatic resource allocation via categories, please do not remove it.
Sounds good! I would like to hold it a bit until the DV5 application is compatible with the latest DaskVine/Dask and a supported Coffea version. I currently can't do experimental tests on DV5 with our latest changes. |
Hmm, are you able to run a test by going back to an earlier version of Coffea and/or DV5? |
Just in case something unexpected happens since we had a few PRs getting merged. I can test with an earlier version of Dask + DaskVine in my end, but if try to merge that PR then resource allocations for the latest cctools will remain untested. |
All features in this PR have been separated into a sequence of PRs. |
Proposed Changes
Fix #3995
Problem
Main issue: resource allocation for tasks mostly fail, which harms the concurrency and extends the workflow execution time
Tentative resource allocation policy:
Here is an analysis on why the allocated disk tends to be bigger than the available disk (I excluded the input size for tasks as they don't seem to matter):
In
vine_manager_choose_resources_for_task
, using proportional resource allocation, this is how we choose disk resource for a task:In
check_worker_have_enough_resources
, this is how we calculate the available disk on a worker:Initially, there are no tasks running,
c
is totally free, so the first several tasks get larger allocated disk. As more tasks are assigned on that worker, their outputs are brought to the cache, soc
becomes smaller.Say the size of cache increases
delta_c
on taskt_i
completion, now taskt_(i+1)
gets scheduled. Compared tot_i
, we have a decrease in both disk allocation and disk available:Apparently, as cache being more used, the available disk tends to shrink more than the allocated disk.
disk_allocate > disk_available
happens when:Which is:
When we have more tasks running, we use more cache space,
c
becomes larger, the right hand side gets smaller; we use more task sandboxes,s
becomes larger, the left hand side gets bigger. Therefore, such inequality becomes more likely to be satisfied and that's why more disk allocations fail as we have more tasks running.Here is an example that aligns with the analysis. I requested for 16 cores, but at each time, there are at most 15 tasks running concurrently:
The csv file that records the resource allocation history
Solutions
Results
A dramatic improvement of task concurrency!
Merge Checklist
The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.
make test
Run local tests prior to pushing.make format
Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)make lint
Run lint on source code prior to pushing.