vine: efficient resource allocation #4006

Conversation

@JinZhou5042 (Member) commented Dec 11, 2024

Proposed Changes

Fix #3995

Problem

Main issue: resource allocation for tasks mostly fails, which hurts concurrency and extends the workflow execution time.

Tentative resource allocation policy:

  • The number of cores is the first-class citizen: if there are no available cores, never consider a task or even select a worker for it.
  • With proportional resource allocation, if one core is usable, there will always be enough memory.
  • If cores are available, overusing the worker cache is the only reason a task's resource allocation can fail.
  • Given that overusing the worker cache is unlikely (we usually have a large enough disk), the number of cores has a decisive impact on whether the resource allocation will succeed.
  • Therefore, we want to keep track of the globally usable cores (or function slots) and attempt task resource allocation only when some are available. This ensures that most task resource allocations succeed and that we don't waste time scanning deep into the task list when no cores are available at all (see the sketch after this list).
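A minimal sketch of the gating idea, assuming a hypothetical manager-side counter (the struct, field, and function names below are illustrative, not the actual cctools API):

```c
/* Illustrative sketch only: names are hypothetical, not the cctools API. */

struct scheduler_view {
	int total_cores;     /* sum of cores over all connected workers */
	int committed_cores; /* cores currently committed to running tasks */
};

/* Number of cores (or function slots) still usable across the cluster. */
static int usable_cores(const struct scheduler_view *s)
{
	return s->total_cores - s->committed_cores;
}

/* Gate: only consider tasks (and search for a worker) if at least one
 * core is usable; otherwise skip the expensive scheduling pass entirely. */
static int should_consider_tasks(const struct scheduler_view *s)
{
	return usable_cores(s) > 0;
}
```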

Here is an analysis of why the allocated disk tends to be bigger than the available disk (task input sizes are excluded, as they don't seem to matter):

[image not preserved]

In vine_manager_choose_resources_for_task, using proportional resource allocation, this is how we choose the disk resource for a task:

[image: disk allocation formula, not preserved]

In check_worker_have_enough_resources, this is how we calculate the available disk on a worker:

[image: available-disk calculation, not preserved]

Initially, there are no tasks running and the cache c is empty, so the first several tasks get a larger disk allocation. As more tasks are assigned to that worker, their outputs are added to the cache, so c grows and the free disk shrinks.

Say the cache size grows by delta_c when task t_i completes, and task t_(i+1) then gets scheduled. Compared to t_i, both the disk allocation and the available disk decrease:

[image: decrease in disk allocation and available disk, not preserved]

As the cache fills up, the available disk tends to shrink faster than the allocated disk. disk_allocate > disk_available happens when:

[image: inequality, not preserved]

Which is:

[image: simplified inequality, not preserved]

When more tasks are running, we use more cache space, so c grows and the right-hand side shrinks; we also use more task sandboxes, so s grows and the left-hand side gets bigger. Therefore, the inequality becomes more likely to hold, which is why more disk allocations fail as more tasks run.
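A reconstruction of the omitted formulas, based only on the surrounding text (here $D$ is the worker's total disk, $c$ the cache in use, $s$ the total sandbox size, and $p$ the task's proportion; the notation is my reading of the original images, not a verbatim copy):

$$\mathrm{disk\_allocate} = p\,(D - c), \qquad \mathrm{disk\_available} = D - c - s$$

$$\mathrm{disk\_allocate} > \mathrm{disk\_available} \iff p\,(D - c) > D - c - s \iff s > (1 - p)\,(D - c)$$

This matches the description above: a growing cache $c$ shrinks the right-hand side, and growing sandbox usage $s$ grows the left-hand side.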

Here is an example that aligns with the analysis. I requested 16 cores, but at any given time there are at most 15 tasks running concurrently:

[image: concurrency plot showing at most 15 tasks running, not preserved]

The csv file that records the resource allocation history

Solutions

  1. Cap the proportional resource allocation at the available resources; enforce a hard limit (see the sketch after this list).
  2. Keep track of the globally usable cores / function slots; a task is considered only if there are available cores or slots.
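A minimal sketch of the hard limit in solution 1 (names are illustrative, not the actual vine_manager.c code):

```c
#include <stdint.h>

/* Illustrative only: cap a proportionally computed disk request at the
 * disk the worker actually has available, so the allocation can never
 * exceed what check_worker_have_enough_resources reports as free. */
static int64_t cap_disk_request(int64_t proportional_disk, int64_t disk_available)
{
	return proportional_disk < disk_available ? proportional_disk : disk_available;
}
```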

Results

A dramatic improvement in task concurrency!

  • Original: [images not preserved]
  • With the proposed solutions: [images not preserved]

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update: Update the manual to reflect user-visible changes.
  • Type Labels: Select a GitHub label for the type: bugfix, enhancement, etc.
  • Product Labels: Select a GitHub label for the product: TaskVine, Makeflow, etc.
  • PR RTM: Mark your PR as ready to merge.

@JinZhou5042 marked this pull request as draft December 11, 2024 02:22
@JinZhou5042 self-assigned this Dec 11, 2024
@JinZhou5042 marked this pull request as ready for review December 12, 2024 02:27
@JinZhou5042 marked this pull request as draft December 12, 2024 13:17
@JinZhou5042 (Member, Author) commented Dec 12, 2024

Before the comments from @colinthomas-z80 and @btovar, I didn't fully understand the problem from the perspective of algorithm design; the math was straightforward but didn't identify the underlying problem.

What's going wrong in the code is that we use disk_total - cache_inuse as the available disk to allocate. However, that's not the actual amount of disk available for use. The total size of the task sandboxes should be accounted for as well, which means the available disk should be: disk_total - cache_inuse - sandboxes.

Given that, we should use task_disk_estimate = (disk_total - disk_inuse) * proportion to estimate the disk allocation. @dthain further suggested that, for the disk estimate, leaving half of it free would provide more disk space for incoming tasks and allow for potential cache expansion, so we have task_disk_estimate /= 2.
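A rough sketch of the estimate described above (function and variable names are illustrative; the real logic lives in vine_manager_choose_resources_for_task):

```c
#include <stdint.h>

/* Illustrative sketch of the disk estimate described above; not the actual code. */
static int64_t estimate_task_disk(int64_t disk_total, int64_t disk_inuse, double proportion)
{
	int64_t estimate = (int64_t)((disk_total - disk_inuse) * proportion);
	/* Leave roughly half of the estimate free for incoming tasks and potential cache growth. */
	estimate /= 2;
	return estimate;
}
```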

With these changes, the results are very encouraging! Both the concurrency and the success rate of resource estimates improved significantly, and the policy is self-consistent!

@btovar (Member) commented Dec 13, 2024

Jin, I don't think that disk_total - cache_inuse - sandboxes is correct. One way to see this is that if the tasks do not have any input files, and I want to schedule two tasks to the worker, the second task will get a smaller proportion than the first one.

I believe that cache_inuse and sandboxes are not important here and are just confusing the main issue, which is that by design, the proportional computation gives conservative allocations. This is true for all the resources, and it is my hunch that this is not an accounting problem.

I'd be more comfortable with a solution that includes all the resources. For example, check at the end whether an allocation from a computed proportion would not fit in the worker; if so, modify it in an easy-to-explain way (e.g., divide it by 2) and let the scheduler decide whether it fits, possibly rejecting the allocation. I do not think we want to allocate whatever is left, as we want automated allocations for similar tasks to be similar (about the same order of magnitude). For example, we do not want one task to get 1GB and another 10MB just because that's what was left. Such a correction should be made before we ensure that the allocation stays below limits that were explicitly specified.
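One possible reading of that suggestion in code (purely illustrative; the names and the halving rule come from this comment, not from the actual implementation):

```c
#include <stdint.h>

/* Purely illustrative: if the proportional allocation would not fit on the
 * worker, shrink it in an easy-to-explain way (halve it) rather than handing
 * out whatever space happens to be left, then let the scheduler accept or
 * reject the adjusted allocation. */
static int64_t adjust_proportional_allocation(int64_t requested, int64_t worker_available)
{
	if (requested > worker_available) {
		requested /= 2;
	}
	return requested;
}
```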

@JinZhou5042 requested a review from dthain January 15, 2025 18:32
@dthain (Member) commented Jan 16, 2025

Come on over today and let's talk through a few things that I would like to understand better.

@dthain (Member) commented Jan 17, 2025

Per our discussion today, the disk allocation should be:

disk = ((total - cache)*frac)/proportion

Where frac is a tunable value with a default of 0.75
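For illustration only, with made-up numbers: total = 100 GB, cache = 20 GB, frac = 0.75, and proportion = 4 gives disk = ((100 - 20) * 0.75) / 4 = 15 GB.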

@JinZhou5042 requested review from btovar and dthain January 17, 2025 18:07
@dthain (Member) left a comment

One last thing, almost there!

@JinZhou5042 requested a review from dthain January 17, 2025 19:35
@dthain (Member) left a comment

Hooray! Good work on a long and complex PR!

@btovar (Member) left a comment

There seem to be a lot of unrelated changes in this PR. I suggest closing this PR and resubmitting with only the following changes:

  • The multiplying factor applied to disk_available.
  • The corresponding tune parameter.
  • Updates to the tune parameter documentation in the manual.

```c
/* Compute the proportion of the worker the task shall have across resource types. */
double max_proportion = -1;
double min_proportion = -1;
```
Review comment (Member):

min_proportion is needed when using automatic resource allocation via categories; please do not remove it.

@JinZhou5042 (Member, Author) commented

Sounds good! I would like to hold it a bit until the DV5 application is compatible with the latest DaskVine/Dask and a supported Coffea version. I currently can't do experimental tests on DV5 with our latest changes.

@dthain (Member) commented Jan 21, 2025

Hmm, are you able to run a test by going back to an earlier version of Coffea and/or DV5?

@JinZhou5042 (Member, Author) commented

Just in case something unexpected happens, since a few PRs have been merged since then. I can test with an earlier version of Dask + DaskVine on my end, but if we merge this PR that way, resource allocation for the latest cctools will remain untested.

@JinZhou5042 (Member, Author) commented

All features in this PR have been separated into a sequence of PRs.

Successfully merging this pull request may close these issues:
vine: scheduling inefficiency b/c task resource pre-allocation usually fails