Is it possible to have a task list execute in parallel? #24
This is not currently supported. I considered it when first implementing the sequence task type. I thought it might be nice if, by default, an array inside an array were interpreted as a ParallelTask type within a SequenceTask. So, for example, the following would run mypy and pylint in parallel, then pytest after that:

`test = [["mypy", "pylint"], "pytest"]`

And of course you could also do:

`test.parallel = ["mypy", "pylint", "pytest"]`

However, the problem is that I'm not sure what it should do with stdout. I imagine one wouldn't simply want both subprocesses to write to the same console at the same time! Maybe there could be a solution along the lines of capturing the output and feeding it out to the console one line at a time (maybe with a prefix linking it to the task that produced it, kind of like docker-compose does), but that's getting complicated to implement.

As I mention in #26, if the stdout of those tasks were configured to be captured anyway – such as for use in another task, or to be piped to a file or discarded – then this problem goes away, and the tasks might as well be run in parallel. There's just the question left of how to handle a failure of one task in the set (whether to wait for the others).

I'd like to support parallel execution, but I'm really not sure how it should work. What do you think @MartinWallgren?
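For context, a sketch of how that hypothetical syntax might look in a full pyproject.toml (not currently supported by poethepoet; purely illustrative):

```toml
# hypothetical syntax from the comment above -- not implemented
[tool.poe.tasks]
# mypy and pylint run in parallel, then pytest runs after both complete
test = [["mypy", "pylint"], "pytest"]
# alternative explicit form (can't coexist with the definition above):
# test.parallel = ["mypy", "pylint", "pytest"]
```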
Also, a potential if inelegant workaround might be to use a shell task with background jobs, something along the lines of:

```toml
[tool.poe.tasks.test]
shell = """
poe mypy &
poe pylint &
poe pytest &
wait $(jobs -p)
"""
```
We do something like this in bash:
The side effect of this simple method is that it seemingly "stalls" on the slowest command, returning when they all complete. This means the CMDS array should preferably be sorted fastest to slowest.

```bash
# assumes CMDS is an array of commands to run, and that the C_GREEN,
# C_RED, C_UNDERLINE, and C_RESET terminal colour codes are defined
for cmd in "${CMDS[@]}"; do
    stdout="$(mktemp)"
    timer="$(mktemp)"
    # run each command in the background, capturing its output and timing
    { { time $cmd >>"$stdout" 2>&1 ; } >>"$timer" 2>&1 ; } &
    pids+=($!)
    stdouts+=("$stdout")
    timers+=("$timer")
done

for i in ${!CMDS[*]}; do
    # wait for each command in turn and record whether it succeeded
    if wait "${pids[$i]}"; then
        codes+=(0)
    else
        codes+=(1)
    fi
    if [ "${codes[$i]}" -eq "0" ]; then
        # success: print the command in green with its timing
        echo -en "${C_GREEN}"
        echo -en "${CMDS[$i]}"
        echo -en "$C_RESET"
        echo -e " ($(cat "${timers[$i]}")s)"
    else
        # failure: print the command in red and dump its captured output
        echo -en "${C_RED}${C_UNDERLINE}"
        echo -en "${CMDS[$i]}"
        echo -e "$C_RESET"
        echo -e "$(cat "${stdouts[$i]}")"
    fi
    echo ""
done
```
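For completeness, a hypothetical setup for the snippet above (the variable names come from the script itself; the specific commands and colour codes are illustrative):

```bash
CMDS=("mypy ." "pylint my_pkg" "pytest")  # ideally sorted fastest to slowest
C_GREEN="$(tput setaf 2)"
C_RED="$(tput setaf 1)"
C_UNDERLINE="$(tput smul)"
C_RESET="$(tput sgr0)"
```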
Another way is to use the gnu-parallel command:

`parallel ::: "flake8" "mypy dirname"`

@nat-n the initial implementation can be very simple. We can later add some config about how these tasks are executed. It could be a cross-platform alternative to parallel.
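For reference, GNU parallel already deals with the output-interleaving question by buffering each job's output and printing it when the job completes. A quick sketch of possibly relevant flags (assuming a reasonably recent GNU parallel):

```bash
# print each job's grouped output in input order rather than completion order
parallel --keep-order ::: "flake8" "mypy dirname"

# stop all remaining jobs as soon as one fails
parallel --halt now,fail=1 ::: "flake8" "mypy dirname"
```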
like a …
Yes, some task- or project-level configs.
Hi @jnoortheen, thanks for the idea. I understand that you're proposing the following strategy, which I'll call Strategy 1:

- execute the tasks in parallel, capturing the output of each
- once all tasks are complete, output each task's captured output in the order the tasks were declared

This is probably the best solution in terms of having a coherent output log at the end, though it assumes that the tasks in the list are meaningfully ordered, which doesn't seem necessary. Therefore it might sometimes make more sense to use the following Strategy 2 instead:

- execute the tasks in parallel, capturing the output of each
- output each task's captured output as soon as that task completes

Both Strategy 1 and Strategy 2 would benefit from poe providing some extra output lines to clarify which output is from which task (unless running in quiet mode).

Strategy 3 would be like Strategy 2, except we capture and output each line of task output as it arrives (with some prefix indicating which task it came from).

And Strategy 4 would be to just let all tasks output directly to stdout on top of one another, which may sometimes be necessary.

Are there any other strategies worth considering? Is it worthwhile also being able to direct outputs to separate filesystem locations?

I think it would be best if the user can configure the strategy for a specific parallel task independently for stdout and stderr, with Strategy 1 being the default for stdout and Strategy 3 or 4 being the default for stderr.

Maybe how to handle errors should also be configurable, with continuing the other tasks but returning non-zero at the end as the default behaviour if one or more tasks fail. But having the option to stop all tasks if one fails, or even to always continue and return zero, would also make sense.

I'm thinking this would require having a thread per running subtask, which is responsible for monitoring the subtask and handling its output.

To be clear, I would not be keen on making gnu parallel (or any other binary less common than bash itself) a dependency of poethepoet, and implementing such an integration mechanism would probably be a bit complex to get right.

Any other ideas?
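As a starting point, here is a minimal sketch of what Strategy 1 could look like using a thread per subtask, as suggested above (hypothetical, not poethepoet's actual code; the task commands are illustrative):

```python
# Strategy 1: run tasks in parallel with output captured, then print
# each task's output in declaration order once all have finished
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run(cmd: str) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)

def run_parallel(cmds: list[str]) -> int:
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        results = list(pool.map(run, cmds))  # map preserves declaration order
    for cmd, result in zip(cmds, results):
        print(f"=== {cmd} (exit {result.returncode}) ===")
        sys.stdout.write(result.stdout)
        sys.stderr.write(result.stderr)
    # return non-zero if any task failed, per the proposed default behaviour
    return max(r.returncode for r in results)

if __name__ == "__main__":
    sys.exit(run_parallel(["mypy", "pylint", "pytest"]))
```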
On second thought, yeah: you want to see the errors quickly. I was thinking of multi-line errors/warnings like those from pip… so maybe buffer the lines a bit until, say, 0.2 seconds have passed and no more new lines have been seen from process X?
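A rough sketch of that buffering idea (hypothetical; assumes an asyncio stream per process, and a 0.2-second quiet period as suggested):

```python
import asyncio
import sys

async def relay_debounced(name: str, stream: asyncio.StreamReader, quiet: float = 0.2):
    """Buffer lines from one process, flushing after `quiet` seconds of silence."""
    buffer: list[str] = []
    while True:
        try:
            line = await asyncio.wait_for(stream.readline(), timeout=quiet)
        except asyncio.TimeoutError:
            # no new line for `quiet` seconds: flush the buffered block
            if buffer:
                sys.stdout.writelines(buffer)
                buffer.clear()
            continue
        if not line:  # EOF: the process closed its output
            break
        buffer.append(f"[{name}] {line.decode()}")
    sys.stdout.writelines(buffer)  # flush whatever remains
```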
I've actually implemented that suggested solution here using the asyncio.subprocess module. It just outputs stdout from commands to sys.stdout, and stderr to stderr.
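That implementation isn't reproduced in the thread, but a minimal sketch of the same idea might look like this (the command strings are illustrative):

```python
import asyncio
import sys

async def run(cmd: str) -> int:
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    # forward each command's captured streams once it completes
    sys.stdout.write(stdout.decode())
    sys.stderr.write(stderr.decode())
    return proc.returncode or 0

async def main() -> int:
    cmds = ("mypy .", "pylint my_pkg", "pytest")
    codes = await asyncio.gather(*(run(cmd) for cmd in cmds))
    return max(codes)

if __name__ == "__main__":
    sys.exit(asyncio.run(main()))
```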
We could take some inspiration from https://github.com/open-cli-tools/concurrently#readme
+1, interested in implementing this.
I think this is an important feature, but it's currently not near the top of my list. If someone wants to submit a PoC for one or more of the strategies discussed above then that would help move it along :) Strategy 1 using …
@nat-n what is currently at the top of your list? Maybe some of us could help with those.
+1 for this request
I think it would be easier to run it in threads, since the current code is not asynchronous (maybe for version 0.3 it could be rewritten asynchronously?). |
Is it possible to create a task that executes multiple tasks in parallel?

I know I can create a compound task like this:

`test = ["mypy", "pylint", "pytest"]`

Calling `poe test` will run each task in sequence, one after the other. It would be nice to be able to configure these task lists as safe to start in parallel. Parallel should of course not be the default, since some tasks require output from previous tasks (coverage being the prime example, which needs a completed test run before generating a coverage report).
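A sketch of that ordering constraint as a poethepoet config (the task names here are hypothetical):

```toml
[tool.poe.tasks]
# must stay sequential: the report depends on data from the completed test run
coverage = ["pytest", "coverage-report"]
# independent checks like these are the ones that could safely run in parallel
check = ["mypy", "pylint"]
```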