Is it possible to have a task list execute in parallel? #24
This is not currently supported. I considered it when first implementing the sequence task type. I thought it might be nice if, by default, an array inside an array were interpreted as a ParallelTask type within a SequenceTask. So, for example, the following would run mypy and pylint in parallel, then pytest after that:

`test = [["mypy", "pylint"], "pytest"]`

And of course you could also do:

`test.parallel = ["mypy", "pylint", "pytest"]`

However, the problem is that I'm not sure what it should do with stdout. I imagine one wouldn't simply want both subprocesses to write to the same console at the same time! Maybe there could be a solution along the lines of capturing the output and feeding it out to the console one line at a time (maybe with a prefix linking it to the task that produced it, kind of like docker-compose does), but that's getting complicated to implement.

As I mention in #26, if the stdout of those tasks were configured to be captured anyway – such as for use in another task, or to be piped to a file or discarded – then this problem goes away, and the tasks might as well be run in parallel. There's just the question left of how to handle a failure of one task in the set (whether to wait for the others).

I'd like to support parallel execution, but I'm really not sure how it should work. What do you think @MartinWallgren?
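For context, a sketch of how that hypothetical syntax might look in a full pyproject.toml (not currently supported by poethepoet; purely illustrative):

```toml
# hypothetical syntax from the comment above -- not implemented
[tool.poe.tasks]
# mypy and pylint run in parallel, then pytest runs after both complete
test = [["mypy", "pylint"], "pytest"]
# alternative explicit form (can't coexist with the definition above):
# test.parallel = ["mypy", "pylint", "pytest"]
```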
Also, a potential if inelegant workaround might be to use a shell task with background jobs, something along the lines of:

```toml
[tool.poe.tasks.test]
shell = """
poe mypy &
poe pylint &
poe pytest &
wait $(jobs -p)
"""
```
We do something like this in bash:
The side effect of this simple method is that it seemingly "stalls" on the slowest command, returning when they all complete. This means the CMDS array should preferably be sorted fastest to slowest.

```bash
# assumes CMDS is an array of commands to run, and that the C_GREEN,
# C_RED, C_UNDERLINE, and C_RESET terminal colour codes are defined
for cmd in "${CMDS[@]}"; do
    stdout="$(mktemp)"
    timer="$(mktemp)"
    # run each command in the background, capturing its output and timing
    { { time $cmd >>"$stdout" 2>&1 ; } >>"$timer" 2>&1 ; } &
    pids+=($!)
    stdouts+=("$stdout")
    timers+=("$timer")
done

for i in ${!CMDS[*]}; do
    # wait for each command in turn and record whether it succeeded
    if wait "${pids[$i]}"; then
        codes+=(0)
    else
        codes+=(1)
    fi
    if [ "${codes[$i]}" -eq "0" ]; then
        # success: print the command in green with its timing
        echo -en "${C_GREEN}"
        echo -en "${CMDS[$i]}"
        echo -en "$C_RESET"
        echo -e " ($(cat "${timers[$i]}")s)"
    else
        # failure: print the command in red and dump its captured output
        echo -en "${C_RED}${C_UNDERLINE}"
        echo -en "${CMDS[$i]}"
        echo -e "$C_RESET"
        echo -e "$(cat "${stdouts[$i]}")"
    fi
    echo ""
done
```
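For completeness, a hypothetical setup for the snippet above (the variable names come from the script itself; the specific commands and colour codes are illustrative):

```bash
CMDS=("mypy ." "pylint my_pkg" "pytest")  # ideally sorted fastest to slowest
C_GREEN="$(tput setaf 2)"
C_RED="$(tput setaf 1)"
C_UNDERLINE="$(tput smul)"
C_RESET="$(tput sgr0)"
```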
Another way is to use the gnu-parallel command:

`parallel ::: "flake8" "mypy dirname"`

@nat-n the initial implementation can be very simple. We can later add some config about how these tasks are executed. It could be a cross-platform alternative to parallel.
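For reference, GNU parallel already deals with the output-interleaving question by buffering each job's output and printing it when the job completes. A quick sketch of possibly relevant flags (assuming a reasonably recent GNU parallel):

```bash
# print each job's grouped output in input order rather than completion order
parallel --keep-order ::: "flake8" "mypy dirname"

# stop all remaining jobs as soon as one fails
parallel --halt now,fail=1 ::: "flake8" "mypy dirname"
```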
like a …
Yes, some task- or project-level configs.
Hi @jnoortheen, thanks for the idea. I understand that you're proposing the following strategy, which I'll call Strategy 1:

- execute the tasks in parallel, capturing the output of each
- once all tasks are complete, output each task's captured output in the order the tasks were declared

This is probably the best solution in terms of having a coherent output log at the end, though it assumes that the tasks in the list are meaningfully ordered, which doesn't seem necessary. Therefore it might sometimes make more sense to use the following Strategy 2 instead:

- execute the tasks in parallel, capturing the output of each
- output each task's captured output as soon as that task completes

Both Strategy 1 and Strategy 2 would benefit from poe providing some extra output lines to clarify which output is from which task (unless running in quiet mode).

Strategy 3 would be like Strategy 2, except we capture and output each line of task output as it arrives (with some prefix indicating which task it came from).

And Strategy 4 would be to just let all tasks output directly to stdout on top of one another, which may sometimes be necessary.

Are there any other strategies worth considering? Is it worthwhile also being able to direct outputs to separate filesystem locations?

I think it would be best if the user can configure the strategy for a specific parallel task independently for stdout and stderr, with Strategy 1 being the default for stdout and Strategy 3 or 4 being the default for stderr.

Maybe how to handle errors should also be configurable, with continuing the other tasks but returning non-zero at the end as the default behaviour if one or more tasks fail. But having the option to stop all tasks if one fails, or even to always continue and return zero, would also make sense.

I'm thinking this would require having a thread per running subtask, which is responsible for monitoring the subtask and handling its output.

To be clear, I would not be keen on making gnu parallel (or any other binary less common than bash itself) a dependency of poethepoet, and implementing such an integration mechanism would probably be a bit complex to get right.

Any other ideas?
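As a starting point, here is a minimal sketch of what Strategy 1 could look like using a thread per subtask, as suggested above (hypothetical, not poethepoet's actual code; the task commands are illustrative):

```python
# Strategy 1: run tasks in parallel with output captured, then print
# each task's output in declaration order once all have finished
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run(cmd: str) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)

def run_parallel(cmds: list[str]) -> int:
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        results = list(pool.map(run, cmds))  # map preserves declaration order
    for cmd, result in zip(cmds, results):
        print(f"=== {cmd} (exit {result.returncode}) ===")
        sys.stdout.write(result.stdout)
        sys.stderr.write(result.stderr)
    # return non-zero if any task failed, per the proposed default behaviour
    return max(r.returncode for r in results)

if __name__ == "__main__":
    sys.exit(run_parallel(["mypy", "pylint", "pytest"]))
```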
On second thought, yeah: you want to see the errors quickly. I was thinking of multi-line errors/warnings like those from pip… so maybe buffer the lines a bit until, say, 0.2 seconds have passed and no more new lines have been seen from process X?
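A rough sketch of that buffering idea (hypothetical; assumes an asyncio stream per process, and a 0.2-second quiet period as suggested):

```python
import asyncio
import sys

async def relay_debounced(name: str, stream: asyncio.StreamReader, quiet: float = 0.2):
    """Buffer lines from one process, flushing after `quiet` seconds of silence."""
    buffer: list[str] = []
    while True:
        try:
            line = await asyncio.wait_for(stream.readline(), timeout=quiet)
        except asyncio.TimeoutError:
            # no new line for `quiet` seconds: flush the buffered block
            if buffer:
                sys.stdout.writelines(buffer)
                buffer.clear()
            continue
        if not line:  # EOF: the process closed its output
            break
        buffer.append(f"[{name}] {line.decode()}")
    sys.stdout.writelines(buffer)  # flush whatever remains
```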
I've actually implemented that suggested solution here using the asyncio.subprocess module. It just outputs stdout from commands to sys.stdout, and stderr to stderr.
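That implementation isn't reproduced in the thread, but a minimal sketch of the same idea might look like this (the command strings are illustrative):

```python
import asyncio
import sys

async def run(cmd: str) -> int:
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    # forward each command's captured streams once it completes
    sys.stdout.write(stdout.decode())
    sys.stderr.write(stderr.decode())
    return proc.returncode or 0

async def main() -> int:
    cmds = ("mypy .", "pylint my_pkg", "pytest")
    codes = await asyncio.gather(*(run(cmd) for cmd in cmds))
    return max(codes)

if __name__ == "__main__":
    sys.exit(asyncio.run(main()))
```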
We could take some inspiration from https://github.com/open-cli-tools/concurrently#readme
+1, interested in implementing this.
I think this is an important feature, but it's currently not near the top of my list. If someone wants to submit a PoC for one or more of the strategies discussed above then that would help move it along :) Strategy 1 using …
@nat-n what is currently at the top of your list? Maybe some of us could help with those.
+1 for this request
I think it would be easier to run it in threads, since the current code is not asynchronous (maybe for version 0.3 it could be rewritten asynchronously?). |
Is it possible to create a task that executes multiple tasks in parallel?

I know I can create a compound task like this:

`test = ["mypy", "pylint", "pytest"]`

Calling `poe test` will run each task in sequence, one after the other. It would be nice to be able to configure these task lists as safe to start in parallel. Parallel should of course not be the default, since some tasks require output from previous tasks (coverage being the prime example, which needs a completed test run before generating a coverage report).
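A sketch of that ordering constraint as a poethepoet config (the task names here are hypothetical):

```toml
[tool.poe.tasks]
# must stay sequential: the report depends on data from the completed test run
coverage = ["pytest", "coverage-report"]
# independent checks like these are the ones that could safely run in parallel
check = ["mypy", "pylint"]
```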