-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: improve farms maintenance performance via parallelization #3354
base: main
Are you sure you want to change the base?
opt: improve farms maintenance performance via parallelization #3354
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks reasonable, but does it actually improve performance in practice?
If two farms are on the same disk (or otherwise share resources, like a bus or cable), writing pieces at the same time will likely slow them down.
Also, lock contention on shared data structures could slow every farm down.
There’s also a slight chance running more code in parallel will deadlock, which would be a bug that needs to be fixed in this PR.
I’m not familiar with this code in detail, and why it was designed that way. So I’ll leave it to Nazar to work out how significant these risks are.
@teor2345 Thank you for your review. here is my understanding; please point it out if I'm wrong.
This is just the place for managing the cluster farms. During initialization, it only reads, and there is no concurrent write happening on the farms here. (In practice, it is also rare for two farms to be plotted on the same disk, as it doesn't make much sense.)
Yes, there will indeed be lock contention (this can be optimized by improving lock usage to avoid it; let me optimize this part of the implementation). However, in my understanding, the focus here is still on network I/O, and the time spent on locks is negligible compared to the network I/O.
Yes, there is a better implementation for this. Let me optimize it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unfortunately not an equivalent change, you simply ignored the explanation provided in comments and violated expectations described in it. I should have provided a more elaborate explanation as to why it was needed, but it wasn't an accident.
On the networking level it is not guaranteed that farm shutdown will happen strictly after farm initialization has completed on the controller, so you may end up in a situation where farm is never removed. That is the reason of sequential processing, to make sure farm is removed strictly after it is initialized and never before that. As briefly mentioned in #3309, it doesn't actually need to be globally sequential and parallelizing across independent farms will in fact improve performance, but everything related to specific farm still needs to be sequential. I think this can be implemented with StreamMap, which we already use in other places for similar purposes.
There's no need for sequential execution here; parallelizing it will improve performance.
It's even simpler than I imagined—just placing the tasks in a FuturesUnordered for background execution, with no changes to the other logic.
in #3309 , I believe what’s needed is another background task queue to collect the stream reques, which doesn’t conflict with this PR, so I’ve submitted it directly.
Code contributor checklist: