I had a quick look at the current ES code and I’m wondering whether your task reconciliation works properly. The reason is that I haven’t found any schedulerDriver.reviveOffers() call in your code, which might be a problem once you have declined an offer because you’re already running an ES executor on that node.
We had that problem in our logstash project - it seems that once you decline an offer (or, even worse, neither accept nor decline it) you will not receive that offer again unless you revive the offers!
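To illustrate the idea, here is a minimal sketch of the pattern: decline offers for nodes that already run an executor, and revive offers as soon as an executor is lost. `OfferDriver` and `ExecutorTracker` are hypothetical names invented for this example; `OfferDriver` only stands in for the `declineOffer` and `reviveOffers` calls of the Mesos `SchedulerDriver` so that the snippet is self-contained, and it is not how the ES framework is actually written.

```java
import java.util.HashSet;
import java.util.Set;

class ReviveSketch {

    /** Hypothetical stand-in for the two SchedulerDriver calls discussed here. */
    interface OfferDriver {
        void declineOffer(String offerId);
        void reviveOffers();
    }

    /** Hypothetical helper: remembers which nodes already run an executor. */
    static class ExecutorTracker {
        private final OfferDriver driver;
        private final Set<String> nodesWithExecutor = new HashSet<>();

        ExecutorTracker(OfferDriver driver) {
            this.driver = driver;
        }

        /** Returns true if the offer should be used to launch an executor. */
        boolean onOffer(String offerId, String node) {
            if (nodesWithExecutor.contains(node)) {
                // The node is already covered, so the offer is declined.
                driver.declineOffer(offerId);
                return false;
            }
            nodesWithExecutor.add(node);
            return true;
        }

        /** Called when an executor is known to be gone (e.g. a lost-task update). */
        void onExecutorLost(String node) {
            if (nodesWithExecutor.remove(node)) {
                // Ask for previously declined offers to be sent again.
                driver.reviveOffers();
            }
        }
    }
}
```

In a real scheduler the same two calls would go to the `SchedulerDriver` instance, and `onExecutorLost` would be triggered from whatever mechanism detects a dead executor.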
A scenario to test it would be:
1. Run an ES executor on all nodes (normal ES framework start).
2. Kill the scheduler so that the executors are still running (the executors will usually stay alive because of failover).
3. Now kill an executor on a slave (via docker stop ..).
4. Start the scheduler and see whether that slave (see 3.) gets an executor.
Because you’re using a slightly different approach to reconcile tasks (executor heartbeats), or because of how you run the executors, it might not be a problem for you - but I would check that...
This is not an issue for us. We are using a custom healthcheck ping to make sure that executors are alive. Hence, when the scheduler restarts, the executor will not ping back and will then request more offers.
However, there is a task to see whether using reconcileTasks would be a better method than a custom ping: #209