Investigate reconciliation of tasks - accepting/declining/reviving resources offers #214

Closed
floriangrundig opened this issue Aug 12, 2015 · 1 comment

Comments

@floriangrundig

I had a quick look at the current ES code and I’m wondering whether your task reconciliation works properly. The reason is that I haven’t found any schedulerDriver.reviveOffers() call in your code, which might become a problem once you have declined an offer because you’re already running an ES executor on that node.

We had that problem in our logstash project - it seems that once you decline an offer (or, even worse, neither accept nor decline it) you will not receive that offer again unless you revive the offers!
A scenario to test this would be:

  1. Run an ES executor on all nodes (normal ES framework start)
  2. Kill the scheduler so that the executors are still running (the executors will usually survive because of failover)
  3. Now kill an executor on a slave (via docker stop ..)
  4. Start the scheduler and see whether that slave (see step 3) gets an executor

Because you’re using a slightly different approach to reconcile tasks (executor heartbeats), or because of how you run the executors, it might not be a problem for you - but I would check that...
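The decline/revive behaviour described above can be sketched without a Mesos cluster. This is only an illustration, not the ES framework's code: `Driver` is a hypothetical stand-in for the two `SchedulerDriver` calls involved (`declineOffer` and `reviveOffers`), and `RecordingDriver`, `Scheduler`, the offer IDs and the host names are all made up for the example. The idea is that the scheduler declines offers for hosts that already run an executor, and asks for offers again as soon as it learns that an executor is gone.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReviveOffersSketch {

    /** Hypothetical stand-in for the two SchedulerDriver calls used here. */
    interface Driver {
        void declineOffer(String offerId);
        void reviveOffers();
    }

    /** Records calls so the behaviour can be inspected without Mesos. */
    static class RecordingDriver implements Driver {
        final List<String> declined = new ArrayList<>();
        int reviveCalls = 0;

        @Override
        public void declineOffer(String offerId) {
            declined.add(offerId);
        }

        @Override
        public void reviveOffers() {
            reviveCalls++;
        }
    }

    /** Simplified scheduler: at most one executor per host. */
    static class Scheduler {
        private final Driver driver;
        private final Set<String> hostsWithExecutor = new HashSet<>();
        final List<String> launchedOn = new ArrayList<>();

        Scheduler(Driver driver) {
            this.driver = driver;
        }

        /** Launch on hosts without an executor, decline everything else. */
        void onOffer(String offerId, String hostname) {
            if (hostsWithExecutor.contains(hostname)) {
                driver.declineOffer(offerId);
            } else {
                hostsWithExecutor.add(hostname);
                launchedOn.add(hostname);
            }
        }

        /** When a known executor disappears, ask for offers again. */
        void onExecutorLost(String hostname) {
            if (hostsWithExecutor.remove(hostname)) {
                driver.reviveOffers();
            }
        }
    }

    public static void main(String[] args) {
        RecordingDriver driver = new RecordingDriver();
        Scheduler scheduler = new Scheduler(driver);

        scheduler.onOffer("offer-1", "slave-a"); // launches an executor
        scheduler.onOffer("offer-2", "slave-a"); // declined, host is taken
        scheduler.onExecutorLost("slave-a");     // triggers reviveOffers()
        scheduler.onOffer("offer-3", "slave-a"); // launches a replacement

        System.out.println("declined=" + driver.declined
                + " reviveCalls=" + driver.reviveCalls
                + " launchedOn=" + scheduler.launchedOn);
    }
}
```

In a real scheduler the same two hooks would presumably live in `resourceOffers` and `statusUpdate`; the sketch leaves that wiring out.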

@philwinder
Contributor

This is not an issue for us. We use a custom healthcheck ping to make sure that executors are alive. Hence, when the scheduler restarts, a dead executor will not ping back and the scheduler will then request more offers.
However, there is a task to see whether using reconcileTasks would be a better method than a custom ping: #209
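To make the healthcheck idea concrete, here is a minimal sketch of timeout-based liveness tracking. The thread does not show the actual implementation, so everything here is an assumption: `HeartbeatTracker`, its methods, the executor IDs and the timeout value are hypothetical. The scheduler records when each executor last replied to a ping and treats any executor that has been silent for longer than the timeout as dead, at which point it would need new offers for that node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatSketch {

    /** Hypothetical tracker: remembers the last ping reply per executor. */
    static class HeartbeatTracker {
        private final long timeoutMillis;
        private final Map<String, Long> lastReply = new HashMap<>();

        HeartbeatTracker(long timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }

        /** Start tracking an executor, e.g. after a scheduler restart. */
        void register(String executorId, long nowMillis) {
            lastReply.put(executorId, nowMillis);
        }

        /** Record that an executor answered a ping. */
        void recordReply(String executorId, long nowMillis) {
            lastReply.put(executorId, nowMillis);
        }

        /** Executors that have been silent for longer than the timeout. */
        List<String> expired(long nowMillis) {
            List<String> dead = new ArrayList<>();
            for (Map.Entry<String, Long> entry : lastReply.entrySet()) {
                if (nowMillis - entry.getValue() > timeoutMillis) {
                    dead.add(entry.getKey());
                }
            }
            Collections.sort(dead); // stable order for logging
            return dead;
        }
    }

    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(1000);
        tracker.register("executor-a", 0);
        tracker.register("executor-b", 0);

        tracker.recordReply("executor-a", 900); // only executor-a answers

        System.out.println("expired=" + tracker.expired(1500));
    }
}
```

Timestamps are passed in explicitly so the logic can be exercised without waiting. The Mesos-native alternative mentioned above is `SchedulerDriver.reconcileTasks`, which asks the master for the current task statuses instead of relying on a custom ping.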
