Node-Rotation failure can leave cluster with shard-rebalancing disabled #41

Open
rtyley opened this issue Jul 11, 2019 · 1 comment

Member

rtyley commented Jul 11, 2019

We recently had node-rotation fail due to #36. That is unfortunate in itself, but even worse, it happened about 20 minutes after a separate outage began within Ophan, and we were distracted by that.

We tried scaling up our Elasticsearch cluster to better handle the load, but got no benefit from it, because elasticsearch-node-rotation had left our cluster with this cluster setting when it died:

"cluster.routing.rebalance.enable": "none"

...the team lost a fair bit of time attempting to understand why the new boxes weren't taking any of the load.

.then(() => updateRebalancingStatus(oldInstance.id, "all"))
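
For anyone hitting the same symptom, here is a rough sketch of how the stuck setting could be spotted from the output of `GET /_cluster/settings?flat_settings=true`. The type and function names (`FlatClusterSettings`, `effectiveRebalanceSetting`, `rebalancingDisabled`) are made up for illustration and are not part of elasticsearch-node-rotation. The sketch also ignores any value configured in `elasticsearch.yml`.

```typescript
// Shape of GET /_cluster/settings?flat_settings=true (only the parts used here).
interface FlatClusterSettings {
  persistent?: Record<string, string>;
  transient?: Record<string, string>;
}

const REBALANCE_KEY = "cluster.routing.rebalance.enable";

// Transient settings take precedence over persistent ones. If neither is set,
// this sketch assumes the Elasticsearch default of "all" (it does not look at
// values configured in elasticsearch.yml).
function effectiveRebalanceSetting(settings: FlatClusterSettings): string {
  return (
    settings.transient?.[REBALANCE_KEY] ??
    settings.persistent?.[REBALANCE_KEY] ??
    "all"
  );
}

// Hypothetical helper: true when existing shards will not be rebalanced onto
// newly added nodes.
function rebalancingDisabled(settings: FlatClusterSettings): boolean {
  return effectiveRebalanceSetting(settings) === "none";
}
```

Running a check like this before scaling up would have flagged that the new nodes were not going to pick up any existing shards.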

Contributor

jacobwinch commented Jul 11, 2019

Ouch - that's a nasty one, thanks for reporting this.

One solution would be to define a failure handler lambda (as a Catch) which re-enables this setting if the execution fails partway through.
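
A minimal sketch of what that could look like, assuming the "Catch" is an AWS Step Functions Catch clause and that the handler can reach the cluster through some settings-update function. Everything named here is hypothetical: `makeFailureHandler`, `FailureEvent`, `targetInstanceId`, the `ReenableRebalancing` state name, and the injected `putSettings` function, which stands in for whatever the project uses to call `PUT /_cluster/settings`. Using a `persistent` setting is also an assumption; the real code may use `transient`.

```typescript
// Body for PUT /_cluster/settings. Using "persistent" is an assumption.
type ClusterSettingsUpdate = {
  persistent: { "cluster.routing.rebalance.enable": "all" | "none" };
};

// Injected function that sends the settings update via the given instance.
// Hypothetical: stands in for the project's own Elasticsearch client.
type PutClusterSettings = (
  instanceId: string,
  body: ClusterSettingsUpdate
) => Promise<void>;

// Hypothetical state passed to the handler when a step fails.
interface FailureEvent {
  targetInstanceId: string;
  error?: unknown;
}

// Builds a lambda-style handler that re-enables shard rebalancing.
function makeFailureHandler(putSettings: PutClusterSettings) {
  return async function handler(event: FailureEvent): Promise<FailureEvent> {
    await putSettings(event.targetInstanceId, {
      persistent: { "cluster.routing.rebalance.enable": "all" },
    });
    // Pass the event through so a later state can still report the failure.
    return event;
  };
}

// Amazon States Language Catch clause, written as a TypeScript object.
// "ReenableRebalancing" is a made-up state name for the handler above.
const rebalanceCatchClause = [
  { ErrorEquals: ["States.ALL"], ResultPath: "$.error", Next: "ReenableRebalancing" },
];
```

Attaching a clause like `rebalanceCatchClause` to each state that runs after rebalancing is disabled would route any failure to the handler, so the cluster is not left with `"none"` when the execution dies.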

Obviously fixing #36 would help too!
