Reduce cluster rebalance time #55

Open
davidfurey opened this issue Nov 17, 2020 · 3 comments

@davidfurey
Member

We have noticed that the Ophan cluster spends many hours rebalancing after a node rotation.

When vacating a node using node exclusion, Elasticsearch moves the shards away, but that doesn’t mean they all go to the new node. The old node quickly passes its shards on to other nodes and is then terminated, but afterwards there is a lot of rebalancing, carried out at a slow pace to minimize the impact on cluster performance, until the balancing heuristics are met.
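
For reference, node exclusion is normally done through the cluster settings API (`PUT _cluster/settings`) with the `cluster.routing.allocation.exclude._name` setting. The sketch below only builds the request bodies; the helper names and the node name `old-node` are hypothetical, and whether the node rotator uses `_name` rather than `_ip`, or `transient` rather than `persistent`, is an assumption.

```python
import json

# Shard allocation filtering setting that excludes nodes by name.
EXCLUDE_BY_NAME = "cluster.routing.allocation.exclude._name"


def exclusion_settings(node_name: str) -> dict:
    """Body for PUT _cluster/settings asking Elasticsearch to vacate node_name.

    Elasticsearch decides where the shards go; nothing here steers them
    towards the newly launched node.
    """
    return {"transient": {EXCLUDE_BY_NAME: node_name}}


def clear_exclusion_settings() -> dict:
    """Body for PUT _cluster/settings that resets the exclusion (null clears it)."""
    return {"transient": {EXCLUDE_BY_NAME: None}}


if __name__ == "__main__":
    # "old-node" is a placeholder node name.
    print(json.dumps(exclusion_settings("old-node")))
    print(json.dumps(clear_exclusion_settings()))
```

Because the setting only says where shards must not live, the destination of each shard is left to the balancer, which is consistent with the long rebalance described above.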

Elastic have suggested that another option would be to move the shards from the old to the new node using Cluster Reroute. As all shards are moved to the new node, this should cause minimal rebalance, if any.
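
As a rough illustration of that suggestion, the sketch below turns rows shaped like the output of `GET _cat/shards?format=json` into a body for `POST _cluster/reroute`, using one `move` command per shard. The function name, node names and sample rows are hypothetical; it only builds the payload and does not talk to a cluster. It assumes the new node has room for every shard and does not already hold a copy of any of them, since Elasticsearch would reject such a move.

```python
def build_move_commands(shards: list, old_node: str, new_node: str) -> dict:
    """Build a POST _cluster/reroute body moving every started shard
    from old_node to new_node.

    `shards` is expected to look like rows from GET _cat/shards?format=json,
    where each row has "index", "shard", "state" and "node" keys.
    """
    commands = []
    for row in shards:
        # Only shards currently on the old node, and only STARTED ones,
        # are candidates for a move command.
        if row.get("node") != old_node or row.get("state") != "STARTED":
            continue
        commands.append(
            {
                "move": {
                    "index": row["index"],
                    # _cat/shards returns the shard number as a string.
                    "shard": int(row["shard"]),
                    "from_node": old_node,
                    "to_node": new_node,
                }
            }
        )
    return {"commands": commands}


if __name__ == "__main__":
    # Placeholder data, not taken from a real cluster.
    sample = [
        {"index": "events", "shard": "0", "state": "STARTED", "node": "old-node"},
        {"index": "events", "shard": "1", "state": "STARTED", "node": "other-node"},
        {"index": "events", "shard": "2", "state": "RELOCATING", "node": "old-node"},
    ]
    print(build_move_commands(sample, "old-node", "new-node"))
```

With explicit moves the shard count on the remaining nodes stays the same before and after the rotation, which is why little or no follow-up rebalancing would be expected.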

@davidfurey
Member Author

@jacobwinch interested in what your thoughts are on this potential change to the node rotator.

@jacobwinch
Contributor

@davidfurey - I agree that the extended period of rebalancing is undesirable; it was always our intention to migrate all data from the old node onto the newest node. Perhaps this happened to work OK on the smaller clusters that we were testing with, or perhaps we just never fulfilled this requirement correctly... either way I definitely think the suggestion from you/Elastic would be an improvement on the current behaviour.

@tomrf1 might have thoughts on this too? (Perhaps your memory is better than mine!)

@tomrf1
Member

tomrf1 commented Nov 18, 2020

My memory of this is not great...
But I do remember us spending a lot of time making sure the shards go from the old node to the new node without any further re-allocation. Perhaps something has changed since?

Cluster Reroute is new to me, but that sounds like exactly what we want!
