
Khepri: Raft leader election times vary depending on failure scenarios #12528

Answered by kjnilsson
Rmarian asked this question in Other

@Rmarian What is most likely happening here is that the way you force-terminate your node leaves a dangling TCP connection on the other nodes, so we have to rely on aten-based failure detection instead of the more expedient Erlang monitors.
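For illustration: when the connection is left dangling, no connection-level event tells the other nodes that the peer is gone, so they have to infer it from missing heartbeats. aten describes itself as an adaptive accrual failure detector. The Python sketch below is a generic phi-accrual detector of that family, not aten's actual code. `PhiAccrualDetector`, `threshold` and `min_std` are hypothetical names, and modelling heartbeat gaps as normally distributed is an assumption.

```python
import math
import statistics


class PhiAccrualDetector:
    """Generic phi-accrual failure detector (hypothetical sketch, not aten's code).

    Records heartbeat inter-arrival times and turns the time since the last
    heartbeat into a suspicion level phi = -log10(P(gap >= elapsed)),
    assuming inter-arrival times are roughly normally distributed.
    """

    def __init__(self, threshold: float = 8.0, min_std: float = 0.1):
        self.threshold = threshold  # phi above which the peer is suspected down
        self.min_std = min_std      # std-dev floor so a perfectly regular peer is not flagged instantly
        self.intervals: list[float] = []
        self.last: float | None = None

    def heartbeat(self, now: float) -> None:
        # Record the gap since the previous heartbeat, then remember this one.
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now: float) -> float:
        # statistics.stdev needs at least two samples; stay unsuspicious until then.
        if self.last is None or len(self.intervals) < 2:
            return 0.0
        mean = statistics.mean(self.intervals)
        std = max(statistics.stdev(self.intervals), self.min_std)
        elapsed = now - self.last
        # P(gap >= elapsed) under the normal model, via the complementary error function.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        return -math.log10(max(p_later, 1e-300))  # clamp to avoid log10(0)

    def is_suspected(self, now: float) -> bool:
        return self.phi(now) > self.threshold
```

For example, after heartbeats at t = 0, 1, ..., 10 s, `is_suspected(10.5)` is false while `is_suspected(13.0)` is true. Suspicion only builds up as the silence grows relative to the observed heartbeat rhythm, which is why this path is slower than a monitor that fires as soon as the connection drops.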

Ra (the library Khepri is based on) favours leader stability over leader election latency during network partitions. Still, I would have expected 12s to be around the upper end of what you should experience. Most of this latency is driven by the poll_interval setting in the aten application. In RabbitMQ this is set to a conservative 5s, which affects quorum queues as well as Khepri. Lowering this by using the raft.adaptive_failure_detecto…
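To see why the poll interval dominates, here is a toy Python model. It assumes (hypothetically) that polls happen every `poll_interval_ms` and that a peer is only declared down at the first poll after it has been silent for `missed_polls` full intervals. `detection_latency_ms`, `worst_case_ms` and `missed_polls` are illustrative names, not aten settings, and aten's real adaptive logic will behave differently.

```python
def detection_latency_ms(fail_at_ms: int, poll_interval_ms: int, missed_polls: int) -> int:
    """Time from a failure until the first poll that has seen enough silence.

    Hypothetical model: polls run at 0, P, 2P, ...; the peer is declared down
    once it has been silent for missed_polls * P milliseconds.
    """
    silence_needed = missed_polls * poll_interval_ms
    poll = 0
    # Advance poll by poll until the silence observed at that poll is long enough.
    while poll - fail_at_ms < silence_needed:
        poll += poll_interval_ms
    return poll - fail_at_ms


def worst_case_ms(poll_interval_ms: int, missed_polls: int) -> int:
    # The failure can land anywhere between two polls; take the slowest case.
    return max(
        detection_latency_ms(t, poll_interval_ms, missed_polls)
        for t in range(poll_interval_ms)
    )
```

Under these assumptions `worst_case_ms(5000, 2)` is 14999 ms while `worst_case_ms(1000, 2)` is 2999 ms. The worst case is roughly (missed_polls + 1) × poll interval, so it shrinks in proportion to the poll interval; the cost is more frequent polling and a higher chance of declaring a slow but healthy node down.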

Answer selected by michaelklishin