Describe the bug
NOTE: this issue may also apply to Mnesia (I haven't tried), but with Mnesia, network partition handling strategies "interfere" and potentially prevent or resolve it. Therefore, I focus on Khepri-enabled clusters here.
In a 3-node cluster with Khepri enabled, if the node with a Ra leader gets partitioned from the other two nodes, it may not rejoin the Ra cluster after the partition heals. It's sufficient to call net_kernel:connect_node('rabbit@another-node') or ra:members({rabbitmq_metadata, 'rabbit@<another-node>'}) from the partitioned node for all nodes to re-establish connections, rejoin the Ra clusters and sync the updates.
If I connect an AMQP client to a node partitioned like that, the client can't perform any declarations, can't publish to queues, etc. Attempting to do so may trigger reconnections, but not quickly enough for the client to succeed (it's not clear to me what exactly triggers the reconnection; sometimes the cluster sat broken for 30 minutes).
Reproduction steps
Deploy a 3-node RabbitMQ cluster (I'm using main from December 13th) and enable khepri_db
Check which node is the leader for Khepri or for a quorum queue (QQ)
Trigger a network partition so that the leader can't talk to the other 2 nodes (I'm using chaos-mesh on Kubernetes)
As expected, the other two nodes elect a new leader and work correctly
Resolve the network partition after 2 minutes
Check rabbitmq-diagnostics metadata_store_status or rabbitmqctl list_queues leader (depending on whether you triggered this in a Khepri Ra cluster or a QQ Ra cluster, respectively). You will see that the old leader still believes it is the leader and that the other two members are missing. The other two nodes report the correct leader and are only missing the old leader as a member. For example, after server-0 was partitioned, it reports:
Status of metadata store on node [email protected] ...
┌───────────────────────────────────────────┬────────────┬────────────┬────────────────┬──────────────┬──────────────┬──────────────┬────────────────┬──────┬─────────────────┐
│ Node Name │ Raft State │ Membership │ Last Log Index │ Last Written │ Last Applied │ Commit Index │ Snapshot Index │ Term │ Machine Version │
├───────────────────────────────────────────┼────────────┼────────────┼────────────────┼──────────────┼──────────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ [email protected] │ leader │ voter │ 959 │ 959 │ 959 │ 959 │ -1 │ 29 │ 1 │
└───────────────────────────────────────────┴────────────┴────────────┴────────────────┴──────────────┴──────────────┴──────────────┴────────────────┴──────┴─────────────────┘
The other two nodes report the new leader correctly:
Status of metadata store on node [email protected] ...
┌───────────────────────────────────────────┬────────────┬────────────┬────────────────┬──────────────┬──────────────┬──────────────┬────────────────┬──────┬─────────────────┐
│ Node Name │ Raft State │ Membership │ Last Log Index │ Last Written │ Last Applied │ Commit Index │ Snapshot Index │ Term │ Machine Version │
├───────────────────────────────────────────┼────────────┼────────────┼────────────────┼──────────────┼──────────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ [email protected] │ leader │ voter │ 960 │ 960 │ 960 │ 960 │ -1 │ 30 │ 1 │
├───────────────────────────────────────────┼────────────┼────────────┼────────────────┼──────────────┼──────────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ [email protected] │ follower │ voter │ 960 │ 960 │ 960 │ 960 │ -1 │ 30 │ 1 │
└───────────────────────────────────────────┴────────────┴────────────┴────────────────┴──────────────┴──────────────┴──────────────┴────────────────┴──────┴─────────────────┘
Status of metadata store on node [email protected] ...
┌───────────────────────────────────────────┬────────────┬────────────┬────────────────┬──────────────┬──────────────┬──────────────┬────────────────┬──────┬─────────────────┐
│ Node Name │ Raft State │ Membership │ Last Log Index │ Last Written │ Last Applied │ Commit Index │ Snapshot Index │ Term │ Machine Version │
├───────────────────────────────────────────┼────────────┼────────────┼────────────────┼──────────────┼──────────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ [email protected] │ leader │ voter │ 960 │ 960 │ 960 │ 960 │ -1 │ 30 │ 1 │
├───────────────────────────────────────────┼────────────┼────────────┼────────────────┼──────────────┼──────────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ [email protected] │ follower │ voter │ 960 │ 960 │ 960 │ 960 │ -1 │ 30 │ 1 │
└───────────────────────────────────────────┴────────────┴────────────┴────────────────┴──────────────┴──────────────┴──────────────┴────────────────┴──────┴─────────────────┘
Expected behavior
Most likely we need a component that periodically checks whether all expected nodes are present and, if not, tries to re-establish these connections. Otherwise, at least in a fairly idle cluster, the reconnection may not happen.
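To make the idea concrete, here is a minimal sketch of such a periodic check as a standalone gen_server. It is not an existing RabbitMQ component: the module name node_reconnector, the ExpectedNodes argument and the interval are hypothetical, and a real implementation would presumably take the expected member list from the cluster metadata rather than from a static argument. It only uses nodes/0 and net_kernel:connect_node/1, the same call that works as a manual workaround above.

```erlang
-module(node_reconnector).
-behaviour(gen_server).

-export([start_link/2, missing_nodes/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% ExpectedNodes: nodes that should be connected (hypothetical input,
%% supplied by the caller). IntervalMs: check period in milliseconds.
start_link(ExpectedNodes, IntervalMs) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE,
                          {ExpectedNodes, IntervalMs}, []).

%% Expected nodes that are neither this node nor currently connected.
missing_nodes(Expected, Connected) ->
    [N || N <- Expected, N =/= node(), not lists:member(N, Connected)].

init({ExpectedNodes, IntervalMs}) ->
    erlang:send_after(IntervalMs, self(), check),
    {ok, #{expected => ExpectedNodes, interval => IntervalMs}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(check, #{expected := Expected, interval := IntervalMs} = State) ->
    %% connect_node/1 returns true | false | ignored; a failed attempt
    %% is simply retried on the next tick.
    lists:foreach(fun(N) -> net_kernel:connect_node(N) end,
                  missing_nodes(Expected, nodes())),
    erlang:send_after(IntervalMs, self(), check),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.
```

On the partitioned node this could be started with, for example, node_reconnector:start_link(['rabbit@another-node'], 10000). (node name and interval are placeholders). Whether re-establishing the distribution connection alone is always enough for Ra to step down the stale leader is an assumption based on the manual workaround described in this issue.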
Additional context
No response