Quorum queue becomes unavailable, replica status is reported as noproc and there are no metrics #11564
-
Describe the bug
A quorum queue became unavailable after it was purged manually. I cannot say whether the purge is the root cause, but it did precede the failure. The RabbitMQ version is 3.12.13-1.
State of the queue during the issue:
See below the related logs (which are similar on 2 nodes):
Warning:
Error:
Reproduction steps
No steps.
Expected behavior
No crash :)
Additional context
No response
Replies: 3 comments 1 reply
-
There is a well-known upgrade strategy that can lead to exactly this scenario. It is documented and explicitly recommended against. Whether something like this happened during an upgrade, or because some VMs or containers running the nodes were replaced, I cannot know. But you cannot replace a majority of nodes without explicitly removing replicas and adding new ones. The above doc section goes into more detail on what specifically is meant by that.
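To illustrate why replacing a majority of nodes is a problem, here is a minimal sketch of the majority arithmetic behind a Raft-based quorum queue. The function names `majority` and `has_quorum` are hypothetical and only model the counting rule; they are not part of RabbitMQ.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def has_quorum(replicas: int, lost: int) -> bool:
    """True if the remaining original replicas still form a majority."""
    return replicas - lost >= majority(replicas)


if __name__ == "__main__":
    # Assumption: a queue with 3 replicas, with 0 to 3 of their nodes replaced
    # without first removing the old replicas and adding new ones.
    for lost in range(4):
        print(f"3 replicas, {lost} replaced: quorum kept = {has_quorum(3, lost)}")
```

Under this model, a 3-replica queue tolerates the loss of one replica but not two, which is why a majority of nodes cannot be swapped out in one go.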
-
It did not happen during an upgrade. It is a bug that occurs under normal conditions and a normal workload. We have more than one thousand queues, and this one went down after being purged.
RabbitMQ 3.12.x is out of community support.
The log snippets shared simply state that a quorum queue leader process is down, and according to list_queues, all replicas are. To form a decent hypothesis, see the events that precede this exception. The exception itself tells you nothing about the root cause.
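One way to look at the events that precede the exception is to pull the lines logged just before its first occurrence. The sketch below is a generic helper, not a RabbitMQ tool: the name `context_before`, the search string, and the log path in the comment are all assumptions.

```python
from collections import deque
from typing import Iterable, List


def context_before(lines: Iterable[str], needle: str, count: int = 20) -> List[str]:
    """Return up to `count` lines that precede the first line containing `needle`."""
    recent = deque(maxlen=count)  # rolling window of the most recent lines
    for line in lines:
        if needle in line:
            return list(recent)
        recent.append(line)
    return []  # needle never appeared


# Hypothetical usage; the path and the search string are assumptions:
# with open("/path/to/rabbit-node.log") as log:
#     for entry in context_before(log, "noproc", count=50):
#         print(entry, end="")
```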