Falling behind in health check. JetStream is not current with the meta leader. [v2.10.21] #5976
Comments
How many JS API operations are you doing per second? You can get a sense of that from
@derekcollison
OK, thanks. That would not cause those repeated messages saying it can't catch up to the meta leader. Something else is going on that is not obvious to us.
@derekcollison We also have the same NATS cluster configuration, with the same applications, in another region with a lower load.
Let's see if @wallyqs could spend some time on a Zoom call with you.
Hi @a-alinichenko, ping me at
@a-alinichenko I wonder if this could be related to the readiness probe as well, so instead of the current default in the Helm charts you could change it to something like this:
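A sketch of that kind of override, assuming the chart's `container.merge` mechanism and the default monitoring port 8222; the probe timings are illustrative:

```yaml
# values.yaml for the official NATS Helm chart (sketch only).
# /healthz?js-enabled-only=true only verifies that JetStream is enabled, so a
# restarting pod is not marked unready while its assets catch up with the meta leader.
container:
  merge:
    readinessProbe:
      httpGet:
        path: /healthz?js-enabled-only=true
        port: 8222
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
```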
@wallyqs Thank you for your answer!
@a-alinichenko In v2.11 we have changed the readiness probe to be less sensitive and to avoid the errors you posted, but for v2.10 what I shared above works better to avoid the k8s service detaching the pod when there is a lot of activity.
@wallyqs, thanks for the clarification!
Observed behavior
Each NATS cluster restart triggers this problem.
When we restart the NATS cluster we see the following logs and the pod does not become operational:
During this time the consumer does not read messages and they accumulate on disk.
Expected behavior
Restarting the pod does not lead to operational problems.
Server and client version
NATS server version: 2.10.21
Go client: github.com/nats-io/nats.go v1.34.1
Host environment
Installed via the official Helm chart in k8s.
7 pods in the cluster.
7 streams (1 for each pod), placed on different pods by tags.
Filter subject: 1
Stream info example:
Allocated resources for each pod:
CPU: 4 cores
Memory: 15 GiB
Current load:
6000-7000 messages per second.
The Prometheus query to count this:
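For reference, a rate over the exporter's inbound message counter gives roughly this number; the metric name below assumes the prometheus-nats-exporter (`gnatsd_varz_in_msgs`) shipped with the Helm chart:

```
# Cluster-wide inbound message rate (messages per second), averaged over 1m.
sum(rate(gnatsd_varz_in_msgs[1m]))
```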
Steps to reproduce
Just restart a pod under high load.
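For example, assuming the release is named `nats` and deployed in the `nats` namespace (hypothetical names):

```sh
# Deleting one pod of the StatefulSet forces Kubernetes to recreate it,
# reproducing the restart while the cluster is under load.
kubectl -n nats delete pod nats-0
```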