NATS Cluster unstable in Redhat OpenShift with Alpine Image but behaving stable in VM [v2.10.19] #5881
Comments
One of the main problems is going to be that routes are defined using the service name instead of the A records available for the StatefulSet. The routes should instead list each pod's A record explicitly.
Also, deploying JetStream on NFS volumes is not recommended.
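For illustration, explicit per-pod routes for a 3-replica StatefulSet might look roughly like the sketch below. This is not the snippet from the comment above; it assumes a StatefulSet named `nats`, a headless service named `nats-headless`, the `default` namespace, and the conventional cluster port 6222. All of these names are hypothetical and need to be adapted to the actual deployment.

```
# Hypothetical names: StatefulSet "nats", headless service "nats-headless",
# namespace "default". Adjust to match your own manifests.
cluster {
  name: nats
  port: 6222

  # One route per pod, using the stable per-pod DNS records that a
  # StatefulSet gets from its headless service, rather than the
  # load-balanced service name.
  routes = [
    nats://nats-0.nats-headless.default.svc.cluster.local:6222
    nats://nats-1.nats-headless.default.svc.cluster.local:6222
    nats://nats-2.nats-headless.default.svc.cluster.local:6222
  ]
}
```

Each server then solicits a route to every named peer, so after a pod restart it reconnects to specific peers instead of whichever pod the service name happens to resolve to.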
Thank you for your response. The NFS volumes were replaced with container storage. The routes were defined by the Helm chart (bitnami/nats.io). We can enable additional routes as mentioned above. Consumers are unable to fully connect to NATS; I assume this is not related to routes. The frequent "no responders available" error is also not related to routes. Can you please help us resolve those errors?
Routes are how nodes communicate with each other. Without cluster communication there are no working streams or consumers. Fix those first.
I have attached my statefulset.yaml file. This is just for testing purposes in my local OpenShift cluster. We were using the official Helm charts from bitnami and nats.io for 2.10.11 and 2.10.18. Can you please review it and let me know if any corrections should be made in addition to the routes? That would be helpful.
@mohamedsaleem18 the StatefulSet has the issue with the cluster routes not being explicit. I think this is a setting that the bitnami chart gets wrong :/ The one we maintain at nats-io/k8s would include the right routes. If you do not set the routes as I mentioned, there will be partitions on restarts and the cluster will not work well.
I have set up the cluster with the server configuration below (routes as recommended) and started testing. I will update you. Thank you for your support.
accounts: {
# Clustering definition
cluster {
# Authorization for cluster connections
# Routes are actively solicited and connected to from this server.
# Other servers can connect to us if they supply the correct credentials
# in their routes definitions from above
routes = [
# JetStream configuration
jetstream: {
I ran a bench test against a 3-node cluster (container storage) deployed in a Red Hat OpenShift cluster and encountered the same issue after configuring the routes as recommended. Subscribers got stuck without pulling messages. I ran the test from my laptop (so there is no network issue). Please refer to the screenshot below and the attached NATS server logs: nats-21019-2-nats-21019.log
nats -s nats://127.0.0.1:31422,nats://127.0.0.1:31421,nats://127.0.0.1:31420 --user platformadmin --password xxxx bench bar --js --pub 10 --sub 20 --size 16 --replicas 3 --msgs 100000 --pull --purge --pubsleep 2s
Observed behavior
Issue Description:
We are experiencing instability with a NATS 2.10.19 cluster (JetStream enabled) when it is deployed in a Red Hat OpenShift environment, even though the same setup (also 2.10.19) works normally in a VM environment. Multitenancy is enabled in both the VM and OpenShift environments.
Issues observed in OpenShift (NATS cluster):
OpenShift details:
• NATS Image: nats:2.10.19-alpine
• NATS Cluster Size: 3 replicas
• Storage Type: Container storage (i.e., data inside container)
• Deployment Type: StatefulSet
• Red Hat OpenShift cluster version: Server Version 4.16.7 (local installation with one VM). Note: NATS versions 2.10.11 and 2.10.18 were tested with Red Hat OpenShift Server Version 4.14.20 with multiple VM nodes.
NATS server configuration:
VM details:
• NATS version: 2.10.19
• NATS cluster size: 3
• Storage type: Local filesystem storage on VM
• VM operating system: Oracle Linux Server 8.10
• VM CPE OS name: cpe:/o:oracle:linux:8:10:server
• VM kernel: Linux 5.4.17-2136.334.6.1.el8uek.x86_64
• VM architecture: x86-64
NATS server config:
Expected behavior
NATS 2.10.19 should be stable in a Red Hat OpenShift environment.
Server and client version
nats CLI version: 0.1.5
nats-server: v2.10.19
Host environment
Please refer to the Issue Description section for complete details.
Steps to reproduce