Etcd cluster members join the cluster as learners, and there can only be a single learner in the cluster at a time. For this reason, server joins must be serialized. If you want to check using kubectl, you could wait for the etcd pod on the newly joined node to become ready. You could also use etcdctl to query the cluster membership, although that's a good bit more work. Or you could just start them all at once and wait for things to eventually come up. I've never seen it fail entirely; if you just allow systemd to restart the service, it will get there eventually.
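For anyone who wants to script the kubectl check, here is a rough sketch of a polling helper. It is not an official RKE2 recipe. The function names `wait_until` and `etcd_pod_ready` are made up for illustration, and it assumes that the etcd static pod is named `etcd-<node-name>` in the `kube-system` namespace and that `kubectl` is on the PATH with a working kubeconfig. Check both against your own cluster before relying on it.

```shell
# Sketch: poll a command until it succeeds or a timeout elapses.
# usage: wait_until <timeout_seconds> <interval_seconds> <command...>
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    # Give up once the deadline has passed.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Succeeds when the etcd pod for the given node reports Ready=True.
# Assumption: the pod is named etcd-<node-name> and lives in kube-system.
etcd_pod_ready() {
  local node="$1" status
  status="$(kubectl -n kube-system get pod "etcd-${node}" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)"
  [ "$status" = "True" ]
}

# Example with a hypothetical node name: after starting rke2-server on
# master-b, wait up to 10 minutes before starting the next server.
#   wait_until 600 10 etcd_pod_ready master-b && echo "ok to start next server"
```

The loop polls rather than calling `kubectl wait` directly because the pod may not exist yet right after the service starts, and a lookup of a missing pod fails immediately. Running the same wait after each server join keeps the joins serialized.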
Environmental Info:
RKE2 Version:
v1.26.0+rke2r2
Node(s) CPU architecture, OS, and Version:
RHEL 8.5 x86_64
Cluster Configuration:
3 Master Nodes in HA configuration. No Worker Nodes.
Describe the bug:
I start up Master Node A, which starts with no problem. However, when I try to join 2 other Master Nodes (B/C) following these instructions (https://docs.rke2.io/install/ha), it seems I have to wait some amount of time (between 1 and 5 minutes, from observation) before I can start them. I did not see any reference to waiting conditions in the docs.
If I try to start Master Node B too early (within 1-2 min of starting Master A), it seems like etcd has issues starting. Starting Master Node B via systemctl start rke2-server hangs for about 20 mins before it exits with an error. However, starting Master C then succeeds, at which point Master B joins successfully after Master C.
I tried to add a conditional wait after starting Master A, where I wait for the Master A Node Status to be Ready=True from the k8s API and then add the Master B/C nodes. However, this did not work, and I'm having to add an additional 5m pause after the conditional Ready=True wait.
Steps To Reproduce:
Expected behavior:
If a wait is required before Master B/C nodes can join once Master A starts, I would expect to have some sort of conditional signal that lets me know when Master A is ready to accept new Master nodes joining in HA configuration.
Actual behavior:
I'm having to place an arbitrary wait (5 mins) after starting Master A, which is error-prone (what if one day it's running slower and 5m is not enough?). I'd like to understand what condition I can check to ensure other Master nodes can join, rather than just waiting 5m.
Additional context / logs:
Not sure what logs would be helpful here (kubelet / etcd / ?), but happy to provide if this is unexpected behavior.