This is a bit of an observation / suggestion rather than a bug.
I run a Galera WAN cluster to provide high availability across three distinct data centers / availability zones. One of the data centers lost IPv6 connectivity to the other two, but still had working IPv4 connectivity.
I noticed that Galera kept repeatedly attempting to connect to the peers' IPv6 addresses until it timed out and gave up. I observed this using:
wsrep_cluster_address="gcomm://hostname_with_ipv4_and_ipv6,another_host_with_ipv4_and_ipv6"
2024-04-19 5:38:53 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[beef:beef:beef:beef::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2327670644 cwnd: 1 last_queued_since: 6622946029112785 last_delivered_since: 6622946029112785 send_queue_length: 0 send_queue_bytes: 0
2024-04-19 5:38:53 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[cafe:cafe:cafe:cafe::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2327670644 cwnd: 1 last_queued_since: 6622946029243485 last_delivered_since: 6622946029243485 send_queue_length: 0 send_queue_bytes: 0
2024-04-19 5:38:57 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[beef:beef:beef:beef::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2327674644 cwnd: 1 last_queued_since: 6622950029589363 last_delivered_since: 6622950029589363 send_queue_length: 0 send_queue_bytes: 0
2024-04-19 5:39:00 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[cafe:cafe:cafe:cafe::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2327677644 cwnd: 1 last_queued_since: 6622953029963392 last_delivered_since: 6622953029963392 send_queue_length: 0 send_queue_bytes: 0
2024-04-19 5:39:03 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[beef:beef:beef:beef::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2327681154 cwnd: 1 last_queued_since: 6622956530264078 last_delivered_since: 6622956530264078 send_queue_length: 0 send_queue_bytes: 0
2024-04-19 5:39:06 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[cafe:cafe:cafe:cafe::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2327684154 cwnd: 1 last_queued_since: 6622959530569978 last_delivered_since: 6622959530569978 send_queue_length: 0 send_queue_bytes: 0
2024-04-19 5:39:09 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[beef:beef:beef:beef::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2327687154 cwnd: 1 last_queued_since: 6622962530838950 last_delivered_since: 6622962530838950 send_queue_length: 0 send_queue_bytes: 0
2024-04-19 5:39:13 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[cafe:cafe:cafe:cafe::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2327690654 cwnd: 1 last_queued_since: 6622966031075340 last_delivered_since: 6622966031075340 send_queue_length: 0 send_queue_bytes: 0
2024-04-19 5:39:16 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[beef:beef:beef:beef::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2327693654 cwnd: 1 last_queued_since: 6622969031345969 last_delivered_since: 6622969031345969 send_queue_length: 0 send_queue_bytes: 0
2024-04-19 5:39:19 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[cafe:cafe:cafe:cafe::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 2327697154 cwnd: 1 last_queued_since: 6622972531609620 last_delivered_since: 6622972531609620 send_queue_length: 0 send_queue_bytes: 0
2024-04-19 5:39:22 0 [Note] WSREP: (19a9c908-9f93, 'ssl://[::]:4567') connection to peer 00000000-0000 with addr ssl://[beef:beef:beef:beef::1]:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2327700154 cwnd: 1 last_queued_since: 6622975531888988 last_delivered_since: 6622975531888988 send_queue_length: 0 send_queue_bytes: 0
I think it would be a nice improvement to try both the IPv4 and IPv6 addresses when both are available; it would allow nodes to reconnect in a partial-outage scenario like the one I observed.
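As a rough illustration of what I mean (a minimal POSIX sketch, not actual gcomm code; `connect_any` is a hypothetical helper): resolve the hostname to all of its addresses with `AF_UNSPEC` and fall through to the next address family when one fails, instead of retrying only the first result.

```cpp
#include <cstring>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical sketch: returns a connected socket fd, or -1 if every
// resolved address (IPv6 and IPv4) failed.
int connect_any(const char* host, const char* port)
{
    struct addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;    // ask for both AAAA and A records
    hints.ai_socktype = SOCK_STREAM;

    struct addrinfo* res = nullptr;
    if (getaddrinfo(host, port, &hints, &res) != 0) return -1;

    int fd = -1;
    for (struct addrinfo* ai = res; ai != nullptr; ai = ai->ai_next)
    {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1) continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) break; // connected
        close(fd);                    // this address failed; try the next one
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```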
It would also be interesting if Galera could maintain multiple open connections between two nodes, but that may be annoying from a code perspective.
Perhaps the simplest solution is to just use Multipath TCP, which could still present Galera with an identical input/output experience while handling connection redundancy at the protocol level.
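For what it's worth, on Linux 5.6+ an application can opt into Multipath TCP simply by passing `IPPROTO_MPTCP` when creating the socket; the kernel then manages the subflows underneath an ordinary stream socket, so the rest of the I/O code is unchanged. A minimal sketch (the fallback-to-TCP handling shown in the comment is my assumption about how Galera would want to wrap it):

```cpp
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262   // not yet present in every libc's headers
#endif

int open_mptcp_socket()
{
    // If the running kernel lacks MPTCP support this call fails, and the
    // caller would retry with plain IPPROTO_TCP. When the peer doesn't
    // speak MPTCP, the kernel itself falls back to regular TCP.
    return socket(AF_INET6, SOCK_STREAM, IPPROTO_MPTCP);
}
```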