Add option to not advertise load-balancer IP over BGP if all backends are down when healthcheck enabled #1315

trunet · 2024-10-17T13:41:33Z

Required information

Distribution: Ubuntu
Distribution version: 22.04
The output of "incus info" or if that fails:
- Kernel version: 6.8.0-47-generic
- LXC version: 6.0.2
- Incus version: 6.6
- Storage backend in use: ceph

Issue description

This is a feature request to improve OVN LB healthchecks introduced by #1020 and #1224.

We have a couple of interconnected datacenters; each DC has an Incus cluster with a couple of OVN networks and BGP.

Our use case is an anycast IP assigned to the LB on each Incus cluster. E.g. with 2 DCs and 10.0.0.10/32 (same configuration in both, apart from the backend target_address):

description: regional-lb LB
config:
  healthcheck: "true"
backends:
- name: regional-lb-01-443
  description: Backend for regional-lb-01-443
  target_port: "443"
  target_address: 10.10.10.226 # in the other DC is 10.11.10.226
- name: regional-lb-01-80
  description: Backend for regional-lb-01-80
  target_port: "80"
  target_address: 10.10.10.226 # in the other DC is 10.11.10.226
- name: regional-lb-02-443
  description: Backend for regional-lb-02-443
  target_port: "443"
  target_address: 10.10.10.227 # in the other DC is 10.11.10.227
- name: regional-lb-02-80
  description: Backend for regional-lb-02-80
  target_port: "80"
  target_address: 10.10.10.227 # in the other DC is 10.11.10.227
ports:
- description: Port 443/tcp
  protocol: tcp
  listen_port: "443"
  target_backend:
  - regional-lb-01-443
  - regional-lb-02-443
- description: Port 80/tcp
  protocol: tcp
  listen_port: "80"
  target_backend:
  - regional-lb-01-80
  - regional-lb-02-80
listen_address: 10.0.0.10
location: ""

Query the LB backend state (on one side, I stopped the services listening on ports 80 and 443):

{
        "backend_health": {
                "regional-lb-01-443": {
                        "address": "10.10.10.226",
                        "ports": [
                                {
                                        "port": 443,
                                        "protocol": "tcp",
                                        "status": "offline"
                                }
                        ]
                },
                "regional-lb-01-80": {
                        "address": "10.10.10.226",
                        "ports": [
                                {
                                        "port": 80,
                                        "protocol": "tcp",
                                        "status": "offline"
                                }
                        ]
                },
                "regional-lb-02-443": {
                        "address": "10.10.10.227",
                        "ports": [
                                {
                                        "port": 443,
                                        "protocol": "tcp",
                                        "status": "offline"
                                }
                        ]
                },
                "regional-lb-02-80": {
                        "address": "10.10.10.227",
                        "ports": [
                                {
                                        "port": 80,
                                        "protocol": "tcp",
                                        "status": "offline"
                                }
                        ]
                }
        }
}

The feature is to stop advertising the LB listen_address 10.0.0.10 through BGP when the backends have status offline (all of them, as in the title, or optionally any of them, per the first thought below).
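To make the requested decision concrete, here is a minimal sketch in Python (not Incus code) that takes the backend state in the JSON shape shown above and decides whether the listen address should stay advertised. The function name `should_advertise` and the `mode` values "all" and "any" are hypothetical, and keeping the advertisement when no health data exists is an assumption, not something Incus defines.

```python
def should_advertise(state: dict, mode: str = "all") -> bool:
    """Return True if the LB listen address should stay advertised.

    mode="all": withdraw only when every backend port is offline.
    mode="any": withdraw as soon as one backend port is offline.
    Both mode names are hypothetical, for illustration only.
    """
    # Flatten the per-backend, per-port status strings into one list.
    statuses = [
        port["status"]
        for backend in state.get("backend_health", {}).values()
        for port in backend.get("ports", [])
    ]
    if not statuses:
        # Assumption: with no health data, keep advertising.
        return True

    offline = [status == "offline" for status in statuses]
    if mode == "all":
        return not all(offline)
    if mode == "any":
        return not any(offline)
    raise ValueError(f"unknown mode: {mode!r}")
```

With the state shown above, where all four backends are offline, both modes would withdraw the route. If only one backend were offline, "any" would withdraw it while "all" would keep advertising.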

Some thoughts:

- Behaviour configuration: remove the advertisement when all backends are offline, or when any backend is offline.
- An interval, timeout, failure_count and success_count configuration for querying the LB backend healthcheck state, to decide when to remove the advertisement and when to add it back.
- By default, it could use the same values as the healthcheck: healthcheck.interval, healthcheck.timeout, healthcheck.failure_count, healthcheck.success_count.
- I understand the healthcheck logic lives within OVN, but BGP is controlled by Incus, so the logic needs to be copied over to Incus.
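The failure_count / success_count idea above could be sketched as a small hysteresis tracker, again in Python rather than Incus code. The class name `AdvertisementTracker` and its fields are hypothetical, and the default thresholds of 3 are placeholders. Each call to `observe` represents one poll of the backend health state, made every interval seconds by some outer loop that is not shown.

```python
from dataclasses import dataclass


@dataclass
class AdvertisementTracker:
    """Track consecutive poll results and flip the advertised state.

    All names and defaults here are illustrative assumptions.
    """
    failure_count: int = 3   # consecutive unhealthy polls before withdrawing
    success_count: int = 3   # consecutive healthy polls before re-advertising
    advertised: bool = True
    _failures: int = 0
    _successes: int = 0

    def observe(self, healthy: bool) -> bool:
        """Record one poll result and return the resulting advertised state."""
        if healthy:
            self._successes += 1
            self._failures = 0
            if not self.advertised and self._successes >= self.success_count:
                self.advertised = True
        else:
            self._failures += 1
            self._successes = 0
            if self.advertised and self._failures >= self.failure_count:
                self.advertised = False
        return self.advertised
```

Requiring several consecutive results in each direction keeps a single flapping healthcheck from repeatedly withdrawing and re-announcing the anycast route.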

Information to attach

Any relevant kernel output (dmesg)
Container log (incus info NAME --show-log)
Container configuration (incus config show NAME --expanded)
Main daemon log (at /var/log/incus/incusd.log)
Output of the client with --debug
Output of the daemon with --debug (alternatively output of incus monitor --pretty while reproducing the issue)

stgraber added the Documentation (Documentation needs updating) and Feature (New feature, not a bug) labels on Oct 18, 2024

stgraber modified the milestones: soon, incus-6.7 on Oct 18, 2024
