Add option to not advertise load-balancer IP over BGP if all backends are down when healthcheck enabled #1315

Open
6 tasks
trunet opened this issue Oct 17, 2024 · 0 comments
Labels
Documentation (Documentation needs updating), Feature (New feature, not a bug)

trunet commented Oct 17, 2024

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04
  • The output of "incus info" or if that fails:
    • Kernel version: 6.8.0-47-generic
    • LXC version: 6.0.2
    • Incus version: 6.6
    • Storage backend in use: ceph

Issue description

This is a feature request to improve OVN LB healthchecks introduced by #1020 and #1224.

We have a couple of interconnected datacenters; each DC has an Incus cluster with a couple of OVN networks and BGP.

Our use case is an anycast IP assigned to the LB on each Incus cluster. For example, with 2 DCs both using 10.0.0.10/32 (the configuration is identical apart from the backend target_address):

description: regional-lb LB
config:
  healthcheck: "true"
backends:
- name: regional-lb-01-443
  description: Backend for regional-lb-01-443
  target_port: "443"
  target_address: 10.10.10.226 # in the other DC is 10.11.10.226
- name: regional-lb-01-80
  description: Backend for regional-lb-01-80
  target_port: "80"
  target_address: 10.10.10.226 # in the other DC is 10.11.10.226
- name: regional-lb-02-443
  description: Backend for regional-lb-02-443
  target_port: "443"
  target_address: 10.10.10.227 # in the other DC is 10.11.10.227
- name: regional-lb-02-80
  description: Backend for regional-lb-02-80
  target_port: "80"
  target_address: 10.10.10.227 # in the other DC is 10.11.10.227
ports:
- description: Port 443/tcp
  protocol: tcp
  listen_port: "443"
  target_backend:
  - regional-lb-01-443
  - regional-lb-02-443
- description: Port 80/tcp
  protocol: tcp
  listen_port: "80"
  target_backend:
  - regional-lb-01-80
  - regional-lb-02-80
listen_address: 10.0.0.10
location: ""

Querying the LB backend state (after deliberately stopping the services listening on ports 80/443 on one side):

{
        "backend_health": {
                "regional-lb-01-443": {
                        "address": "10.10.10.226",
                        "ports": [
                                {
                                        "port": 443,
                                        "protocol": "tcp",
                                        "status": "offline"
                                }
                        ]
                },
                "regional-lb-01-80": {
                        "address": "10.10.10.226",
                        "ports": [
                                {
                                        "port": 80,
                                        "protocol": "tcp",
                                        "status": "offline"
                                }
                        ]
                },
                "regional-lb-02-443": {
                        "address": "10.10.10.227",
                        "ports": [
                                {
                                        "port": 443,
                                        "protocol": "tcp",
                                        "status": "offline"
                                }
                        ]
                },
                "regional-lb-02-80": {
                        "address": "10.10.10.227",
                        "ports": [
                                {
                                        "port": 80,
                                        "protocol": "tcp",
                                        "status": "offline"
                                }
                        ]
                }
        }
}

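The requested feature is to stop advertising the LB listen_address (10.0.0.10 here) through BGP when the healthcheck reports backends as offline: either when all backends are offline, as in the output above, or when any backend is offline, depending on configuration.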
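To make the requested decision concrete, here is a minimal sketch in Python that reads a state document shaped like the output above and decides whether the listen address should be withdrawn. The function names and the `all`/`any` mode values are hypothetical and are not part of Incus. The sketch relies only on the `"offline"` status string shown above, and treating a missing health report as "keep advertising" is an assumption.

```python
import json


def offline_backends(state: dict) -> dict[str, bool]:
    """Map each backend name to True if any of its ports reports "offline"."""
    result = {}
    for name, info in state.get("backend_health", {}).items():
        ports = info.get("ports", [])
        result[name] = any(p.get("status") == "offline" for p in ports)
    return result


def should_withdraw(state: dict, mode: str = "all") -> bool:
    """Decide whether the listen address should stop being advertised.

    mode is a hypothetical setting:
      "all": withdraw only when every backend is offline
      "any": withdraw as soon as one backend is offline
    """
    offline = offline_backends(state)
    if not offline:
        # Assumption: with no health data, keep advertising.
        return False
    if mode == "all":
        return all(offline.values())
    if mode == "any":
        return any(offline.values())
    raise ValueError(f"unknown mode: {mode!r}")


# Trimmed-down copy of the state shown above (two of the four backends).
sample = json.loads("""
{
  "backend_health": {
    "regional-lb-01-443": {
      "address": "10.10.10.226",
      "ports": [{"port": 443, "protocol": "tcp", "status": "offline"}]
    },
    "regional-lb-02-443": {
      "address": "10.10.10.227",
      "ports": [{"port": 443, "protocol": "tcp", "status": "offline"}]
    }
  }
}
""")

print(should_withdraw(sample, mode="all"))
```

With every backend offline, both modes agree that the address should be withdrawn. They only differ once at least one backend is still healthy.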

Some thoughts:

  • A behaviour setting: withdraw the address when all backends are offline, or when any backend is offline.
  • An interval, timeout, failure_count and success_count configuration that controls how Incus polls the LB backend healthcheck state and decides when to withdraw the address and when to advertise it again.
  • By default, these could reuse the existing healthcheck values: healthcheck.interval, healthcheck.timeout, healthcheck.failure_count, healthcheck.success_count.
  • I understand the healthcheck logic lives in OVN, but BGP is controlled by Incus, so the decision logic would need to be replicated in Incus.
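The failure_count / success_count idea from the list above can be sketched as a small hysteresis gate, so that one flapping poll does not toggle the BGP advertisement. This is only an illustration in Python under stated assumptions: the `AdvertisementGate` class and its `observe` method are hypothetical names, not Incus code. The polling loop itself (interval, timeout) is left out, and each call to `observe` stands for one completed poll of the backend health state.

```python
from dataclasses import dataclass, field


@dataclass
class AdvertisementGate:
    """Debounce poll results before changing the advertised state."""

    failure_count: int  # consecutive unhealthy polls needed to withdraw
    success_count: int  # consecutive healthy polls needed to re-advertise
    advertised: bool = True
    _failures: int = field(default=0, init=False, repr=False)
    _successes: int = field(default=0, init=False, repr=False)

    def observe(self, unhealthy: bool) -> bool:
        """Record one poll result and return whether to advertise."""
        if unhealthy:
            self._failures += 1
            self._successes = 0
            if self.advertised and self._failures >= self.failure_count:
                self.advertised = False
        else:
            self._successes += 1
            self._failures = 0
            if not self.advertised and self._successes >= self.success_count:
                self.advertised = True
        return self.advertised


gate = AdvertisementGate(failure_count=2, success_count=2)
# Two unhealthy polls withdraw the address, two healthy polls restore it.
print([gate.observe(u) for u in [True, True, False, False]])
# [True, False, False, True]
```

Keeping the two counters separate lets an operator withdraw quickly but re-advertise conservatively (or the reverse), which is the same shape as the existing healthcheck.failure_count and healthcheck.success_count settings.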

Information to attach

  • Any relevant kernel output (dmesg)
  • Container log (incus info NAME --show-log)
  • Container configuration (incus config show NAME --expanded)
  • Main daemon log (at /var/log/incus/incusd.log)
  • Output of the client with --debug
  • Output of the daemon with --debug (alternatively output of incus monitor --pretty while reproducing the issue)
@stgraber added the Documentation (Documentation needs updating) and Feature (New feature, not a bug) labels Oct 18, 2024
@stgraber modified the milestones: soon, incus-6.7 Oct 18, 2024