2.10.20+ Loop detected for leafnode account #6037

Open
kukumber opened this issue Oct 24, 2024 · 3 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@kukumber

Observed behavior

With NATS 2.10.20 and later, leafnode connections fail with: Leafnode Error 'Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s'

Expected behavior

Everything works as it did with version 2.10.7, without "Loop detected for leafnode account" errors.

Server and client version

2.10.7-alpine works correctly
Issue reproduces with 2.10.20-alpine, 2.10.22-alpine

Host environment

No response

Steps to reproduce

A three-node JetStream cluster, each node with the following configuration (server_name and advertise differ per node). Each NATS container runs on a different virtual host.

listen: 0.0.0.0:4222
server_name: "server1"

jetstream {
  store_dir: "/data"
  domain: main_test
}

cluster {
  listen: 0.0.0.0:6222
  advertise: server1:6222
  name: test
  routes: [
    nats://server1:6222
    nats://server2:6222
    nats://server3:6222
  ]
  tls: {
    cert_file: "/etc/pki/host_cert.pem"
    key_file: "/etc/pki/host_key.pem"
    ca_file: "/etc/pki/ca.crt"
  }
}

leafnodes {
  remotes: [
    {
      tls: {
        cert_file: "/etc/pki/host_cert.pem"
        key_file: "/etc/pki/host_key.pem"
        ca_file: "/etc/pki/ca.crt"
      }
      urls: [
        "nats-leaf://fe1-t:7444"
      ]
    },
    {
      tls: {
        cert_file: "/etc/pki/host_cert.pem"
        key_file: "/etc/pki/host_key.pem"
        ca_file: "/etc/pki/ca.crt"
      }
      urls: [
        "nats-leaf://fe2-t:7444"
      ]
    }
  ]
}

Two remote servers with the following configuration (with server_name, domain, and advertise being different for each):

listen: 0.0.0.0:4222
server_name: "fe1-t"

jetstream {
  store_dir: "/data"
  domain: fe1-t
}

leafnodes {
  port: 7444
  advertise: fe1-t:7444
  tls: {
    cert_file: "/etc/pki/host_cert.pem"
    key_file: "/etc/pki/host_key.pem"
    verify: true
    ca_file: "/etc/pki/ca.crt"
  }
}

Nodes fe1-t and fe2-t are not aware of each other.

With NATS version 2.10.7, I set up stream mirrors, and my clients can connect to the fe1-t and fe2-t hosts without issues. However, when I try to upgrade NATS on all the nodes to version 2.10.20+, I start receiving "Loop detected" errors on both the fe*-t and cluster nodes.

fe*-t errors:

[1] 2024/10/24 10:41:03.592127 [INF] 192.168.1.6:35636 - lid:170 - JetStream using domains: local "fe2-t", remote "main_test"
[1] 2024/10/24 10:41:03.620127 [INF] 192.168.1.5:58126 - lid:171 - Leafnode connection created
[1] 2024/10/24 10:41:03.642069 [INF] 192.168.1.7:33452 - lid:172 - Leafnode connection created
[1] 2024/10/24 10:41:03.658231 [INF] 192.168.1.5:58126 - lid:171 - JetStream using domains: local "fe2-t", remote "main_test"
[1] 2024/10/24 10:41:03.666950 [ERR] 192.168.1.6:35636 - lid:170 - Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s
[1] 2024/10/24 10:41:03.667095 [INF] 192.168.1.6:35636 - lid:170 - Leafnode connection closed: Protocol Violation - Account: $G
[1] 2024/10/24 10:41:03.676195 [INF] 192.168.1.7:33452 - lid:172 - JetStream using domains: local "fe2-t", remote "main_test"
[1] 2024/10/24 10:41:03.685937 [ERR] 192.168.1.5:58126 - lid:171 - Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s
[1] 2024/10/24 10:41:03.685962 [INF] 192.168.1.5:58126 - lid:171 - Leafnode connection closed: Protocol Violation - Account: $G
[1] 2024/10/24 10:41:34.678225 [INF] 192.168.1.6:56298 - lid:174 - Leafnode connection created
[1] 2024/10/24 10:41:34.688166 [INF] 192.168.1.6:56298 - lid:174 - JetStream using domains: local "fe2-t", remote "main_test"
[1] 2024/10/24 10:41:34.692347 [INF] 192.168.1.5:40116 - lid:175 - Leafnode connection created
[1] 2024/10/24 10:41:34.703173 [INF] 192.168.1.5:40116 - lid:175 - JetStream using domains: local "fe2-t", remote "main_test"
[1] 2024/10/24 10:41:34.709202 [ERR] 192.168.1.7:33452 - lid:172 - Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s
[1] 2024/10/24 10:41:34.709224 [INF] 192.168.1.7:33452 - lid:172 - Leafnode connection closed: Protocol Violation - Account: $G

Cluster errors:

[1] 2024/10/24 10:41:34.678934 [INF] 192.168.2.6:7444 - lid:549 - Leafnode connection created for account: $G 
[1] 2024/10/24 10:41:34.690719 [INF] 192.168.2.6:7444 - lid:549 - JetStream using domains: local "main_test", remote "fe2-t"
[1] 2024/10/24 10:42:01.362957 [INF] 192.168.2.6:7444 - lid:549 - Leafnode connection closed: Client Closed - Account: $G
[1] 2024/10/24 10:42:03.367913 [ERR] Error trying to connect as leafnode to remote server "fe2-t:7444" (attempt 1): dial tcp 192.168.2.6:7444: i/o timeout
[1] 2024/10/24 10:42:04.681663 [INF] 192.168.2.5:7444 - lid:587 - Leafnode connection created for account: $G 
[1] 2024/10/24 10:42:04.704360 [INF] 192.168.2.5:7444 - lid:587 - JetStream using domains: local "main_test", remote "fe1-t"
[1] 2024/10/24 10:42:04.792195 [INF] 192.168.2.6:7444 - lid:588 - Leafnode connection created for account: $G 
[1] 2024/10/24 10:42:04.932000 [INF] 192.168.2.6:7444 - lid:588 - JetStream using domains: local "main_test", remote "fe2-t"
[1] 2024/10/24 10:42:04.947568 [ERR] 192.168.2.6:7444 - lid:588 - Leafnode Error 'Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s'
[1] 2024/10/24 10:42:04.947601 [ERR] 192.168.2.6:7444 - lid:588 - Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s
[1] 2024/10/24 10:42:04.947610 [INF] 192.168.2.6:7444 - lid:588 - Leafnode connection closed: Protocol Violation - Account: $G
@kukumber kukumber added the defect Suspected defect such as a bug or regression label Oct 24, 2024
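
To compare logs across hosts, it can help to reduce them to just the loop-detection events. The following is a minimal sketch, not part of NATS tooling: the function name `tally_loop_errors` and the regular expression are illustrative assumptions, written against the log line shapes quoted above (both the plain `Loop detected ...` form and the `Leafnode Error 'Loop detected ...'` form). It groups the affected leafnode connection ids (`lid`) by remote IP address.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the excerpts in this issue:
#   [1] <date> <time> [ERR] <ip>:<port> - lid:<n> - Loop detected for leafnode account="<acct>". ...
# The cluster side may wrap the message as: Leafnode Error 'Loop detected ...'
LOOP_RE = re.compile(
    r"\[[A-Z]{3}\]\s+"
    r"(?P<addr>\S+)\s+-\s+lid:(?P<lid>\d+)\s+-\s+"
    r"(?:Leafnode Error ')?Loop detected for leafnode account=\"(?P<account>[^\"]+)\""
)


def tally_loop_errors(lines):
    """Map remote IP -> sorted list of lids that reported a loop."""
    lids_by_ip = defaultdict(set)
    for line in lines:
        match = LOOP_RE.search(line)
        if match is None:
            continue  # not a loop-detection line
        ip = match.group("addr").rsplit(":", 1)[0]  # drop the port
        lids_by_ip[ip].add(int(match.group("lid")))
    return {ip: sorted(lids) for ip, lids in lids_by_ip.items()}


# A few lines copied from the excerpts above.
SAMPLE = """
[1] 2024/10/24 10:41:03.666950 [ERR] 192.168.1.6:35636 - lid:170 - Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s
[1] 2024/10/24 10:41:03.667095 [INF] 192.168.1.6:35636 - lid:170 - Leafnode connection closed: Protocol Violation - Account: $G
[1] 2024/10/24 10:41:03.685937 [ERR] 192.168.1.5:58126 - lid:171 - Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s
[1] 2024/10/24 10:41:34.709202 [ERR] 192.168.1.7:33452 - lid:172 - Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s
[1] 2024/10/24 10:42:04.947568 [ERR] 192.168.2.6:7444 - lid:588 - Leafnode Error 'Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s'
[1] 2024/10/24 10:42:04.947601 [ERR] 192.168.2.6:7444 - lid:588 - Loop detected for leafnode account="$G". Delaying attempt to reconnect for 30s
""".strip().splitlines()

for ip, lids in tally_loop_errors(SAMPLE).items():
    print(ip, lids)
```

On the fe2-t excerpt, each of the three source addresses (192.168.1.5, 192.168.1.6, 192.168.1.7) appears with its own lid, which suggests every inbound leafnode connection is rejected rather than only one of them.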
@neilalexander
Member

Is this still happening if you set no_advertise in the fe*-t leafnode configuration?

@kukumber
Author

@neilalexander unfortunately no_advertise didn't help

@kukumber
Author

The issue does not reproduce when all the containers run on the same host
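
Since the behavior differs between the single-host and multi-host setups, and the cluster log above includes one `dial tcp ... i/o timeout`, one basic thing to rule out is plain TCP reachability of the leafnode port from each cluster host. The sketch below only checks that a TCP connection can be opened. It does not speak the NATS protocol and cannot diagnose the loop detection itself. The helper names `can_connect` and `probe` are hypothetical. The host names and port 7444 come from the configuration in this issue.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def probe(endpoints, timeout=2.0):
    """Check each (host, port) pair and return {"host:port": reachable}."""
    return {f"{host}:{port}": can_connect(host, port, timeout) for host, port in endpoints}


# Leafnode endpoints as named in this issue; adjust for your environment.
LEAF_ENDPOINTS = [("fe1-t", 7444), ("fe2-t", 7444)]
```

Running `print(probe(LEAF_ENDPOINTS))` inside each cluster container, once in the single-host setup and once in the multi-host setup, shows whether the two environments differ at the network level before looking further into the leafnode behavior.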


2 participants