help request: apisix don't sync data from etcd #11390

jujiale · 2024-07-05T02:50:20Z

Description

Hello, I ran into the following situation in our production (prd) APISIX deployment and on one dev APISIX node.
Our prd has 4 environments, each with 3 APISIX instances deployed from RPM. One cluster (we call it A here) shows odd behavior. Let me describe it:

1. We modify the cluster A config in apisix-dashboard and submit it. In etcd I can see that it is modified correctly, but when I query /v1/route/route_id, the whole config held by the cluster A instances is the old version. No matter how many times we modify the config, etcd holds the correct config with the correct update_time, while the config in the instances stays old, with a very old update_time that never changes.
For example, the config in etcd:

```
/test/apisix/routes/515483732765836994
{"id":"515483732765836994","create_time":1716781847,"update_time":1720085553,"uris":["/menu.service.query/m","/menu.service.query/pm/*"],"name":"aaa","priority":10,"methods":["GET","POST","PUT","DELETE","PATCH","HEAD","OPTIONS","CONNECT","TRACE"],"host":"xxx.com","upstream_id":"515483516306196172","status":1}
```

When I call /v1/route/route_id, the config looks like this:

```json
{
    "key": "/test/apisix/routes/515483732765836994",
    "createdIndex": 946,
    "has_domain": false,
    "clean_handlers": {},
    "modifiedIndex": 946,
    "update_count": 0,
    "orig_modifiedIndex": 946,
    "value": {
        "priority": 10,
        "host": "xxx.com",
        "name": "aaa",
        "methods": [
            "GET",
            "POST",
            "PUT",
            "DELETE",
            "PATCH",
            "HEAD",
            "OPTIONS",
            "CONNECT",
            "TRACE"
        ],
        "id": "515483732765836994",
        "uris": [
            "/menu.service.query/m",
            "/menu.service.query/w",
            "/menu.service.query/pm/*"
        ],
        "update_time": 1716781847,
        "create_time": 1716781847,
        "status": 1,
        "upstream_id": "515483516306196172"
    }
}
```

The uris and the update_time differ between the two. In the other clusters, everything works well.
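To make the mismatch explicit, here is a minimal sketch that diffs the value stored in etcd against the `value` object returned by /v1/route/route_id. The helper name `diff_route` is hypothetical, and the two dictionaries are copied by hand from the outputs above rather than fetched live.

```python
# Hypothetical helper: compare the route stored in etcd with the copy an
# APISIX instance holds in memory. Data is pasted from this issue.

def diff_route(etcd_value: dict, node_value: dict) -> dict:
    """Return {field: (etcd, node)} for every field whose value differs."""
    fields = set(etcd_value) | set(node_value)
    return {
        name: (etcd_value.get(name), node_value.get(name))
        for name in sorted(fields)
        if etcd_value.get(name) != node_value.get(name)
    }

METHODS = ["GET", "POST", "PUT", "DELETE", "PATCH",
           "HEAD", "OPTIONS", "CONNECT", "TRACE"]

# Value read from etcd under /test/apisix/routes/515483732765836994
etcd_value = {
    "id": "515483732765836994",
    "create_time": 1716781847,
    "update_time": 1720085553,
    "uris": ["/menu.service.query/m", "/menu.service.query/pm/*"],
    "name": "aaa",
    "priority": 10,
    "methods": METHODS,
    "host": "xxx.com",
    "upstream_id": "515483516306196172",
    "status": 1,
}

# "value" object from the instance's /v1/route/route_id response
node_value = {
    "id": "515483732765836994",
    "create_time": 1716781847,
    "update_time": 1716781847,
    "uris": ["/menu.service.query/m", "/menu.service.query/w",
             "/menu.service.query/pm/*"],
    "name": "aaa",
    "priority": 10,
    "methods": METHODS,
    "host": "xxx.com",
    "upstream_id": "515483516306196172",
    "status": 1,
}

diff = diff_route(etcd_value, node_value)
stale_seconds = etcd_value["update_time"] - node_value["update_time"]

for name, (in_etcd, in_node) in diff.items():
    print(f"{name}: etcd={in_etcd!r} node={in_node!r}")
print(f"instance copy is {stale_seconds} seconds behind etcd")
```

Only `uris` and `update_time` differ, and the instance copy still carries the original create_time as its update_time, which suggests it was never refreshed after the route was first loaded.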

2. The APISIX error log shows the following. Note that these errors are output continuously, so the issue seems to occur all the time.

```
172.xx.61.52, server: _, request: "POST /menu.service.query/w HTTP/1.1", host: "xxx.com"
2024/07/04 16:00:55 [error] 16235#16235: *143253446 [lua] config_util.lua:86: failed to find clean_handler with idx 1, client: 172.xx.61.47, server: _, request: "POST /menu.service.query/w HTTP/1.1", host: "xxx.com"
2024/07/04 16:00:55 [error] 16234#16234: *143283913 [lua] config_etcd.lua:584: failed to fetch data from etcd: /test/apisix/apisix/core/config_util.lua:104: attempt to index local 'item' (a boolean value)
stack traceback:
  /test/apisix/apisix/core/config_util.lua:104: in function 'fire_all_clean_handlers'
  /test/apisix/apisix/core/config_etcd.lua:315: in function 'sync_data'
  /test/apisix/apisix/core/config_etcd.lua:541: in function </test/apisix/apisix/core/config_etcd.lua:532>
  [C]: in function 'xpcall'
  /test/apisix/apisix/core/config_etcd.lua:532: in function </test/apisix/apisix/core/config_etcd.lua:513>,  etcd key: /test/apisix/upstreams, context: ngx.timer
2024/07/04 16:00:55 [error] 16235#16235: *143280176 [lua] config_util.lua:86: failed to find clean_handler with idx 1, client: 172.xx.61.47, server: _, request: "POST /menu.service.validate/w HTTP/1.1", host: "xxx.com"
2024/07/04 16:00:55 [error] 16240#16240: *143284010 [lua] config_etcd.lua:584: failed to fetch data from etcd: /test/apisix/apisix/core/config_util.lua:104: attempt to index local 'item' (a boolean value)
stack traceback:
  /test/apisix/apisix/core/config_util.lua:104: in function 'fire_all_clean_handlers'
  /test/apisix/apisix/core/config_etcd.lua:315: in function 'sync_data'
  /test/apisix/apisix/core/config_etcd.lua:541: in function </test/apisix/apisix/core/config_etcd.lua:532>
  [C]: in function 'xpcall'
  /test/apisix/apisix/core/config_etcd.lua:532: in function </test/apisix/apisix/core/config_etcd.lua:513>,  etcd key: /test/apisix/janus/routes, context: ngx.timer
```

3. Capturing traffic on port 2379 from the APISIX instance shows:

```
66
{"error":{"grpc_code":1,"http_code":408,"message":"context canceled","http_status":"Request Timeout"}}
0
```
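The captured body looks like HTTP/1.1 chunked transfer encoding: `66` is a hexadecimal chunk size (102 bytes), followed by the JSON payload and a terminating `0` chunk. Below is a rough sketch of decoding it, assuming standard CRLF framing on the wire (the capture above only shows line breaks). `decode_chunked` is a hypothetical helper, not part of APISIX or etcd.

```python
import json

def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body (trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # The size line is hexadecimal; drop optional chunk extensions.
        size_text = raw[pos:eol].split(b";")[0].strip().decode("ascii")
        size = int(size_text, 16)
        pos = eol + 2
        if size == 0:
            return bytes(body)
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF

# Payload copied from the capture; the framing is reconstructed by assumption.
payload = (b'{"error":{"grpc_code":1,"http_code":408,'
           b'"message":"context canceled","http_status":"Request Timeout"}}')
raw = b"66\r\n" + payload + b"\r\n0\r\n\r\n"

error = json.loads(decode_chunked(raw))["error"]
print(error["http_code"], error["message"])
```

The payload length matches the `66` size line, so the capture appears to be a complete response in which the gateway reports that the watch request was canceled with a 408.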
I also found that many requests time out after more than 30s.

I can confirm that etcd is healthy; even after I restart etcd, the problem persists. The network between APISIX and etcd is fine, and some /v3/watch requests return correctly, but APISIX does not seem to apply the config.
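Since etcd itself looks healthy, it may help to see which etcd prefixes the failing sync timers belong to. The following sketch tallies the `etcd key:` field from the error log shown earlier; the regular expression and the helper name `failing_prefixes` are assumptions based only on the log lines in this issue.

```python
import re
from collections import Counter

# Matches the "etcd key: <prefix>," field that ends each stack traceback.
ETCD_KEY = re.compile(r"etcd key: ([^,\s]+)")

def failing_prefixes(log_text: str) -> Counter:
    """Count how often each etcd prefix appears in sync error entries."""
    return Counter(ETCD_KEY.findall(log_text))

# Two traceback tails copied from the log in this issue.
sample = """\
  /test/apisix/apisix/core/config_etcd.lua:532: in function </test/apisix/apisix/core/config_etcd.lua:513>,  etcd key: /test/apisix/upstreams, context: ngx.timer
  /test/apisix/apisix/core/config_etcd.lua:532: in function </test/apisix/apisix/core/config_etcd.lua:513>,  etcd key: /test/apisix/janus/routes, context: ngx.timer
"""

counts = failing_prefixes(sample)
for prefix, hits in counts.most_common():
    print(f"{hits:5d}  {prefix}")
```

Run against the full error.log, this would show whether every watched prefix fails or only some of them.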

We run 2.15.0 in prd, so we cannot upgrade casually.

I want to know whether this is an APISIX bug and why the config is not synced to the APISIX instances. If it is a bug, we plan to merge the relevant changes to fix it.

Environment

APISIX version (run apisix version): 2.15.0
Operating system (run uname -a):
OpenResty / Nginx version (run openresty -V or nginx -V):
etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):3.5.0
APISIX Dashboard version, if relevant:
Plugin runner version, for issues related to plugin runners:
LuaRocks version, for installation issues (run luarocks --version):


jujiale · 2024-07-05T02:59:50Z

I found that #8493 shows the same error log, but it does not seem to mention the data sync problem, so I don't know whether it is the same issue.

jujiale · 2024-07-05T06:12:02Z

I tried changing the call `config_util.fire_all_clean_handlers(val)` in config_etcd.lua to `config_util.fire_all_clean_handlers(false)`. That produces the same error as the one I mentioned above, and the data in etcd and APISIX is out of sync.

yydance · 2024-10-15T12:48:32Z

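I seem to have hit a similar problem today. A route was added through the dashboard and stored correctly in etcd, but APISIX could never find that route. It only recovered after I deleted the original APISIX pod. So far I have not seen any related messages in the logs.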

akshayparseja · 2024-11-15T05:21:23Z

Is there any update on this? We are also actively facing sync issues, which we work around with a rollout restart of the APISIX deployment, but it would be helpful to know whether this is fixed in newer versions or whether a fix is planned.

github-project-automation bot added this to Apache APISIX backlog Jul 5, 2024

github-project-automation bot moved this to 📋 Backlog in Apache APISIX backlog Jul 5, 2024
