Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: runtime error: /usr/local/apisix/apisix/discovery/nacos/init.lua:361: attempt to index upvalue 'applications' (a nil value) #9527

Closed
lakewatcher opened this issue May 23, 2023 · 28 comments · Fixed by #9960
Assignees
Labels
bug Something isn't working

Comments

@lakewatcher
Copy link

lakewatcher commented May 23, 2023

Current Behavior

using jmeter to test my route ,and has found some error ,the error message please see the pic .

apisix单机,etcd单机。jmeter压测时 50 thread 1 period 5 loop 执行同一个接口,报错。
image

image-20230520141905999

maybe change value for "local applications" is not work.
怀疑 "local applications" 刷新逻辑

Expected Behavior

No response

Error Logs

2023/05/23 16:04:22 [warn] 21022#21022: *614008 [lua] debug.lua:198: http_body_filter_phase(): call require("apisix").http_body_filter_phase() return:{} while sending to client, client: 10.202.16.131, server: _, subrequest: "/data-log/general/http2dubbo", upstream: "dubbo://10.202.68.118:20880"
2023/05/23 16:04:22 [warn] 21022#21022: *614008 [lua] debug.lua:190: http_log_phase(): call require("apisix").http_log_phase() args:{} while logging request, client: 10.202.16.131, server: _, subrequest: "/data-log/general/http2dubbo", upstream: "dubbo://10.202.68.118:20880"
2023/05/23 16:04:22 [warn] 21022#21022: *614008 [lua] plugin.lua:399: trace_plugins_info_for_debug(): Apisix-Plugins: request-id, cors while logging request, client: 10.202.16.131, server: _, subrequest: "/data-log/general/http2dubbo", upstream: "dubbo://10.202.68.118:20880"
2023/05/23 16:04:22 [warn] 21022#21022: *614008 [lua] debug.lua:198: http_log_phase(): call require("apisix").http_log_phase() return:{} while logging request, client: 10.202.16.131, server: _, subrequest: "/data-log/general/http2dubbo", upstream: "dubbo://10.202.68.118:20880"
2023/05/23 16:04:22 [error] 24067#24067: *613382 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/discovery/nacos/init.lua:361: attempt to index upvalue 'applications' (a nil value)
stack traceback:
coroutine 0:
/usr/local/apisix/apisix/discovery/nacos/init.lua: in function 'nodes'
/usr/local/apisix/apisix/upstream.lua:259: in function 'set_upstream'
/usr/local/apisix/apisix/init.lua:538: in function 'fun_org'
/usr/local/apisix/apisix/debug.lua:195: in function 'http_access_phase'
access_by_lua(nginx.conf:321):2: in main chunk, client: 10.202.16.131, server: _, request: "POST /data-log/general/http2dubbo HTTP/1.0",

Steps to Reproduce

  1. apisix one node , etcd one node & no auth

  2. add route for using nacos discovery ,like this:
    {
    "type": "roundrobin",
    "scheme": "http",
    "discovery_type": "nacos",
    "discovery_args": {
    "group_name": "",
    "namespace_id": ""
    },
    "pass_host": "pass",
    "name": "data-log-dubbo",
    "service_name": "data-log",
    "keepalive_pool": {
    "idle_timeout": 60,
    "requests": 1000,
    "size": 320
    }
    }
    route like
    {
    "uri": "/data-log/general/http2dubbo",
    "name": "data-log_xxx",
    "methods": [ "POST" ],
    "plugins": {
    "dubbo-proxy": {
    "method": "invoke",
    "service_name": "com.xxx.proxy.XxxService",
    "service_version": "1.0.0"
    }
    },
    "upstream_id": "data-log-dubbo",
    "status": 1
    }

  3. using jmeter 50/200 thread 1 5 loop

Environment

  • APISIX version (run apisix version):2.15.1

  • Operating system (run uname -a): 3.10.0-1160.80.1.el7.x86_64 change: added doc of how to load plugin. #1 SMP Tue Nov 8 15:48:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

  • OpenResty / Nginx version (run openresty -V or nginx -V):
    nginx version: openresty/1.21.4.1
    built by gcc 9.3.1 20200408 (Red Hat 9.3.1-2) (GCC)
    built with OpenSSL 1.1.1s 1 Nov 2022
    TLS SNI support enabled
    configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt='-O2 -DAPISIX_BASE_VER=1.21.4.1.5 -DNGX_GRPC_CLI_ENGINE_PATH=/usr/local/openresty/libgrpc_engine.so -DNGX_HTTP_GRPC_CLI_ENGINE_PATH=/usr/local/openresty/libgrpc_engine.so -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl111/include' --add-module=../ngx_devel_kit-0.3.1 --add-module=../echo-nginx-module-0.62 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.32 --add-module=../ngx_lua-0.10.21 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../ngx_stream_lua-0.0.11 --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -Wl,-rpath,/usr/local/openresty/wasmtime-c-api/lib -L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl111/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl111/lib' --add-module=/tmp/tmp.gwSPZhqVyd/openresty-1.21.4.1/../mod_dubbo-1.0.2 --add-module=/tmp/tmp.gwSPZhqVyd/openresty-1.21.4.1/../ngx_multi_upstream_module-1.1.1 --add-module=/tmp/tmp.gwSPZhqVyd/openresty-1.21.4.1/../apisix-nginx-module-1.12.0 --add-module=/tmp/tmp.gwSPZhqVyd/openresty-1.21.4.1/../apisix-nginx-module-1.12.0/src/stream --add-module=/tmp/tmp.gwSPZhqVyd/openresty-1.21.4.1/../apisix-nginx-module-1.12.0/src/meta --add-module=/tmp/tmp.gwSPZhqVyd/openresty-1.21.4.1/../wasm-nginx-module-0.6.4 --add-module=/tmp/tmp.gwSPZhqVyd/openresty-1.21.4.1/../lua-var-nginx-module-v0.5.3 --add-module=/tmp/tmp.gwSPZhqVyd/openresty-1.21.4.1/../grpc-client-nginx-module-v0.4.0 --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-compat --with-stream --with-http_ssl_module

  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):

{"hostname":"TLVM202017071","id":"0880b644-4f19-49e1-af77-1be7d3d7e196","version":"2.15.1","boot_time":1684823930,"etcd_version":"3.5.0"}

  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):
@lakewatcher
Copy link
Author

lakewatcher commented May 25, 2023

I wanted to investigate what exactly went wrong and looked at debug mode.

How to configure a dynamic debug for apisix.discovery.nacos.init.lua ‘s function _M.nodes(service_name, discovery_args), if i need print input & output info?

如果需要打印 apisix.discovery.nacos.init.lua 中 function _M.nodes(service_name, discovery_args)的出入参数,怎么去配置?

apache-apisix-runtime-dynamic-debugging

I refer to the document above, the configuration is as follows, but it doesn't work

我参考上面的文档。做出了如下配置,但是不起作用。

basic:
  enable: true
http_filter:
  enable: false         # enable or disable this feature
  enable_header_name: X-APISIX-Dynamic-Debug # the header name of dynamic enable
hook_conf:
  enable: true
  log_level: warn
  is_print_input_args: true
  is_print_return_value: true
  enable_modules:
    - hook_http_access_phase
  modules:
    hook_http_access_phase:
      lua_code_before: |
        ... ...
        core.log.error(core.json.encode(conf))
      apisix.discovery.nacos.init:
        - nodes

@lingsamuel lingsamuel added the bug Something isn't working label Jun 20, 2023
@lingsamuel lingsamuel moved this to 📋 Backlog in Apache APISIX backlog Jun 26, 2023
@Revolyssup
Copy link
Contributor

Please assign it to me

@Revolyssup
Copy link
Contributor

@lakewatcher You can set log level to debug here

error_log_level: warn # Logging level: info, debug, notice, warn, error, crit, alert, or emerg.

@Revolyssup
Copy link
Contributor

Revolyssup commented Aug 1, 2023

The nil panic comes from here

if not applications[namespace_id] or not applications[namespace_id][group_name] then

applications is still nil but the nil check is not performed here.
In most cases, the 5 second limit works but I guess in your testing conditions it didn't.

@Revolyssup
Copy link
Contributor

@lakewatcher With the same configuration, do you get this bug when you're simply sending a request and not load testing using jmeter?

@Revolyssup
Copy link
Contributor

@lakewatcher Will you make sure that the address mentioned in the config.yaml under discovery.nacos.host[] is reachable?

@Revolyssup
Copy link
Contributor

@lakewatcher Do you see any error that says nacos login fail in the error logs?

@AlinsRan AlinsRan added wait for update wait for the author's response in this issue/PR info-needed labels Aug 1, 2023
@lakewatcher
Copy link
Author

当时邮件已经回复,但是因为邮件回复没有展示,我再说明一下。

The email has been replied to, but since the email and account are not bound, I will explain again.

我就用中文回复吧

applications is still nil but the nil check is not performed here.
In most cases, the 5 second limit works but I guess in your testing conditions it didn't.
我不太清楚Lua如何处理并发访问和共享变量,所以无法提出建议,从结果上看没有超过5秒的请求。

can set log level to debug here
好的 我试了一下,没有多余的错误,后面我将贴出日志

With the same configuration, do you get this bug when you're simply sending a request and not load testing using jmeter?
可以手动复现,使用官方镜像2.15.3+写一个Dubbo Map接口 + 配置一个dubbpProxy 点击快一点就会发生。

discovery.nacos.host[] is reachable?
可以访问 从压测结果看也是大多数正确,只有部分错误。debug开启后,会看到登录过程可以获取token无报错。

Do you see any error that says nacos login fail in the error logs?
debug开启后,会看到登录过程可以获取token无报错。

@lakewatcher
Copy link
Author

DEBUG开启后的一个示例:

2023/08/02 10:31:05 [warn] 104#104: *118893 [lua] debug.lua:198: http_log_phase(): call require("apisix").http_log_phase() return:{} while logging request, client: 10.202.17.34, server: _, request: "POST /main-data/http2dubbo/car_createCar HTTP/1.0", host: ""
2023/08/02 10:31:05 [warn] 60#60: *120080 [lua] debug.lua:190: http_access_phase(): call require("apisix").http_access_phase() args:{}, client: 10.202.17.34, server: _, request: "POST /main-data/http2dubbo/car_createCar HTTP/1.0", host: ""
2023/08/02 10:31:05 [info] 60#60: *120080 [lua] init.lua:426: fun_org(): matched route: {"value":{"update_time":1690941321,"create_time":1690275764,"uri":"/main-data/http2dubbo/car_createCar","labels":{"SERVICE":"main-data","API_VERSION":"1.0.0","API_GROUP":"http2dubbo"},"id":"main-data_car_createCar","status":1,"methods":["POST"],"upstream_id":"main-data-dubbo","plugins":{"dubbo-proxy":{"service_name":"com.zto56.site.web.proxy.Http2DubboService","method":"invoke","service_version":"1.0.0"}},"priority":0,"name":"main-data_car_createCar"},"orig_modifiedIndex":5183,"modifiedIndex":5183,"createdIndex":4995,"key":"/apisix-inner/routes/main-data_car_createCar","clean_handlers":{},"update_count":0,"has_domain":false}, client: 10.202.17.34, server: _, request: "POST /main-data/http2dubbo/car_createCar HTTP/1.0", host: ""
2023/08/02 10:31:05 [info] 60#60: *120080 [lua] init.lua:471: fun_org(): stash ngx ctx: 1, client: 10.202.17.34, server: _, request: "POST /main-data/http2dubbo/car_createCar HTTP/1.0", host: ""
2023/08/02 10:31:05 [info] 60#60: *120080 [lua] upstream.lua:597: get_by_id(): parsed upstream: {"value":{"update_time":1690940485,"original_nodes":[{"host":"10.202.18.199","port":20880,"weight":1},{"host":"10.202.68.118","port":20880,"weight":1}],"pass_host":"pass","id":"main-data-dubbo","discovery_args":{"group_name":"","namespace_id":""},"type":"roundrobin","signature":"N:main-data-dubboS:main-dataNS:G:TC:3TR:20TS:3","create_time":1690275558,"parent":{"value":"table: 0x7f54891924b8","modifiedIndex":5179,"key":"/apisix-inner/upstreams/main-data-dubbo","clean_handlers":{},"createdIndex":4992,"has_domain":false},"nodes":[{"host":"10.202.18.199","port":20880,"priority":0,"weight":1},{"host":"10.202.68.118","port":20880,"priority":0,"weight":1}],"timeout":{"connect":3,"send":3,"read":20},"scheme":"http","hash_on":"vars","discovery_type":"nacos","service_name":"main-data","nodes_ref":"table: 0x7f5488a570e0","name":"main-data-dubbo"},"modifiedIndex":5179,"key":"/apisix-inner/upstreams/main-data-dubbo","clean_handlers":"table: 0x7f5486e58388","createdIndex":4992,"has_domain":false}, client: 10.202.17.34, server: _, request: "POST /main-data/http2dubbo/car_createCar HTTP/1.0", host: ""
2023/08/02 10:31:05 [info] 60#60: *120080 [lua] balancer.lua:195: pick_server(): route: {"value":{"update_time":1690941321,"create_time":1690275764,"uri":"/main-data/http2dubbo/car_createCar","labels":{"SERVICE":"main-data","API_VERSION":"1.0.0","API_GROUP":"http2dubbo"},"id":"main-data_car_createCar","status":1,"methods":["POST"],"upstream_id":"main-data-dubbo","plugins":{"dubbo-proxy":{"service_name":"com.zto56.site.web.proxy.Http2DubboService","method":"invoke","service_version":"1.0.0"}},"priority":0,"name":"main-data_car_createCar"},"orig_modifiedIndex":5183,"modifiedIndex":5183,"createdIndex":4995,"key":"/apisix-inner/routes/main-data_car_createCar","clean_handlers":{},"update_count":0,"has_domain":false}, client: 10.202.17.34, server: _, request: "POST /main-data/http2dubbo/car_createCar HTTP/1.0", host: ""
2023/08/02 10:31:05 [error] 104#104: *118906 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/discovery/nacos/init.lua:361: attempt to index upvalue 'applications' (a nil value)
stack traceback:
coroutine 0:
/usr/local/apisix/apisix/discovery/nacos/init.lua: in function 'nodes'
/usr/local/apisix/apisix/upstream.lua:259: in function 'set_upstream'
/usr/local/apisix/apisix/init.lua:588: in function 'fun_org'
/usr/local/apisix/apisix/debug.lua:195: in function 'http_access_phase'
access_by_lua(nginx.conf:342):2: in main chunk, client: 10.202.17.34, server: _, request: "POST /main-data/http2dubbo/car_createCar HTTP/1.0", host: ""
2023/08/02 10:31:05 [warn] 104#104: *118906 [lua] debug.lua:190: http_header_filter_phase(): call require("apisix").http_header_filter_phase() args:{}, client: 10.202.17.34, server: _, request: "POST /main-data/http2dubbo/car_createCar HTTP/1.0", host: ""
2023/08/02 10:31:05 [info] 104#104: *118906 [lua] init.lua:620: fetch_ctx(): fetch ngx ctx: 2, client: 10.202.17.34, server: _, request: "POST /main-data/http2dubbo/car_createCar HTTP/1.0", host: ""

@lakewatcher
Copy link
Author

nacos login 和获取 token 日志

image

@Revolyssup
Copy link
Contributor

@lakewatcher So is the behavior that even after one of the requests(which uses nacos to get endpoints) succeed, some other request might fail again with the same nil error?

@Revolyssup
Copy link
Contributor

@lakewatcher Can you look for a notice log that says something like "update local variable application, event is: ", to see how the encoded applications look like?

@Revolyssup
Copy link
Contributor

@lakewatcher Do you get "wait init" as a warning log somewhere?

@Revolyssup
Copy link
Contributor

@lakewatcher If you get the error multiple times even after few of the requests succeed. Can you check if all the error come from the same nginx worker process. The process id can be found at the beginning of log like [error] <pid>#

@lakewatcher
Copy link
Author

@lakewatcher Can you look for a notice log that says something like "update local variable application, event is: ", to see how the encoded applications look like?

应该是每次启动的时候有 "update local variable application, event is: " 这样的日志。其余的时间没有

2023/08/02 16:44:31 [notice] 52#52: *415 [lua] init.lua:51: update local variable application, event is: updatingsource: discovery_nacos_update_applicationserver pid:46, application: {"":{"":{"demo02":[{"port":20880,"host":"10.202.68.118","weight":1}],"waybill-gather":[{"port":20889,"host":"10.202.18.65","weight":1}]}}}, context: ngx.timer

Do you get "wait init" as a warning log somewhere?

会有很多
2023/08/02 17:40:57 [warn] 132#132: *764954 [lua] init.lua:354: nodes(): wait init, client: 10.202.17.34, server: _, request: "POST /demo02/http2dubbo/car_createCar HTTP/1.0", host:

image

@lakewatcher
Copy link
Author

我场景是这样的

@lakewatcher
Copy link
Author

@lakewatcher If you get the error multiple times even after few of the requests succeed. Can you check if all the error come from the same nginx worker process. The process id can be found at the beginning of log like [error] <pid>#

@Revolyssup
image

@lakewatcher
Copy link
Author

之前回答有问题。会有超过5秒的请求。也会有一些nacos错误日志,但nacos错误日志不太多,2个小时发现了两次。
譬如:
2023/08/03 10:06:58 [error] 46#46: *87698 recv() failed (104: Connection reset by peer), context: ngx.timer
2023/08/03 10:06:58 [error] 46#46: *87698 [lua] init.lua:124: get_token_param(): nacos login fail:nacos password desc:connection reset by peer, context: ngx.timer
2023/08/03 10:06:58 [error] 46#46: *87698 [lua] init.lua:266: get_token_param error:connection reset by peer, context: ngx.timer

当报大量报错 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/discovery/nacos/init.lua:361: attempt to index upvalue 'applications' (a nil value) 之后,我1个小时后,单次点击请求接口还会出现这个错误。

@Revolyssup
Copy link
Contributor

Revolyssup commented Aug 3, 2023

@lakewatcher Just one last thing to confirm. Do you ever see the logs: attempt to index upvalue 'applications' (a nil value) and get_token_param error:connection reset by peer, for the same nginx process ID?
One thing is clear that the error comes due to the fact that connection to Nacos is being disconnected due to some networking issue, which may not be related to APISIX.
What have you set in discovery.nacos.timeout in your configuration? You might increasing this timeout.read value if you're having network issues.

@AlinsRan AlinsRan moved this from 📋 Backlog to 🏗 In progress in Apache APISIX backlog Aug 3, 2023
@Revolyssup Revolyssup moved this from 🏗 In progress to 👀 In review in Apache APISIX backlog Aug 7, 2023
@Revolyssup Revolyssup removed wait for update wait for the author's response in this issue/PR info-needed labels Aug 7, 2023
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Apache APISIX backlog Aug 8, 2023
@lakewatcher
Copy link
Author

  1. attempt to index upvalue 'applications' (a nil value) and get_token_param error:connection reset by peer, they are not the same process.

  2. discovery.nacos.timeout and timeout.read are default values . eg:

    prefix: "/nacos/v1/"
    fetch_interval: 30 # default 30 sec
    weight: 100 # default 100
    timeout:
    connect: 2000 # default 2000 ms
    send: 2000 # default 2000 ms
    read: 5000 # default 5000 ms

@monkeyDluffy6017
Copy link
Contributor

@lakewatcher We can continue the discussion here, please check if the pr #9960 fix the problem

@lakewatcher
Copy link
Author

@monkeyDluffy6017 I haven't built a development environment before。Can you pass the pressure test? Or provide me with a Docker image

@lakewatcher
Copy link
Author

I don't know if directly modifying the file(init.lua) is the correct testing method.

In this case, the 200 thread 3loop directly reported the following error,error:42%, log eg:

2023/08/08 17:58:29 [error] 23026#23026: *119397 [lua] init.lua:590: http_access_phase(): failed to set upstream: no valid upstream node: nil,

@monkeyDluffy6017
Copy link
Contributor

@Revolyssup Please help to check

@Revolyssup
Copy link
Contributor

@lakewatcher The panic was fixed by the PR and the error you are getting now is a valid error suggesting that it could not find upstream. Possible reasons include an error with the connection with nacos as pointed out earlier in the above message. Can you make sure that there is no network connectivity issue between APISIX and nacos?

@lakewatcher
Copy link
Author

No network issues was found .
So far, the pressure testing performance has not been good. I really want to know ,is there any report about apisix's discover that using nacos , in terms of how much concurrency can it support ? no matter the upstream discovery is dubbo or Spring Cloud

@Revolyssup
Copy link
Contributor

No network issues was found . So far, the pressure testing performance has not been good. I really want to know ,is there any report about apisix's discover that using nacos , in terms of how much concurrency can it support ? no matter the upstream discovery is dubbo or Spring Cloud

I am not aware of any such concurrency issue. Currently I do not have the idea as to why applications value is getting set nil with high number of concurrent requests. cc: @kingluo

@shreemaan-abhishek
Copy link
Contributor

fixed by #9960

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants