Nginx crashed - Worker process exited on signal 11 (core dumped) #8232

Closed
tobernguyen opened this issue Feb 9, 2022 · 8 comments
Labels: needs-kind, needs-priority, needs-triage, triage/needs-information

Comments

@tobernguyen

tobernguyen commented Feb 9, 2022

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):


NGINX Ingress controller
Release: v1.0.3
Build: 6e12582
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.9


Controller image: k8s.gcr.io/ingress-nginx/controller:v1.1.1@sha256:4ade87838eb8256b094fbb5272d7dda9b6c7fa8b759e6af5383c1300996a7452

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:34:20Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.15-eks-9c63c4", GitCommit:"9c63c4037a56f9cad887ee76d55142abd4155179", GitTreeState:"clean", BuildDate:"2021-10-20T00:21:03Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS EKS

  • OS (e.g. from /etc/os-release): Amazon Linux

  • Kernel (e.g. uname -a): Linux ip-192-168-17-52.ec2.internal 5.4.156-83.273.amzn2.x86_64 #1 SMP Sat Oct 30 12:59:07 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

  • How was the ingress-nginx-controller installed:
    Using helm template -> wrapped into kustomize -> Installed by ArgoCD
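    For reference, the render step looks roughly like this (a sketch, not our exact setup; the chart name, values file, and output paths are assumptions):

    # Render the chart to plain manifests (assumes the ingress-nginx helm repo is already added)
    helm template ingress-nginx ingress-nginx/ingress-nginx \
      --namespace ingress-nginx \
      --values values.yaml > base/ingress-nginx.yaml

    # Wrap the rendered manifests in a kustomization that ArgoCD points at
    cat <<'EOF' > base/kustomization.yaml
    resources:
      - ingress-nginx.yaml
    EOF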

What happened:

  • Every few hours, an nginx worker process crashes and the controller prints this log line:
2022/02/09 20:41:51 [alert] 32#32: worker process 240 exited on signal 11 (core dumped)
  • We have an alert tracking this log line because we saw the same behavior a long time ago; after we upgraded the controller to v0.50.0, the problem went away. (A minimal version of the check is sketched right after this list.)
  • Since we just upgraded nginx-ingress to the latest version (v1.1.1), the issue has started to appear again.
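A minimal way to spot the crash in the controller logs (a sketch; the namespace and label selector are assumptions based on the default chart values):

kubectl -n ingress-nginx logs -l app.kubernetes.io/name=ingress-nginx --since=24h \
  | grep 'exited on signal 11'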

What you expected to happen:

  • The worker processes should not crash, so connections are not cut off and our uptime is not affected.

How to reproduce it:

Anything else we need to know:

  • I tried to get most of the information from the core dump file, but I couldn't get everything.
Core was generated by `nginx: worker process                               '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f262e85dc0c in ?? () from /lib/libssl.so.1.1
[Current thread is 1 (LWP 245)]
(gdb) bt
#0  0x00007f262e85dc0c in ?? () from /lib/libssl.so.1.1
#1  0x00007f262e6f0d0d in OPENSSL_LH_doall_arg () from /lib/libcrypto.so.1.1
#2  0x00007f262e85e6d0 in SSL_CTX_flush_sessions () from /lib/libssl.so.1.1
#3  0x00007f262e874ad3 in ?? () from /lib/libssl.so.1.1
#4  0x00007f262e868fb4 in ?? () from /lib/libssl.so.1.1
#5  0x000056320a908678 in ngx_ssl_handshake (c=c@entry=0x7f262dffe9c0) at src/event/ngx_event_openssl.c:1720
#6  0x000056320a908a91 in ngx_ssl_handshake_handler (ev=0x7f262de64610) at src/event/ngx_event_openssl.c:2091
#7  0x000056320a903103 in ngx_epoll_process_events (cycle=0x7f2628041ef0, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#8  0x000056320a8f6140 in ngx_process_events_and_timers (cycle=cycle@entry=0x7f2628041ef0) at src/event/ngx_event.c:257
#9  0x000056320a9007c8 in ngx_worker_process_cycle (cycle=0x7f2628041ef0, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:753
#10 0x000056320a8fe850 in ngx_spawn_process (cycle=cycle@entry=0x7f2628041ef0, proc=proc@entry=0x56320a9006bf <ngx_worker_process_cycle>, data=data@entry=0x5, name=name@entry=0x56320aa25f97 "worker process", respawn=respawn@entry=-4)
    at src/os/unix/ngx_process.c:199
#11 0x000056320a8ff49e in ngx_start_worker_processes (cycle=cycle@entry=0x7f2628041ef0, n=6, type=type@entry=-4) at src/os/unix/ngx_process_cycle.c:373
#12 0x000056320a901369 in ngx_master_process_cycle (cycle=0x7f2628041ef0, cycle@entry=0x7f262e508210) at src/os/unix/ngx_process_cycle.c:234
#13 0x000056320a8d3af9 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:386
  • Shared library info (gdb info sharedlibrary):
(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007f262e909c90  0x00007f262e96ac51  Yes         /usr/local/lib/libluajit-5.1.so.2
0x00007f262e8a71f0  0x00007f262e8e2841  Yes (*)     /usr/lib/libpcre.so.1
0x00007f262e841ae0  0x00007f262e87f561  Yes (*)     /lib/libssl.so.1.1
0x00007f262e619000  0x00007f262e76e971  Yes (*)     /lib/libcrypto.so.1.1
0x00007f262e58c300  0x00007f262e599ae1  Yes (*)     /lib/libz.so.1
0x00007f262e55a600  0x00007f262e56f4c1  Yes (*)     /usr/lib/libGeoIP.so.1
0x00007f262e99b070  0x00007f262e9e2761  Yes         /lib/ld-musl-x86_64.so.1
0x00007f262e53c2f0  0x00007f262e54c551  Yes (*)     /usr/lib/libgcc_s.so.1
0x00007f262e37c220  0x00007f262e37d931  Yes         /etc/nginx/modules/ngx_http_geoip2_module.so
0x00007f262e3751c0  0x00007f262e3774f1  Yes (*)     /usr/lib/libmaxminddb.so.0
0x00007f2626f12430  0x00007f2626f16031  Yes         /usr/local/lib/lua/5.1/cjson.so
0x00007f2626f0c070  0x00007f2626f0cb51  Yes         /usr/local/lib/lua/librestychash.so
(*): Shared library is missing debugging information.
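Since both the backtrace and the shared library list point into libssl/libcrypto, it may also be worth recording which OpenSSL build ships in the image; a hedged check from inside the container (assumes the openssl CLI is present; the package names are Alpine's):

docker exec -it CONTAINER_ID /bin/sh -c 'openssl version -a; apk info libssl1.1 libcrypto1.1'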
  • I ran docker exec into the container and set up the GDB environment as follows:
# Install gdb and the musl debug symbols as root
docker exec -u 0 -it CONTAINER_ID /bin/sh
apk update
apk add gdb
apk add musl-dbg
exit

# Then open the core dump as the regular user and get a backtrace
docker exec -it CONTAINER_ID /bin/sh
gdb /sbin/nginx core.XXX
bt
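If the location of the core file is not obvious, the kernel core pattern shows where dumps are written (a hedged note: this is a host kernel setting, so it may need to be read on the node rather than in the container):

cat /proc/sys/kernel/core_pattern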
tobernguyen added the kind/bug label Feb 9, 2022
k8s-ci-robot added the needs-triage label Feb 9, 2022
@k8s-ci-robot
Contributor

@tobernguyen: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@longwuyuan
Contributor

Can you please upgrade to the latest release and update the status?
There used to be some related problems, but some fixes were made, so testing with the latest release may provide more information.

/remove-kind bug
/triage needs-information

k8s-ci-robot added the triage/needs-information and needs-kind labels and removed the kind/bug label Feb 10, 2022
@tobernguyen
Author

Can you please upgrade to the latest release and update the status? There used to be some related problems, but some fixes were made, so testing with the latest release may provide more information.

/remove-kind bug /triage needs-information

Please correct me if I'm wrong, but I'm already on the latest version (v1.1.1), right? The controller is using this image: k8s.gcr.io/ingress-nginx/controller:v1.1.1@sha256:4ade87838eb8256b094fbb5272d7dda9b6c7fa8b759e6af5383c1300996a7452.

But I was surprised that running nginx-ingress-controller --version in the pod returned:

NGINX Ingress controller
Release: v1.0.3
Build: https://github.com/kubernetes/ingress-nginx/commit/6e125826ad3968709392f2056023d4d7474ed4f5
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.9

Should the release here be v1.1.1 instead of v1.0.3?
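For reference, the image digest the controller pods are actually running can be checked with something like this (the namespace and label selector are assumptions based on the default chart values):

kubectl -n ingress-nginx get pods -l app.kubernetes.io/name=ingress-nginx \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].imageID}{"\n"}{end}'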

@tobernguyen
Author


@longwuyuan any updates?

@longwuyuan
Contributor

I don't know why it's showing v1.0.3.
It should show v1.1.1.

@tobernguyen
Author

I don't know why it's showing v1.0.3. It should show v1.1.1.

OK, I re-checked, and it looks like my image's digest wasn't correct; it was pointing to v1.0.3. I fixed it and the controller is now running v1.1.1:

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.1.1
  Build:         a17181e43ec85534a6fea968d95d019c5a4bc8cf
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.9

-------------------------------------------------------------------------------

I will monitor it for several days and will report back.
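To keep the tag and digest from drifting apart again, the digest can be pinned in the kustomize layer; a sketch, assuming the same base/kustomization.yaml as in the install sketch above (the digest value is a placeholder, not the real v1.1.1 digest):

cat <<'EOF' >> base/kustomization.yaml
images:
  - name: k8s.gcr.io/ingress-nginx/controller
    digest: sha256:<digest of the v1.1.1 image>  # placeholder
EOF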

@strongjz
Member

strongjz commented Apr 6, 2022

@tobernguyen is everything working fine? If not, please reopen and comment here.

/close

@k8s-ci-robot
Contributor

@strongjz: Closing this issue.

In response to this:

@tobernguyen is everything working fine? If not, please reopen and comment here.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
