NGINX ingress creating endless core dumps #7080
Seems that we don't have a mechanism to change the core dump size limit.
That sounds good on its own; however, it won't make NGINX create fewer core dumps. If I restrict the size to 2MB, it can still create thousands of these dumps and still fill up my filesystem (unless I set it to 0, meaning no core dumps are created at all, in which case I'm ignoring the problem rather than noticing it exists).
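For context, one way to check whether the rendered config sets any core-size limit at all (namespace and deployment name are assumptions):

```sh
# worker_rlimit_core is the stock nginx directive that caps core file size per
# worker process; see whether the generated nginx.conf sets it at all.
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- \
  grep -n worker_rlimit_core /etc/nginx/nginx.conf || echo "no worker_rlimit_core set"
```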
That looks like an internal bug of OpenSSL; it's difficult to troubleshoot because the debug symbols were stripped. We may wait for a while and see whether somebody has similar experiences, which might be useful.
Would you be interested in showing what your cluster looks like with:
@longwuyuan I don't want to expose that kind of information about my environment.
Maybe write very clear details about the hardware, software, config, and the list of commands etc. that someone can execute, for example on minikube, to be able to reproduce this problem.
I have no idea how to reproduce this. About configuration, I have thousands of Ingresses that populate the nginx.conf. Any idea how I can export a full dump interpretation to maybe help understand the problem?
Not every AWS EKS user is reporting the same behaviour. There was one other issue reporting core dumps; the best suggestion there was to spread the load. Any chance the problem is caused by your use case only? /remove-kind bug
I double-checked and the load isn't different or suddenly too immense. I guess it is probably an error with something in my environment and not necessarily a bug in NGINX, but my nginx.conf consists of thousands of lines. @longwuyuan, do you have any idea where I should look in the configuration itself?
You could be hitting a limit or a memory violation; it's hard to tell which until the core backtrace is explicit. You can upgrade to the most recent release of the ingress controller, check and verify, look up how to run gdb on nginx coredumps, and post another backtrace that shows the size or any other details of the data structure it's complaining about. Also, you can try to replicate the object sizes in another cluster while spreading the load.
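A minimal sketch of such a gdb session, assuming the controller image is Alpine-based, the binary is at /sbin/nginx, and a core file has been copied to /tmp/core.nginx:

```sh
# Install gdb in the controller container (may require a root shell), then open
# the core dump against the nginx binary and print a full backtrace.
apk add --no-cache gdb
gdb /sbin/nginx /tmp/core.nginx \
  -ex 'set pagination off' \
  -ex 'bt full' \
  -ex 'info registers' \
  -ex 'quit'
```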
@tokers, has the option to set the core dump size limit (even to 0) been considered? I realise ignoring the coredumps is hiding the issue, but in our scenario this would be much preferred to taking out the entire ingress with some misconfigured certs.
@mitom how did you come to the conclusion that the chain is the root cause?
No, the core dumps only contained:
which doesn't really mean anything to me. It is more or less an educated guess, based on the fact that around the time we had this issue the controller logs were spammed with invalid certificate errors.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
We have the same issue in the coredump: #6896 (comment)
We also use cert-manager, but have no errors or invalid certs. There are no errors in either the cert-manager logs or ingress-nginx, but the worker still dies with:
We cannot submit this issue to nginx upstream, because ingress-nginx compiles nginx from source with additional plugins and patches. Also, I have pretty limited knowledge of gdb and debug symbols, so I was unable to find them for libssl on either alpine or debian to fix this part of the coredump:
Any help would be greatly appreciated.
Hey @sepich, thanks. I will now start digging into OpenSSL problems, as we could rule out the openresty bug. Are you using NGINX v1.0.2? Can you provide me some further information about the size of your environment, the number of Ingresses, and the number of different SSL certificates? Thanks
No, we are still on k8s 1.19 and therefore on ingress-nginx v0.49.2.
From #6896 (comment):
99% of the ingresses are SSL, so I would say it is 215 certs as well. This number is pretty stable; it's not as if ingresses are created and deleted every 5 minutes, more like once per week.
Ok, thanks! Will check ASAP :)
I'm wondering if this patch (https://github.com/openresty/openresty/blob/master/patches/openssl-1.1.1f-sess_set_get_cb_yield.patch) which is applied by Openresty shouldn't be applied in OpenSSL as well. |
@sepich, in case I generate a 0.49.3 image (to be released) with the Openresty OpenSSL patch applied, are you able to test it and provide some feedback?
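For reference, a rough sketch of how such a patch could be applied to an OpenSSL tree before building the base image; the OpenSSL version, install prefix, and patch location are assumptions, not taken from the actual image build scripts:

```sh
# Apply the OpenResty yield patch to an OpenSSL 1.1.1 source tree, then build
# and install it under the prefix the nginx base image links against.
# The patch is named for 1.1.1f; the patch level/fuzz may need adjusting for
# other 1.1.1 releases.
tar xzf openssl-1.1.1l.tar.gz && cd openssl-1.1.1l
patch -p1 < ../openssl-1.1.1f-sess_set_get_cb_yield.patch
./config --prefix=/usr/local/openresty/openssl111 shared zlib
make -j"$(nproc)" && make install_sw
```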
Hi @sepich, I have sent you an email to arrange a call with an interactive gdb session, as mentioned here. Thanks very much!
@rikatz, great finding!
This patch was originally created as two parts, for both nginx and openssl: openresty/openresty@97901f3
https://github.com/openresty/lua-resty-core/blob/master/lib/ngx/ssl/session.md#description
This Lua API can be used to implement distributed SSL session caching for downstream SSL connections, thus saving a lot of full SSL handshakes which are very expensive.
I've checked that no ngx.ssl.session, ssl_session_fetch_by_lua* or ssl_session_store_by_lua* is being used in ingress-nginx. We also do not use any Lua code in ingress snippets. So I deleted the images/nginx/rootfs/patches/nginx-1.19.9-ssl_sess_cb_yield.patch file (to avoid rebuilding openssl), then rebuilt nginx and v0.49.2. But the issue and the coredump backtrace are the same:

```
#5 0x00007fdb5619efb4 in ?? () from /lib/libssl.so.1.1
#6 0x0000562d5b8b8c68 in ngx_ssl_handshake (c=c@entry=0x7fdb55a2fa20) at src/event/ngx_event_openssl.c:1720
#7 0x0000562d5b8b9081 in ngx_ssl_handshake_handler (ev=0x7fdb5588a0c0) at src/event/ngx_event_openssl.c:2069
```

But there is one more patch for ngx_event_openssl.c, nginx-1.19.9-ssl_cert_cb_yield.patch:
https://github.com/openresty/lua-nginx-module#ssl_certificate_by_lua_block
I checked that the ingress-nginx Lua code does not use this either, and rebuilt the image without this patch too. But the issue still remains.
Looks like I misunderstood something; maybe you can build a test image with the minimum amount of patches needed to make ingress-nginx-controller work?
@doujiang24, got it!
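A quick way to reproduce that check against a source checkout of kubernetes/ingress-nginx (the Lua directory path is an assumption about the repository layout):

```sh
# Confirm the Lua SSL session-cache hooks are not referenced anywhere in the
# controller's Lua tree.
grep -rn -e 'ngx.ssl.session' \
         -e 'ssl_session_fetch_by_lua' \
         -e 'ssl_session_store_by_lua' \
         rootfs/etc/nginx/lua/ || echo "not referenced"
```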
Yeap, I can.
Actually I already have a base image with the right patches and proper linking:

```
ldd /sbin/nginx | grep ssl
        libssl.so.1.1 => /usr/local/openresty/openssl111/lib/libssl.so.1.1 (0x7f7be72b7000)
        libcrypto.so.1.1 => /usr/local/openresty/openssl111/lib/libcrypto.so.1.1 (0x7f7be6fc1000)
```

I have published this base image as rpkatz/nginx:patchedopenresty, so you can build your own controller using it. For example, for the "legacy/0.49x" branch:
* git clone -b legacy git@github.com:kubernetes/ingress-nginx && cd ingress-nginx
* make build
* BASE_IMAGE=rpkatz/nginx:patchedopenresty make image
/remove-lifecycle stale
Thank you; unfortunately it still fails (v0.49.2 built on top of it):
Is it now possible to somehow load the openssl debug symbols?
@sepich The debug symbol package for openresty-openssl111 is
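Once those symbols are installed, a rough sketch of pointing gdb at them before loading the core (the debug directory and core path are assumptions):

```sh
# Tell gdb where to find separate debug info files, then load the core dump.
gdb -ex 'set debug-file-directory /usr/lib/debug:/usr/local/openresty/openssl111/lib/debug' \
    /sbin/nginx /tmp/core.nginx
```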
Thanks, it is:
seems to be working:
@sepich Great, the
@rikatz, could you please share how to build an image like that?
#7732 This way :)
While recompiling openssl I've found a workaround for this issue - edit nginx.conf:
Unfortunately it is not exposed via an annotation, so I have to edit the template. There is even an SO article about this.
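Until such a setting exists, one way to persist an nginx.conf edit across restarts is the controller's documented custom-template mechanism; the names and namespace below are assumptions:

```sh
# Copy the nginx.tmpl template out of the running controller image, edit it,
# then publish it as a ConfigMap.
kubectl -n ingress-nginx create configmap nginx-template --from-file=nginx.tmpl
# Then add a volume for the ConfigMap in the controller Deployment and mount it
# at /etc/nginx/template so the edited template is rendered instead of the
# built-in one.
```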
Hello @sepich, unfortunately, after talking to the OpenSSL and Nginx teams, I still cannot find where the bug is. Hello @rikatz,
yeah sure, I will open a new PR and add that as a configuration :)
NGINX Ingress controller version: v0.41.2
Kubernetes version (use kubectl version):
Environment:
What happened:
My NGINX ingress controllers started to create endless core dump files. This started to fill up some of my nodes' filesystems, creating disk pressure on them and causing other pods to be evicted. I do not have any debug logging set up, and I have not intentionally configured NGINX to create core dumps.
What you expected to happen:
Not sure if preventing core dumps is the right way; the gdb output is at the bottom.
How to reproduce it:
Not sure I understand why it happens now. We do have autoscaling enabled and I don't think we reach the resource limits, so I'm not sure why it happens.
Anything else we need to know:
I managed to copy the core dump and tried to investigate it, but couldn't find anything verbose about it:
In the meantime, I added a LimitRange with a default ephemeral-storage limit of 10Gi to prevent pods from reaching max node storage (my pods reached ~60Gi of storage usage from core dumps alone).
/kind bug