404 error suddenly appearing on one of our environments and ingress level issue #12748
Labels
needs-kind
needs-priority
needs-triage
Hi All,
One of our environments suddenly started returning 404 (Not Found) for every URL. There were no recent code changes to the Ingress Controller or the infrastructure. The controller logs show the following error:
Error: exit status 1
2025/01/21 20:14:16 [emerg] 197#197: "proxy_http_version" directive is duplicate in /tmp/nginx/nginx-cfg2330729348:185270
nginx: [emerg] "proxy_http_version" directive is duplicate in /tmp/nginx/nginx-cfg2330729348:185270
nginx: configuration file /tmp/nginx/nginx-cfg2330729348 test failed
Based on the error above, we are trying to identify where the proxy_http_version directive is defined twice in the generated configuration. We have not explicitly set this directive anywhere, and we believe the NGINX Ingress Controller generates it by default.
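A minimal sketch of how the duplicate could be located, assuming a copy of the rejected configuration file (the /tmp/nginx/nginx-cfg... file named in the error) can still be read from the controller pod. nginx reports "directive is duplicate" when the same directive appears twice in one block, so the sketch tracks brace depth and reports pairs of line numbers that fall directly inside the same block. The function name find_duplicates is hypothetical, and the parsing is deliberately naive: it strips everything after a # and does not understand quoted strings or embedded Lua blocks.

```python
import re


def find_duplicates(config_text, directive="proxy_http_version"):
    """Return (first_line, duplicate_line) pairs for `directive` occurrences
    that sit directly inside the same nginx block.

    Naive parser: assumes braces are balanced and '#' always starts a comment.
    """
    token = re.compile(r"[{}]|\b" + re.escape(directive) + r"\b")
    duplicates = []
    # One slot per open block: line number of the first occurrence, or None.
    stack = [None]
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.split("#", 1)[0]  # drop comments (naive)
        for match in token.finditer(line):
            tok = match.group(0)
            if tok == "{":
                stack.append(None)      # entering a nested block
            elif tok == "}":
                if len(stack) > 1:
                    stack.pop()         # leaving the current block
            elif stack[-1] is None:
                stack[-1] = lineno      # first occurrence in this block
            else:
                duplicates.append((stack[-1], lineno))
    return duplicates
```

Calling find_duplicates(open(path).read()) on the saved file would return the line pairs to inspect; the server and location blocks surrounding those lines should indicate which Ingress produced them.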
A solution mentioned on a GitHub page suggests adding the following annotation:
annotations:
  nginx.ingress.kubernetes.io/rewrite-target: /
However, this annotation is already in place in our configuration. For reference, we are using EKS version 1.28 and Ingress Controller version 1.9.4.
The strange part is that all other environments, including preprod and prod, are functioning fine with the same configuration. Only this particular environment is affected.
To troubleshoot, I upgraded the Ingress Controller to version 1.10.4, but that did not resolve the issue. We then suspected CoreDNS, but all components in the EKS cluster were up and running, and there were no apparent problems at the CoreDNS level.
Interestingly, the environment started working again after two days, despite no changes being made to the code or configuration. We are concerned about the potential impact if this happens in production.
Could you please help us with the following:
Root Cause Analysis (RCA): Could you suggest any potential causes for this behavior?
Preventative Measures: How can we avoid this situation in the future, especially in production environments?
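One possibility worth ruling out, which this report does not confirm, is that an Ingress in the affected cluster carries a snippet annotation that injects proxy_http_version. The controller renders a single nginx configuration from every Ingress it watches, so one such object can fail the configuration test for the whole environment. The sketch below assumes the output of `kubectl get ingress --all-namespaces -o json` has been parsed into a dictionary; the function name ingresses_setting is hypothetical, and matching on the annotation suffix is an assumption made to tolerate a custom annotation prefix.

```python
import json

# Annotation name suffixes that let an Ingress inject raw nginx directives.
SNIPPET_SUFFIXES = ("configuration-snippet", "server-snippet")


def ingresses_setting(directive, ingress_list):
    """Return (namespace, name, annotation_key) for each Ingress whose
    snippet annotations mention `directive`.

    `ingress_list` is the parsed JSON of a Kubernetes Ingress list.
    """
    hits = []
    for item in ingress_list.get("items", []):
        meta = item.get("metadata", {})
        annotations = meta.get("annotations") or {}
        for key, value in annotations.items():
            if key.endswith(SNIPPET_SUFFIXES) and directive in value:
                hits.append((meta.get("namespace"), meta.get("name"), key))
    return hits
```

For example, ingresses_setting("proxy_http_version", json.load(open("ingresses.json"))) lists the objects to review, where ingresses.json is a hypothetical file holding the kubectl output. Running the same check in a deployment pipeline before applying new Ingress manifests is one way to catch this class of problem before it reaches production.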
Looking forward to your assistance.
Thanks,