Pods are running but registry is unresponsive at some point after installation #144

Open
SalaryTheft opened this issue Mar 26, 2024 · 3 comments


@SalaryTheft

SalaryTheft commented Mar 26, 2024

All the pods are running, but the registry server becomes unresponsive at some point after installation
(curl https://localhost:8443 gets no response).

To get it working again I have to restart the pods, or sometimes even reboot the host.

All the pods are running:

[root@bastion ~]# podman ps -a
CONTAINER ID  IMAGE                                                    COMMAND         CREATED       STATUS       PORTS                   NAMES
db266da38b9c  registry.access.redhat.com/ubi8/pause:8.7-6              infinity        13 hours ago  Up 13 hours  0.0.0.0:8443->8443/tcp  5e70ee01733b-infra
767d8f665354  registry.redhat.io/rhel8/redis-6:1-92.1669834635         run-redis       13 hours ago  Up 13 hours  0.0.0.0:8443->8443/tcp  quay-redis
73b03983db2f  registry.redhat.io/rhel8/postgresql-10:1-203.1669834630  run-postgresql  13 hours ago  Up 13 hours  0.0.0.0:8443->8443/tcp  quay-postgres
41c21e84bb3e  registry.redhat.io/quay/quay-rhel8:v3.8.14               registry        13 hours ago  Up 13 hours  0.0.0.0:8443->8443/tcp  quay-app

New log lines keep coming in, so the containers seem to be running fine... I guess?

[root@bastion ~]# podman logs --tail=10 -f quay-app
exportactionlogsworker stdout | 2024-03-26 00:28:00,067 [52] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:00 UTC)" (scheduled at 2024-03-26 00:28:00.067443+00:00)
exportactionlogsworker stdout | 2024-03-26 00:28:00,071 [52] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:00 UTC)" executed successfully
notificationworker stdout | 2024-03-26 00:28:04,724 [63] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:14 UTC)" (scheduled at 2024-03-26 00:28:04.724010+00:00)
notificationworker stdout | 2024-03-26 00:28:04,727 [63] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:14 UTC)" executed successfully
repositorygcworker stdout | 2024-03-26 00:28:11,768 [75] [INFO] [apscheduler.executors.default] Running job "QueueWorker.run_watchdog (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:11 UTC)" (scheduled at 2024-03-26 00:28:11.767795+00:00)
repositorygcworker stdout | 2024-03-26 00:28:11,769 [75] [INFO] [apscheduler.executors.default] Job "QueueWorker.run_watchdog (trigger: interval[0:01:00], next run at: 2024-03-26 00:29:11 UTC)" executed successfully
gcworker stdout | 2024-03-26 00:28:12,861 [53] [INFO] [apscheduler.executors.default] Running job "GarbageCollectionWorker._garbage_collection_repos (trigger: interval[0:00:30], next run at: 2024-03-26 00:28:42 UTC)" (scheduled at 2024-03-26 00:28:12.860612+00:00)
gcworker stdout | 2024-03-26 00:28:12,868 [53] [INFO] [apscheduler.executors.default] Job "GarbageCollectionWorker._garbage_collection_repos (trigger: interval[0:00:30], next run at: 2024-03-26 00:28:42 UTC)" executed successfully
notificationworker stdout | 2024-03-26 00:28:14,724 [63] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:24 UTC)" (scheduled at 2024-03-26 00:28:14.724010+00:00)
notificationworker stdout | 2024-03-26 00:28:14,731 [63] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2024-03-26 00:28:24 UTC)" executed successfully

Nothing looks strange in the quay-app container details:

[root@bastion ~]# podman inspect quay-app
[
     {
          "Id": "41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389",
          "Created": "2024-03-25T07:50:17.451450987-04:00",
          "Path": "dumb-init",
          "Args": [
               "--",
               "/quay-registry/quay-entrypoint.sh",
               "registry"
          ],
          "State": {
               "OciVersion": "1.1.0-rc.3",
               "Status": "running",
               "Running": true,
               "Paused": false,
               "Restarting": false,
               "OOMKilled": false,
               "Dead": false,
               "Pid": 7577,
               "ConmonPid": 7575,
               "ExitCode": 0,
               "Error": "",
               "StartedAt": "2024-03-25T07:50:17.61683645-04:00",
               "FinishedAt": "0001-01-01T00:00:00Z",
               "Health": {
                    "Status": "",
                    "FailingStreak": 0,
                    "Log": null
               },
               "CgroupPath": "/machine.slice/machine-libpod_pod_5e70ee01733b02f854d79d85dd78dc5c8ecdb2c50de7472a314441897f9296dc.slice/libpod-41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389.scope",
               "CheckpointedAt": "0001-01-01T00:00:00Z",
               "RestoredAt": "0001-01-01T00:00:00Z"
          },
          "Image": "93b30dda302e3554fcfea484da1fc7b981dc4ac173b195def4ab79b86dfaf616",
          "ImageDigest": "sha256:19e0709632a860dc93e54e9d79b8da9b02334122775932eaefaccf4783524ef4",
          "ImageName": "registry.redhat.io/quay/quay-rhel8:v3.8.14",
          "Rootfs": "",
          "Pod": "5e70ee01733b02f854d79d85dd78dc5c8ecdb2c50de7472a314441897f9296dc",
          "ResolvConfPath": "/run/containers/storage/overlay-containers/db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec/userdata/resolv.conf",
          "HostnamePath": "/run/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata/hostname",
          "HostsPath": "/run/containers/storage/overlay-containers/db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec/userdata/hosts",
          "StaticDir": "/var/lib/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata",
          "OCIConfigPath": "/var/lib/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata/config.json",
          "OCIRuntime": "crun",
          "ConmonPidFile": "/run/quay-app.service-pid",
          "PidFile": "/run/containers/storage/overlay-containers/41c21e84bb3e90a2ae46b480d9ca00e1a924a27e2c20157f09d21d29c9b4a389/userdata/pidfile",
          "Name": "quay-app",
          "RestartCount": 0,
          "Driver": "overlay",
          "MountLabel": "system_u:object_r:container_file_t:s0:c273,c984",
          "ProcessLabel": "system_u:system_r:container_t:s0:c273,c984",
          "AppArmorProfile": "",
          "EffectiveCaps": null,
          "BoundingCaps": [
               "CAP_CHOWN",
               "CAP_DAC_OVERRIDE",
               "CAP_FOWNER",
               "CAP_FSETID",
               "CAP_KILL",
               "CAP_NET_BIND_SERVICE",
               "CAP_SETFCAP",
               "CAP_SETGID",
               "CAP_SETPCAP",
               "CAP_SETUID",
               "CAP_SYS_CHROOT"
          ],
          "ExecIDs": [],
          "GraphDriver": {
               "Name": "overlay",
               "Data": {
                    "LowerDir": "/var/lib/containers/storage/overlay/19dbf084110759a3d249cd4ec487e83f55eca64deafc5d51d04787a3716fadb8/diff",
                    "MergedDir": "/var/lib/containers/storage/overlay/fc1f2d2a88e454e8c41e3aa22e5d91e18001506f13821dd60eee47a918b1bc50/merged",
                    "UpperDir": "/var/lib/containers/storage/overlay/fc1f2d2a88e454e8c41e3aa22e5d91e18001506f13821dd60eee47a918b1bc50/diff",
                    "WorkDir": "/var/lib/containers/storage/overlay/fc1f2d2a88e454e8c41e3aa22e5d91e18001506f13821dd60eee47a918b1bc50/work"
               }
          },
          "Mounts": [
               {
                    "Type": "volume",
                    "Name": "f19507ef7f837c63cb92f116e042f12daa4c00a0c37c444cb1c7988687e66a0d",
                    "Source": "/var/lib/containers/storage/volumes/f19507ef7f837c63cb92f116e042f12daa4c00a0c37c444cb1c7988687e66a0d/_data",
                    "Destination": "/tmp",
                    "Driver": "local",
                    "Mode": "",
                    "Options": [
                         "nodev",
                         "exec",
                         "nosuid",
                         "rbind"
                    ],
                    "RW": true,
                    "Propagation": "rprivate"
               },
               {
                    "Type": "volume",
                    "Name": "63e0413f366aa2f74f9370d04014e48038006bb4cf1b2ff5435fc9cb724de3ce",
                    "Source": "/var/lib/containers/storage/volumes/63e0413f366aa2f74f9370d04014e48038006bb4cf1b2ff5435fc9cb724de3ce/_data",
                    "Destination": "/var/log",
                    "Driver": "local",
                    "Mode": "",
                    "Options": [
                         "nodev",
                         "exec",
                         "nosuid",
                         "rbind"
                    ],
                    "RW": true,
                    "Propagation": "rprivate"
               },
               {
                    "Type": "volume",
                    "Name": "097a7e8bf2e6d0a80a575d14bd6bdfa58d16919ff83a9b403d6dc06915ae20bc",
                    "Source": "/var/lib/containers/storage/volumes/097a7e8bf2e6d0a80a575d14bd6bdfa58d16919ff83a9b403d6dc06915ae20bc/_data",
                    "Destination": "/conf/stack",
                    "Driver": "local",
                    "Mode": "",
                    "Options": [
                         "nodev",
                         "exec",
                         "nosuid",
                         "rbind"
                    ],
                    "RW": true,
                    "Propagation": "rprivate"
               },
               {
                    "Type": "bind",
                    "Source": "/opt/quay/config/quay-config",
                    "Destination": "/quay-registry/conf/stack",
                    "Driver": "",
                    "Mode": "",
                    "Options": [
                         "rbind"
                    ],
                    "RW": true,
                    "Propagation": "rprivate"
               },
               {
                    "Type": "bind",
                    "Source": "/opt/quay/data",
                    "Destination": "/datastorage",
                    "Driver": "",
                    "Mode": "",
                    "Options": [
                         "rbind"
                    ],
                    "RW": true,
                    "Propagation": "rprivate"
               }
          ],
          "Dependencies": [
               "db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec"
          ],
          "NetworkSettings": {
               "EndpointID": "",
               "Gateway": "10.88.0.1",
               "IPAddress": "10.88.0.2",
               "IPPrefixLen": 16,
               "IPv6Gateway": "",
               "GlobalIPv6Address": "",
               "GlobalIPv6PrefixLen": 0,
               "MacAddress": "a6:9c:af:e1:1b:a7",
               "Bridge": "",
               "SandboxID": "",
               "HairpinMode": false,
               "LinkLocalIPv6Address": "",
               "LinkLocalIPv6PrefixLen": 0,
               "Ports": {
                    "8443/tcp": [
                         {
                              "HostIp": "",
                              "HostPort": "8443"
                         }
                    ]
               },
               "SandboxKey": "/run/netns/netns-67bc251f-bac0-1817-c280-f49b54fda5bc",
               "Networks": {
                    "podman": {
                         "EndpointID": "",
                         "Gateway": "10.88.0.1",
                         "IPAddress": "10.88.0.2",
                         "IPPrefixLen": 16,
                         "IPv6Gateway": "",
                         "GlobalIPv6Address": "",
                         "GlobalIPv6PrefixLen": 0,
                         "MacAddress": "a6:9c:af:e1:1b:a7",
                         "NetworkID": "podman",
                         "DriverOpts": null,
                         "IPAMConfig": null,
                         "Links": null,
                         "Aliases": [
                              "db266da38b9c",
                              "quay-pod"
                         ]
                    }
               }
          },
          "Namespace": "",
          "IsInfra": false,
          "IsService": false,
          "KubeExitCodePropagation": "invalid",
          "lockNumber": 37,
          "Config": {
               "Hostname": "quay-pod",
               "Domainname": "",
               "User": "1001",
               "AttachStdin": false,
               "AttachStdout": false,
               "AttachStderr": false,
               "Tty": false,
               "OpenStdin": false,
               "StdinOnce": false,
               "Env": [
                    "LANG=C.UTF-8",
                    "QUAYDIR=/quay-registry",
                    "PYTHONUNBUFFERED=1",
                    "RED_HAT_QUAY=true",
                    "TERM=xterm",
                    "container=oci",
                    "PYTHONIOENCODING=UTF-8",
                    "LC_ALL=C.UTF-8",
                    "TZ=UTC",
                    "PYTHONUSERBASE=/app",
                    "QUAYPATH=/quay-registry",
                    "QUAYCONF=/quay-registry/conf",
                    "PATH=/app/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                    "QUAYRUN=/quay-registry/conf",
                    "PYTHONPATH=/quay-registry",
                    "HOME=/quay-registry",
                    "HOSTNAME=quay-pod"
               ],
               "Cmd": [
                    "registry"
               ],
               "Image": "registry.redhat.io/quay/quay-rhel8:v3.8.14",
               "Volumes": null,
               "WorkingDir": "/quay-registry",
               "Entrypoint": "dumb-init -- /quay-registry/quay-entrypoint.sh",
               "OnBuild": null,
               "Labels": null,
               "Annotations": {
                    "io.container.manager": "libpod",
                    "io.kubernetes.cri-o.SandboxID": "db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
                    "io.podman.annotations.cid-file": "/run/quay-app.service-cid",
                    "org.opencontainers.image.stopSignal": "15"
               },
               "StopSignal": 15,
               "HealthcheckOnFailureAction": "none",
               "CreateCommand": [
                    "/usr/bin/podman",
                    "run",
                    "--name",
                    "quay-app",
                    "-v",
                    "/opt/quay/config/quay-config:/quay-registry/conf/stack:Z",
                    "-v",
                    "/opt/quay/data:/datastorage:Z",
                    "--pod=quay-pod",
                    "--conmon-pidfile",
                    "/run/quay-app.service-pid",
                    "--cidfile",
                    "/run/quay-app.service-cid",
                    "--cgroups=no-conmon",
                    "--replace",
                    "registry.redhat.io/quay/quay-rhel8:v3.8.14"
               ],
               "Umask": "0022",
               "Timeout": 0,
               "StopTimeout": 10,
               "Passwd": true,
               "sdNotifyMode": "container"
          },
          "HostConfig": {
               "Binds": [
                    "f19507ef7f837c63cb92f116e042f12daa4c00a0c37c444cb1c7988687e66a0d:/tmp:rprivate,rw,nodev,exec,nosuid,rbind",
                    "63e0413f366aa2f74f9370d04014e48038006bb4cf1b2ff5435fc9cb724de3ce:/var/log:rprivate,rw,nodev,exec,nosuid,rbind",
                    "097a7e8bf2e6d0a80a575d14bd6bdfa58d16919ff83a9b403d6dc06915ae20bc:/conf/stack:rprivate,rw,nodev,exec,nosuid,rbind",
                    "/opt/quay/config/quay-config:/quay-registry/conf/stack:rw,rprivate,rbind",
                    "/opt/quay/data:/datastorage:rw,rprivate,rbind"
               ],
               "CgroupManager": "systemd",
               "CgroupMode": "private",
               "ContainerIDFile": "/run/quay-app.service-cid",
               "LogConfig": {
                    "Type": "journald",
                    "Config": null,
                    "Path": "",
                    "Tag": "",
                    "Size": "0B"
               },
               "NetworkMode": "container:db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
               "PortBindings": {},
               "RestartPolicy": {
                    "Name": "",
                    "MaximumRetryCount": 0
               },
               "AutoRemove": false,
               "VolumeDriver": "",
               "VolumesFrom": null,
               "CapAdd": [],
               "CapDrop": [],
               "Dns": [],
               "DnsOptions": [],
               "DnsSearch": [],
               "ExtraHosts": [],
               "GroupAdd": [],
               "IpcMode": "container:db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
               "Cgroup": "",
               "Cgroups": "default",
               "Links": null,
               "OomScoreAdj": 0,
               "PidMode": "private",
               "Privileged": false,
               "PublishAllPorts": false,
               "ReadonlyRootfs": false,
               "SecurityOpt": [],
               "Tmpfs": {},
               "UTSMode": "container:db266da38b9c0ffd99a27f0873934a79cbf7776dd8996aa0e4b839f98f0b25ec",
               "UsernsMode": "",
               "ShmSize": 65536000,
               "Runtime": "oci",
               "ConsoleSize": [
                    0,
                    0
               ],
               "Isolation": "",
               "CpuShares": 0,
               "Memory": 0,
               "NanoCpus": 0,
               "CgroupParent": "machine.slice/machine-libpod_pod_5e70ee01733b02f854d79d85dd78dc5c8ecdb2c50de7472a314441897f9296dc.slice",
               "BlkioWeight": 0,
               "BlkioWeightDevice": null,
               "BlkioDeviceReadBps": null,
               "BlkioDeviceWriteBps": null,
               "BlkioDeviceReadIOps": null,
               "BlkioDeviceWriteIOps": null,
               "CpuPeriod": 0,
               "CpuQuota": 0,
               "CpuRealtimePeriod": 0,
               "CpuRealtimeRuntime": 0,
               "CpusetCpus": "",
               "CpusetMems": "",
               "Devices": [],
               "DiskQuota": 0,
               "KernelMemory": 0,
               "MemoryReservation": 0,
               "MemorySwap": 0,
               "MemorySwappiness": 0,
               "OomKillDisable": false,
               "PidsLimit": 2048,
               "Ulimits": [
                    {
                         "Name": "RLIMIT_NPROC",
                         "Soft": 4194304,
                         "Hard": 4194304
                    }
               ],
               "CpuCount": 0,
               "CpuPercent": 0,
               "IOMaximumIOps": 0,
               "IOMaximumBandwidth": 0,
               "CgroupConf": null
          }
     }
]
@BadgerOps
Contributor

Hey team, we just ran into this exact issue, with the same symptoms. I thought we might just have a one-off problem, but then I noticed this issue, so I thought I'd add a comment. I'll post some troubleshooting logs here. I can connect to port 8443 via netcat, and I have ruled out SELinux, fapolicyd, etc. as potential contributors.

It just... stops responding to HTTP traffic.

@BadgerOps
Contributor

I should have captured the output but didn't. I did notice that a curl request results in something similar to the following:

curl -vvv https://<quay-server>:8443 | head
* Rebuilt URL to: https://<quay-server>:8443/
* TCP_NODELAY set
* Connected to <quay-server> port 8443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
<hangs right here, where we should get a Server hello>

We never get the Server hello back, nor anything after it. As noted above, the port is open and responds to nc, and the logs keep rolling by in journalctl -fu quay-app.service and podman logs -f <pod_id>.
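The symptom described here (the TCP connection succeeds and the ClientHello goes out, but no Server hello ever comes back) can be checked without curl by a small probe that times each stage separately. This is a minimal sketch, assuming Python 3 is available on the host; the function name `probe` and its return labels are hypothetical, and certificate verification is deliberately disabled because the only question is whether the server answers the handshake at all.

```python
import socket
import ssl
import sys


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Report which stage of an HTTPS connection fails.

    Returns one of the (hypothetical) labels:
    "tcp_failed", "tls_timeout", "tls_error" or "ok".
    """
    # Skip certificate checks: we only want to know whether the server
    # replies to the ClientHello, not whether its certificate is trusted.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    # Stage 1: plain TCP connect (what nc tests).
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"

    # Stage 2: TLS handshake; wrap_socket performs it immediately.
    with raw:
        try:
            with ctx.wrap_socket(raw, server_hostname=host):
                return "ok"
        except socket.timeout:
            # Connected, ClientHello sent, but no ServerHello in time.
            return "tls_timeout"
        except (ssl.SSLError, OSError):
            # The server answered, but not with a usable TLS handshake.
            return "tls_error"


if __name__ == "__main__":
    target_host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 8443
    print(probe(target_host, target_port))
```

Run as `python3 probe.py <quay-server> 8443`, a result of `tls_timeout` would match the hang in the curl output above, whereas `tcp_failed` would point at the network path rather than the application.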

@daviesow

daviesow commented Jan 13, 2025

Has anyone found a solution to this issue? We had a newly installed mirror registry that worked fine for a week or two and then suddenly started exhibiting this behavior. We've ruled out all sorts of networking issues, time sync, disk and permission issues, etc. Everything seems perfectly healthy and functional, except that quay-app makes no reply to any communication. For a few seconds, starting about 15 seconds after a quay-app restart, it responds normally, but then it just stops. After that there are no more log entries from nginx at all and no errors of any sort, but no replies either.

For a moment we thought we'd solved it by running a mirror-registry upgrade. We initially had the exact container versions listed in SalaryTheft's original post, and we noticed that the current mirror registry made some major changes, such as swapping out PostgreSQL for SQLite. So we ran the upgrade, and the mirror registry became responsive and stayed that way overnight. However, it didn't survive a service restart. We are now on these container versions and are back to a completely healthy but unresponsive registry:

[root@quay ~]# podman ps
CONTAINER ID  IMAGE                                         COMMAND    CREATED     STATUS     PORTS                   NAMES
57b7dc9be481  registry.access.redhat.com/ubi8/pause:8.10-5  infinity   2 days ago  Up 2 days  0.0.0.0:8443->8443/tcp  2c42a7e286bb-infra
1fe234b82c13  registry.redhat.io/rhel8/redis-6:1-190        run-redis  2 days ago  Up 2 days  0.0.0.0:8443->8443/tcp  quay-redis
02800cc3e2fd  registry.redhat.io/quay/quay-rhel8:v3.12.3    registry   2 days ago  Up 2 days  0.0.0.0:8443->8443/tcp  quay-app

Any hint or workaround would be appreciated.
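The short window after a restart, during which the registry answers before going silent, is easier to pin down with timestamps. A minimal sketch, assuming Python 3 on the Quay host: it polls the registry once per interval and prints a line only when the outcome changes, so the moment quay-app goes from answering to silent is recorded. `check_once` and `watch` are hypothetical helper names, and TLS verification is turned off on the assumption that only reachability matters here.

```python
import http.client
import ssl
import time
import urllib.error
import urllib.request


def check_once(url: str, timeout: float) -> str:
    """Describe how the server answered a single HTTPS GET."""
    # Verification is off: we only care whether *any* HTTP reply comes back.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # A 4xx/5xx status still means the server is answering.
        return f"HTTP {err.code}"
    except (OSError, http.client.HTTPException) as err:
        # URLError, timeouts and resets are OSError subclasses; malformed
        # replies surface as HTTPException.
        return f"no reply ({type(err).__name__})"


def watch(url: str, count: int, interval: float = 1.0, timeout: float = 3.0):
    """Poll `url` `count` times; print a line whenever the result changes."""
    start = time.monotonic()
    history = []
    previous = None
    for _ in range(count):
        result = check_once(url, timeout)
        elapsed = time.monotonic() - start
        history.append((round(elapsed, 1), result))
        if result != previous:
            print(f"{elapsed:7.1f}s  {result}")
            previous = result
        time.sleep(interval)
    return history
```

Started right after restarting the quay-app service, `watch("https://localhost:8443/", count=120)` would print the elapsed time at which replies stop. Any HTTP status, even an error code, counts as a reply, so a switch to `no reply (...)` marks the point where the server goes silent.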
