[Question]: lsof regression in amazonlinux:2023 ? #123

Open
eddelbuettel opened this issue Jan 13, 2025 · 2 comments

@eddelbuettel

Product

Amazon Linux 2023

What is your question?

lsof is used to determine if a to-be-launched service is running already.

So a launcher script tries lsof -i :5000. This now hangs even in a minimal amazonlinux:2023 container running under Ubuntu 24.10 unless we also run ulimit -Sn 1024 (or some other larger number).
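A minimal sketch of how a launcher could apply that workaround without changing the limit for the rest of the script. The function name `port_in_use` is hypothetical and not taken from the PrairieLearn scripts; the only commands used are the `ulimit -Sn 1024` and `lsof -i :PORT` calls mentioned above.

```shell
# Hypothetical helper: succeeds (exit 0) if lsof reports something on the given port.
# If lsof is missing or fails, this reads as "not in use", so treat it as a sketch.
port_in_use() {
  (
    # Set the soft open-file limit inside this subshell only;
    # the calling shell keeps its original limit.
    ulimit -Sn 1024 2>/dev/null || true
    lsof -i ":$1" >/dev/null 2>&1
  )
}

# Example use in a launcher:
if port_in_use 5000; then
  echo "port 5000 is already in use"
else
  echo "port 5000 looks free"
fi
```

Running the check in a subshell means the service launched afterwards still inherits the original limit.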

The hang blocks deployment of PrairieLearn, which has relied on Amazon Linux for several years. I am CCing @nwalters512, who heads the tech side of things there and helped me debug.

We also verified that in a standard debian:12 or ubuntu:latest container, lsof -i :5000 runs fine (returning immediately). But somehow amazonlinux:2023 needs the ulimit call.

@nwalters512

nwalters512 commented Jan 13, 2025

I'll include a minimal reproduction below. I don't have an Ubuntu installation to test this against personally, but I worked closely with @eddelbuettel to verify that this does indeed fail on his machine running Ubuntu 24.10:

docker run --rm -it amazonlinux:2023 /bin/bash
# Inside the container:
dnf install -y lsof
lsof

The final command hangs. This was not reproducible with Docker Desktop running on Apple Silicon macOS.

Interestingly, this is reproducible for me (Docker Desktop, Apple Silicon, macOS) if I run lsof under strace:

docker run --rm -it amazonlinux:2023 /bin/bash
# Inside the container:
dnf install -y lsof strace
strace lsof

That produces the following output (truncated at start and finish):

...
openat(AT_FDCWD, "/proc/filesystems", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "nodev\tsysfs\nnodev\ttmpfs\nnodev\tbd"..., 1024) = 577
read(3, "", 1024)                       = 0
close(3)                                = 0
faccessat(AT_FDCWD, "/etc/selinux/config", F_OK) = -1 ENOENT (No such file or directory)
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
close(3)                                = -1 EBADF (Bad file descriptor)
close(4)                                = -1 EBADF (Bad file descriptor)
close(5)                                = -1 EBADF (Bad file descriptor)
close(6)                                = -1 EBADF (Bad file descriptor)
close(7)                                = -1 EBADF (Bad file descriptor)
close(8)                                = -1 EBADF (Bad file descriptor)
close(9)                                = -1 EBADF (Bad file descriptor)
close(10)                               = -1 EBADF (Bad file descriptor)
close(11)                               = -1 EBADF (Bad file descriptor)
close(12)                               = -1 EBADF (Bad file descriptor)
close(13)                               = -1 EBADF (Bad file descriptor)
close(14)                               = -1 EBADF (Bad file descriptor)
...

It keeps trying to close every file descriptor in sequence. I let it run for a while and it got into the hundreds of thousands before I killed the process.
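The prlimit64 line in the trace reports a soft RLIMIT_NOFILE of 1024*1024 = 1048576, so if the close loop runs up to the soft limit (an assumption based on the trace, not on the lsof source), that is roughly a million close() calls. A quick sketch for checking the limits a container actually sees; both function names and the default threshold of 65536 are hypothetical choices for illustration.

```shell
# Print the soft and hard open-file limits of the current shell.
show_nofile_limits() {
  echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

# Hypothetical check: succeeds if the soft limit is above a threshold
# (default 65536, an arbitrary value) or is "unlimited".
nofile_limit_is_large() {
  soft=$(ulimit -Sn)
  threshold="${1:-65536}"
  [ "$soft" = "unlimited" ] || [ "$soft" -gt "$threshold" ]
}

show_nofile_limits
if nofile_limit_is_large; then
  echo "soft limit is large; a per-descriptor close loop may take a long time"
fi
```

Running this inside the container before calling lsof shows whether the limit is in the range seen in the trace.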

For both @eddelbuettel and myself, running ulimit -Sn 1024 makes lsof complete quickly, regardless of whether it's run under strace or not.

Both @eddelbuettel and I tried reproducing with a plain ubuntu:latest image:

docker run --rm -it ubuntu:latest /bin/bash
# Inside the container:
apt-get update
apt-get install -y lsof strace
strace lsof

There is no endless loop of close(...) calls and it completes quickly, as expected.

@eddelbuettel
Author

For completeness, the kernel running under this (vanilla) Ubuntu 24.10 instance is:

edd@rob:~$ uname -a
Linux rob 6.11.0-13-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Sat Nov 30 23:51:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
edd@rob:~$ 

eddelbuettel added a commit to eddelbuettel/PrairieLearn that referenced this issue Jan 14, 2025
github-merge-queue bot pushed a commit to PrairieLearn/PrairieLearn that referenced this issue Jan 14, 2025
…10 (#11166)

* Ubuntu 24.10 lets lsof loop endlessly, adding ulimit help

Cf issue amazonlinux/container-images#123

* Update docker/start_s3rver.sh

---------

Co-authored-by: Nathan Sarang-Walters <[email protected]>