Ubuntu 24.04 images fail to build #3189

Open
dillona opened this issue Nov 14, 2024 · 15 comments · May be fixed by #3200

@dillona

dillona commented Nov 14, 2024

mkosi commit the issue has been seen with

653adbe

Used host distribution

Ubuntu 24.04

Used target distribution

Ubuntu 24.04

Linux kernel version used

6.8.0-1016-aws

CPU architectures issue was seen on

None

Unexpected behaviour you saw

Ubuntu 24.04 images fail to build.

In #16 dbus and libpam-systemd were added to the debootstrap invocation, however in #1442 this was effectively removed with the suggestion to add these packages manually to the Packages list.

This does not seem to work. If I edit mkosi/distributions/debian.py to add those packages to the debootstrap-equivalent installation, my build completes successfully.

Used mkosi config

[Distribution]
Distribution=ubuntu
Release=noble
Architecture=x86-64
Repositories=universe

[Content]
Bootable=no

[Output]
Format=tar
CompressOutput=zstd

[Content]
Bootable=no
Packages=ubuntu-desktop,dbus,libpam-systemd
WithRecommends=yes

mkosi output

[ beginning omitted to fit within Github limits ]
Setting up libdbus-1-3:amd64 (1.14.10-4ubuntu4.1) ...
Setting up dbus-bin (1.14.10-4ubuntu4.1) ...
Setting up dbus-session-bus-common (1.14.10-4ubuntu4.1) ...
Setting up dbus-daemon (1.14.10-4ubuntu4.1) ...
/usr/lib/tmpfiles.d/dbus.conf:13: Failed to resolve user 'messagebus': No such process
Setting up dbus-system-bus-common (1.14.10-4ubuntu4.1) ...
Setting up dbus (1.14.10-4ubuntu4.1) ...
Setting up systemd-sysv (255.4-1ubuntu8.4) ...
Setting up libpam-systemd:amd64 (255.4-1ubuntu8.4) ...
Setting up dbus-user-session (1.14.10-4ubuntu4.1) ...
Setting up snapd (2.65.3+24.04) ...
Created symlink /etc/systemd/system/multi-user.target.wants/snapd.apparmor.service → /usr/lib/systemd/system/snapd.apparmor.service.
Created symlink /etc/systemd/system/multi-user.target.wants/snapd.autoimport.service → /usr/lib/systemd/system/snapd.autoimport.service.
Created symlink /etc/systemd/system/multi-user.target.wants/snapd.core-fixup.service → /usr/lib/systemd/system/snapd.core-fixup.service.
Created symlink /etc/systemd/system/multi-user.target.wants/snapd.recovery-chooser-trigger.service → /usr/lib/systemd/system/snapd.recovery-chooser-trigger.service.
Created symlink /etc/systemd/system/multi-user.target.wants/snapd.seeded.service → /usr/lib/systemd/system/snapd.seeded.service.
Created symlink /etc/systemd/system/cloud-final.service.wants/snapd.seeded.service → /usr/lib/systemd/system/snapd.seeded.service.
Unit /usr/lib/systemd/system/snapd.seeded.service is added as a dependency to a non-existent unit cloud-final.service.
Created symlink /etc/systemd/system/multi-user.target.wants/snapd.service → /usr/lib/systemd/system/snapd.service.
Created symlink /etc/systemd/system/timers.target.wants/snapd.snap-repair.timer → /usr/lib/systemd/system/snapd.snap-repair.timer.
Created symlink /etc/systemd/system/sockets.target.wants/snapd.socket → /usr/lib/systemd/system/snapd.socket.
Created symlink /etc/systemd/system/final.target.wants/snapd.system-shutdown.service → /usr/lib/systemd/system/snapd.system-shutdown.service.
dpkg: unrecoverable fatal error, aborting:
 unknown system group 'messagebus' in statoverride file; the system group got removed
before the override, which is most probably a packaging bug, to recover you
can remove the override manually with dpkg-statoverride
E: Sub-process /usr/bin/dpkg returned an error code (2)
‣ "/usr/bin/python3 -SI /home/ubuntu/mkosi/mkosi/sandbox.py --proc /proc --unsetenv TMPDIR --setenv SYSTEMD_OFFLINE 1 --ro-bind /usr /usr --symlink usr/bin /bin --symlink usr/sbin /sbin --symlink usr/lib /lib --symlink usr/lib64 /lib64 --ro-bind /etc/alternatives /etc/alternatives --ro-bind /etc/ld.so.cache /etc/ld.so.cache --dir /var/tmp --dir /var/log --unshare-ipc --ro-bind /home/ubuntu/mkosi/mkosi/sandbox.py /sandbox.py --dev /dev --ro-bind /etc/resolv.conf /etc/resolv.conf --setenv PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin --ro-bind /var/tmp/mkosi-workspace-6pf2536_/sandbox/etc /etc --dir /opt --bind /var/tmp/mkosi-workspace-6pf2536_/tmp/mkosi-var-tmp-474c0b06b3b543e1 /srv --bind /var/tmp/mkosi-workspace-6pf2536_/tmp/mkosi-var-tmp-7bdc071fb6674ed2 /media --bind /var/tmp/mkosi-workspace-6pf2536_/tmp/mkosi-var-tmp-21921dc39bae40d1 /mnt --bind /var/tmp/mkosi-workspace-6pf2536_/tmp/mkosi-var-tmp-3463cb4047fe47b3 /var --dir /run --dir /tmp --bind /var/tmp/mkosi-workspace-6pf2536_/tmp/mkosi-var-tmp-ef7778da97d3482e /var/tmp --bind /var/tmp/mkosi-workspace-6pf2536_/root /buildroot --ro-bind /etc/pki /etc/pki --ro-bind /etc/ssl /etc/ssl --bind /var/tmp/mkosi-workspace-6pf2536_/repository /repository --bind /var/tmp/mkosi-metadata-_2u6061s/cache/apt /var/cache/apt --bind '/var/cache/mkosi/ubuntu~noble~x86-64/cache/apt/archives' /var/cache/apt/archives --bind /var/tmp/mkosi-metadata-_2u6061s/lib/apt /var/lib/apt --tmpfs /buildroot/run --tmpfs /buildroot/tmp --proc /buildroot/proc --dev /buildroot/dev --dir /buildroot/run/user/0 --write mkosi /buildroot/run/host/container-manager --become-root --suppress-chown --ro-bind-try /var/tmp/mkosi-workspace-6pf2536_/root/etc/machine-id /buildroot/etc/machine-id --ro-bind-try /var/tmp/mkosi-workspace-6pf2536_/root/etc/passwd /etc/passwd --ro-bind-try /var/tmp/mkosi-workspace-6pf2536_/root/etc/group /etc/group --ro-bind-try /var/tmp/mkosi-workspace-6pf2536_/root/etc/shadow /etc/shadow --ro-bind-try 
/var/tmp/mkosi-workspace-6pf2536_/root/etc/gshadow /etc/gshadow --ro-bind /etc/ssl/certs/ca-certificates.crt /proxy.cacert -- apt-get -o APT::Architecture=amd64 -o APT::Architectures=amd64 -o APT::Install-Recommends=true -o APT::Immediate-Configure=off -o APT::Get::Assume-Yes=true -o APT::Get::AutomaticRemove=true -o APT::Get::Allow-Change-Held-Packages=true -o APT::Get::Allow-Remove-Essential=true -o APT::Sandbox::User=root -o Acquire::AllowReleaseInfoChange=true -o Dir::Cache=/var/cache/apt -o Dir::State=/var/lib/apt -o Dir::Log=/var/log/apt -o Dir::State::Status=/buildroot/var/lib/dpkg/status -o Dir::Bin::DPkg=/usr/bin/dpkg -o Debug::NoLocking=true -o DPkg::Options::=--root=/buildroot -o DPkg::Options::=--force-unsafe-io -o DPkg::Options::=--force-architecture -o DPkg::Options::=--force-depends -o DPkg::Options::=--no-debsig -o DPkg::Use-Pty=false -o DPkg::Install::Recursive::Minimum=1000 -o pkgCacheGen::ForceEssential=, install ubuntu-desktop dbus libpam-systemd" returned non-zero exit code 100.
‣ + rm -rf -- /work/var/tmp/mkosi-workspace-6pf2536_
Traceback (most recent call last):
  File "/home/ubuntu/mkosi/mkosi/run.py", line 62, in uncaught_exception_handler
    yield
  File "/home/ubuntu/mkosi/mkosi/run.py", line 103, in fork_and_wait
    target(*args, **kwargs)
  File "/home/ubuntu/mkosi/mkosi/__init__.py", line 4520, in run_build
    build_image(
  File "/home/ubuntu/mkosi/mkosi/__init__.py", line 3629, in build_image
    install_distribution(context)
  File "/home/ubuntu/mkosi/mkosi/__init__.py", line 243, in install_distribution
    context.config.distribution.install_packages(context, context.config.packages)
  File "/home/ubuntu/mkosi/mkosi/distributions/__init__.py", line 133, in install_packages
    return self.installer().install_packages(context, packages)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/mkosi/mkosi/distributions/debian.py", line 208, in install_packages
    Apt.invoke(context, "install", packages, apivfs=apivfs)
  File "/home/ubuntu/mkosi/mkosi/installer/apt.py", line 221, in invoke
    return run(
           ^^^^
  File "/home/ubuntu/mkosi/mkosi/run.py", line 150, in run
    with spawn(
  File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/ubuntu/mkosi/mkosi/run.py", line 295, in spawn
    raise subprocess.CalledProcessError(returncode, cmdline)
subprocess.CalledProcessError: Command '['apt-get', '-o', 'APT::Architecture=amd64', '-o', 'APT::Architectures=amd64', '-o', 'APT::Install-Recommends=true', '-o', 'APT::Immediate-Configure=off', '-o', 'APT::Get::Assume-Yes=true', '-o', 'APT::Get::AutomaticRemove=true', '-o', 'APT::Get::Allow-Change-Held-Packages=true', '-o', 'APT::Get::Allow-Remove-Essential=true', '-o', 'APT::Sandbox::User=root', '-o', 'Acquire::AllowReleaseInfoChange=true', '-o', 'Dir::Cache=/var/cache/apt', '-o', 'Dir::State=/var/lib/apt', '-o', 'Dir::Log=/var/log/apt', '-o', 'Dir::State::Status=/buildroot/var/lib/dpkg/status', '-o', 'Dir::Bin::DPkg=/usr/bin/dpkg', '-o', 'Debug::NoLocking=true', '-o', 'DPkg::Options::=--root=/buildroot', '-o', 'DPkg::Options::=--force-unsafe-io', '-o', 'DPkg::Options::=--force-architecture', '-o', 'DPkg::Options::=--force-depends', '-o', 'DPkg::Options::=--no-debsig', '-o', 'DPkg::Use-Pty=false', '-o', 'DPkg::Install::Recursive::Minimum=1000', '-o', 'pkgCacheGen::ForceEssential=,', 'install', 'ubuntu-desktop', 'dbus', 'libpam-systemd']' returned non-zero exit status 100.
‣ + tput cnorm
‣ + tput smam
@DaanDeMeyer
Contributor

So the order that packages are set up in is the following:

Setting up libdbus-1-3:amd64 (1.14.10-4ubuntu4.1) ...
Setting up dbus-bin (1.14.10-4ubuntu4.1) ...
Setting up dbus-session-bus-common (1.14.10-4ubuntu4.1) ...
Setting up dbus-daemon (1.14.10-4ubuntu4.1) ...
/usr/lib/tmpfiles.d/dbus.conf:13: Failed to resolve user 'messagebus': No such process
Setting up dbus-system-bus-common (1.14.10-4ubuntu4.1) ...
Setting up dbus (1.14.10-4ubuntu4.1) ...
Setting up systemd-sysv (255.4-1ubuntu8.4) ...
Setting up libpam-systemd:amd64 (255.4-1ubuntu8.4) ...
Setting up dbus-user-session (1.14.10-4ubuntu4.1) ...

And according to https://salsa.debian.org/utopia-team/dbus/-/blob/debian/unstable/debian/dbus-system-bus-common.postinst, the messagebus user is added when setting up dbus-system-bus-common, which we see happens after setting up dbus-daemon, so this seems like a packaging bug, not a mkosi bug.
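
The ordering claim can be checked mechanically against the quoted log. The following is a minimal sketch, not part of mkosi; the regular expression and the `setup_order` helper are assumptions made for illustration.

```python
import re

# Excerpt of the "Setting up ..." lines quoted above.
LOG = """\
Setting up libdbus-1-3:amd64 (1.14.10-4ubuntu4.1) ...
Setting up dbus-bin (1.14.10-4ubuntu4.1) ...
Setting up dbus-session-bus-common (1.14.10-4ubuntu4.1) ...
Setting up dbus-daemon (1.14.10-4ubuntu4.1) ...
/usr/lib/tmpfiles.d/dbus.conf:13: Failed to resolve user 'messagebus': No such process
Setting up dbus-system-bus-common (1.14.10-4ubuntu4.1) ...
Setting up dbus (1.14.10-4ubuntu4.1) ...
"""

# Matches "Setting up <name>[:<arch>] (<version>) ..." and captures the bare package name.
SETUP_RE = re.compile(r"^Setting up (?P<name>[^\s:]+)(?::\S+)? \(")


def setup_order(log: str) -> list[str]:
    """Return package names in the order dpkg configured them."""
    order = []
    for line in log.splitlines():
        match = SETUP_RE.match(line)
        if match:
            order.append(match.group("name"))
    return order


order = setup_order(LOG)
daemon_first = order.index("dbus-daemon") < order.index("dbus-system-bus-common")
print(order)
print("dbus-daemon configured before dbus-system-bus-common:", daemon_first)
```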

cc @bluca @smcv I'm not sure if this is already fixed in Debian and just needs to be backported to Ubuntu or if this could happen in Debian as well.

@bluca
Member

bluca commented Nov 15, 2024

It's the same everywhere, the dependency setup is complex in order to allow dbus-broker to exist - @smcv did we leave out dbus-daemon -> dbus-system-bus-common for a particular reason? It ships the common config/socket so it's something we always want to pull in? Or was it to allow dbus-daemon to purely provide a user session without affecting the system?

@smcv
Copy link

smcv commented Nov 15, 2024

Running the maintainer script of the package that contains /usr/lib/tmpfiles.d/dbus.conf (dbus-daemon) before the package that creates the messagebus user (dbus-system-bus-common) is not ideal, but the use of messagebus in /usr/lib/tmpfiles.d/dbus.conf is non-critical (and in fact it's for a feature that is forced to be compile-time-disabled) so that shouldn't cause any effects that are worse than a warning. I should just delete the d /run/dbus/containers 0755 messagebus line from the tmpfiles.d snippet for now, which would silence that warning (and then we can bring it back at some point in the future when the Containers1 feature is reviewed and stabilized).

The actual failure here is that dpkg-statoverride also can't resolve the messagebus group, and that seems wrong, because dpkg-statoverride is called from dbus.postinst, but the messagebus user and group were created by dbus-system-bus-common.postinst, which runs before dbus.postinst.

To me, the error message No such process (which is strerror(ESRCH)) smells like a problem with a nsswitch plugin - maybe libnss_systemd, or maybe some unrelated nsswitch plugin pulled in by ubuntu-desktop? - causing user/group lookup to "fail hard", in a way that does not fall back to libnss_files and /etc/group, which would find the messagebus group created by adduser?
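
For anyone who wants to confirm the errno mapping, here is a small sketch using only the Python standard library. It prints whatever message the local C library associates with `ESRCH`.

```python
import errno
import os

# ESRCH is the errno value whose strerror() text appears in the tmpfiles.d warning.
message = os.strerror(errno.ESRCH)
print(errno.ESRCH, message)
```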

@DaanDeMeyer
Contributor

@smcv Ah I wasn't aware that dpkg actually used NSS plugins at all. That's rather unfortunate. In that case your guess is very likely to be right, let me debug some more and figure out which NSS plugin might be causing the issue.

@bluca
Member

bluca commented Nov 15, 2024

Even if it's a false positive, should we move /usr/lib/tmpfiles.d/dbus.conf to dbus-system-bus-common given that's where the sysusers.d and other common meta stuff resides? It then becomes easy to order them.

@smcv
Copy link

smcv commented Nov 15, 2024

> did we leave out dbus-daemon -> dbus-system-bus-common for a particular reason? It ships the common config/socket so it's something we always want to pull in? Or was it to allow dbus-daemon to purely provide a user session without affecting the system?

Yes, dbus-daemon intentionally does not depend on dbus-system-bus-common because dbus-daemon is enough to provide the session/user bus (via systemd/user/dbus.{service,socket} and/or dbus-run-session and/or dbus-launch), and does not have any particular dependency on system-level components.

The fact that the tmpfiles.d snippet in dbus-daemon wants the messagebus group to exist is a genuine bug, but only a minor one - it logs a warning but should carry on with no further ill effects. It's the subsequent run of dpkg-statoverride that is the actual failure here.

@smcv

smcv commented Nov 15, 2024

> Even if it's a false positive, should we move /usr/lib/tmpfiles.d/dbus.conf to dbus-system-bus-common given that's where the sysusers.d and other common meta stuff resides?

No, the majority of it should stay in dbus-daemon, because dbus-daemon --session wants to ensure that /var/lib/dbus/machine-id exists.

The last line (the one that creates /run/dbus/containers) could in principle be moved to a separate tmpfiles.d snippet in dbus-system-bus-common, but it doesn't serve any practical purpose with current versions of dbus, so just deleting that line would be a better solution.

@smcv

smcv commented Nov 15, 2024

> I wasn't aware that dpkg actually used NSS plugins at all

Everything that interacts with the user database in the canonical way (getpwuid() and similar functions) uses NSS plugins. The only way to not use NSS plugins would be to go behind glibc's back and access /etc/passwd and /etc/group directly, which is valid for specifically passwd-file-based tools like useradd and adduser, but is discouraged for components like dbus-daemon or dpkg that just want to look up a user/group and don't (want to) know or care how users and groups are implemented on this specific system.
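
The difference between the two lookup paths can be sketched in Python, whose `grp` module wraps the same libc group-database calls and therefore goes through NSS. This is only an illustration: the helper names, the sample file and the GID 101 are assumptions, not anything dpkg or dbus actually uses.

```python
import grp
import tempfile
from pathlib import Path
from typing import Optional


def lookup_via_nss(name: str) -> Optional[int]:
    """Resolve a group the canonical way; libc consults the configured NSS modules."""
    try:
        return grp.getgrnam(name).gr_gid
    except KeyError:
        return None


def lookup_in_file(name: str, group_file: Path) -> Optional[int]:
    """Parse a group(5)-format file directly, bypassing NSS entirely."""
    for line in group_file.read_text().splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == name:
            return int(fields[2])
    return None


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "group"
    sample.write_text("root:x:0:\nmessagebus:x:101:\n")  # 101 is a made-up GID
    gid_from_file = lookup_in_file("messagebus", sample)

gid_from_nss = lookup_via_nss("messagebus")  # None unless this host knows the group
print("file:", gid_from_file, "nss:", gid_from_nss)
```

The two results can legitimately differ: the direct parser only sees the file it is pointed at, while the NSS path answers for whatever sources the running system is configured to use.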

@smcv

smcv commented Nov 15, 2024

> dpkg: unrecoverable fatal error, aborting:
> unknown system group 'messagebus' in statoverride file; the system group got removed
> before the override, which is most probably a packaging bug, to recover you
> can remove the override manually with dpkg-statoverride

Another possible reason for this error message to be seen is if a chroot-like tool copies in /etc/passwd and /etc/group from the host system (which might not know messagebus), discarding previous edits to those files that created messagebus.

I don't think that's what's happening here, because even if mkosi was copying in those files, it seems to be doing all of its apt/dpkg operations in a single transaction, so there would be no opportunity for edits to be reverted. But it has been seen in some use-cases of schroot, which is a sort-of-almost-container-manager that is used in various Debian contexts for historical reasons.

@smcv

smcv commented Nov 15, 2024

Actually... I notice this in the log:

> --ro-bind-try /var/tmp/mkosi-workspace-6pf2536_/root/etc/passwd /etc/passwd --ro-bind-try /var/tmp/mkosi-workspace-6pf2536_/root/etc/group /etc/group

If /etc/passwd and /etc/group are being bind-mounted into the working area read-only, then the invocation of adduser and/or systemd-sysusers in dbus-system-bus-common [edit: it's actually adduser] will definitely not be able to create a messagebus user or group!

If that's what is happening here, I would really have expected adduser to fail with an error message, causing dbus-system-bus-common.postinst to fail, instead of continuing regardless and then letting dpkg-statoverride fail later.

But, if dpkg-statoverride is not seeing messagebus in /etc/group, when it has been asked to set a file to be owned by messagebus, then it's certainly legitimate for it to fail.

@DaanDeMeyer
Contributor

DaanDeMeyer commented Nov 15, 2024

@smcv We run dpkg with --root= so it should definitely not be messing with /etc/passwd but with <root>/etc/passwd.

What I wonder, though, is whether the following is happening: if dpkg-statoverride is reading /etc/passwd, but adduser writes to <root>/etc/passwd by replacing the file instead of modifying it in place, then the changes won't be reflected in the bind mount at /etc/passwd, which would cause dpkg-statoverride to fail.
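
The suspected mechanism can be reproduced without root privileges by using a hard link as a stand-in for the bind mount. This is an analogy, not a real mount: both keep referring to the inode that existed when they were created. The file names and the `messagebus` entry with ID 101 are made up for the sketch.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "passwd"
    pinned = Path(tmp) / "passwd.pinned"  # stands in for the bind-mount target

    original.write_text("root:x:0:0:root:/root:/bin/sh\n")
    os.link(original, pinned)  # both names now refer to the same inode

    # In-place edit: the inode is unchanged, so the pinned view sees the new line.
    with original.open("a") as f:
        f.write("daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n")
    seen_after_append = "daemon" in pinned.read_text()

    # Atomic replace: write a new file, then rename it over the original path.
    replacement = Path(tmp) / "passwd.new"
    replacement.write_text(
        original.read_text() + "messagebus:x:101:101::/nonexistent:/usr/sbin/nologin\n"
    )
    os.replace(replacement, original)
    seen_after_replace = "messagebus" in pinned.read_text()
    same_inode = original.stat().st_ino == pinned.stat().st_ino

print("append visible:", seen_after_append)
print("replace visible:", seen_after_replace)
print("still the same inode:", same_inode)
```

The append is visible through the pinned name, but the rename leaves it attached to the old inode, so the new entry is not.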

@DaanDeMeyer
Contributor

It turns out my patch to have dpkg look up users in the root directory instead of on the host was never merged: https://lists.debian.org/debian-dpkg/2023/04/msg00002.html.

So dpkg-statoverride is indeed still looking up users on the host. If my suspicion that adduser does an atomic replace of /etc/passwd is right, that would explain the failure.

@smcv

smcv commented Nov 15, 2024

> We run dpkg with --root= so it should definitely not be messing with /etc/passwd but with <root>/etc/passwd

It will usually chroot into the given root to run maintainer scripts.

If you use dpkg --force-script-chrootless, it will run the maintainer scripts with a non-empty $DPKG_ROOT, but that's not yet fully implemented (work is ongoing, but you have to assume that packages do not support that mechanism unless you specifically know that a particular package does support it). Good luck!

dbus handles $DPKG_ROOT where it was straightforward to do so, but has not been fully audited for that. Patches welcome.

Similarly, maintscript fragments generated by debhelper mostly handle $DPKG_ROOT, but they might not all handle it.

dbus-system-bus-common.postinst attempts to chroot into $DPKG_ROOT to run adduser, but I don't know how well that works in practice. Patches welcome.

If you're generating a chroot for a relatively well-known package-set that is not too much of a moving target, like "all of an Ubuntu 24.04 desktop", pre-preparing an /etc/passwd and /etc/group with all of the required users and groups might be wise.
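
One way to pre-prepare those files is a small script that seeds a directory tree before any packages are installed. The sketch below is only an illustration under stated assumptions: the numeric IDs, home directory and shell are placeholders, and `ensure_entries` and `seed_tree` are hypothetical helpers, not part of mkosi or dbus.

```python
import tempfile
from pathlib import Path

# Hypothetical entries: IDs, home and shell are placeholders, not values from this thread.
PASSWD_ENTRIES = {"messagebus": "messagebus:x:101:101::/nonexistent:/usr/sbin/nologin"}
GROUP_ENTRIES = {"messagebus": "messagebus:x:101:"}


def ensure_entries(path: Path, entries: dict[str, str]) -> list[str]:
    """Append each entry whose name is not already present; return the names added."""
    path.parent.mkdir(parents=True, exist_ok=True)
    existing = path.read_text().splitlines() if path.exists() else []
    known = {line.split(":", 1)[0] for line in existing if line}
    added = [name for name in entries if name not in known]
    lines = existing + [entries[name] for name in added]
    path.write_text("".join(line + "\n" for line in lines))
    return added


def seed_tree(root: Path) -> None:
    """Create or extend etc/passwd and etc/group under the given tree."""
    ensure_entries(root / "etc/passwd", PASSWD_ENTRIES)
    ensure_entries(root / "etc/group", GROUP_ENTRIES)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    seed_tree(root)
    group_text = (root / "etc/group").read_text()
    second_pass = ensure_entries(root / "etc/group", GROUP_ENTRIES)  # nothing left to add

print(group_text, end="")
print("added on second pass:", second_pass)
```

With mkosi, a tree seeded this way could presumably be supplied through a skeleton tree (the `SkeletonTrees=` setting), but that wiring is an assumption to check against the mkosi documentation. Whether fixed IDs clash with what adduser would otherwise allocate also needs checking per package set.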

@DaanDeMeyer
Contributor

Using symlinks instead of bind mounts fixes the issue, so I think my assumption was correct.
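
Why a symlink behaves differently can be shown with a short sketch: a symlink is resolved by path on every access, so it picks up a file that was renamed into place. The file names and the `messagebus` entry with GID 101 are made up for the illustration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "group"
    link = Path(tmp) / "group.link"  # stands in for the symlinked /etc/group

    original.write_text("root:x:0:\n")
    os.symlink(original, link)

    # Atomic replace of the original, as a tool that rewrites the file might do.
    replacement = Path(tmp) / "group.new"
    replacement.write_text("root:x:0:\nmessagebus:x:101:\n")
    os.replace(replacement, original)

    # The symlink is re-resolved on open, so it reaches the new file.
    seen_via_symlink = "messagebus" in link.read_text()

print("replace visible through symlink:", seen_via_symlink)
```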

DaanDeMeyer added a commit to DaanDeMeyer/mkosi that referenced this issue Nov 15, 2024
Bind mounts don't reflect changes to the original files if they're
replaced instead of modified. Let's use symlinks instead so that
changes to the original files are always reflected.

Fixes systemd#3189
@DaanDeMeyer DaanDeMeyer linked a pull request Nov 15, 2024 that will close this issue
@DaanDeMeyer
Contributor

The linked PR switches us over to symlinks instead of bind mounts which fixes the issue.

DaanDeMeyer added a commit to DaanDeMeyer/mkosi that referenced this issue Nov 16, 2024
Bind mounts don't reflect changes to the original files if they're
replaced instead of modified. Let's use symlinks instead so that
changes to the original files are always reflected.

Fixes systemd#3189