Container can't mount host's /mnt/boot #2371

Open
shaunco opened this issue Sep 3, 2024 · 12 comments · May be fixed by #2372

Comments

@shaunco

shaunco commented Sep 3, 2024

Our balena devices are deployed in customer environments, and often the customer needs to reconfigure network settings, proxies, udev entries (and associated sh scripts), and other stuff that is in the boot partition. Given that this is all on internal flash memory, it is nearly impossible for the customer to do. This means configuring a new image, sending it to the customer to use balena Etcher, and having them reflash the box - an all around terrible experience. When I first brought this up, the Balena team pointed me at https://github.com/balena-os/wifi-connect , but it is incredibly limited in its abilities. We looked at the Supervisor API, but it can't configure Network Manager settings and can't put shell scripts into the boot partition for udev rules to run.

To work around all this, we've created an SSH hosted shell app that allows for configuration of all of these items, but can't run it as a container on balenaOS because it needs a bind mount to /mnt/boot (in order to edit those files), but unfortunately "Bind mounts are not allowed" gets thrown during balena push, and based on this from @alexgg:

We don't plan to allow the mounting of arbitrary host directories in containers, and to date there is no validated use case that requires this.

I see that the supervisor can inject specific mounts at

export async function addFeaturesFromLabels(
... but there is no support for something like io.balena.features.bootfs. Maybe I could do this in the docker-compose:

    devices:
      - "/dev/sda1:/dev/sda1"

but that seems gross, and I'm not even sure every possible Balena image has /mnt/boot at /dev/sda1.

Am I missing something here or would the Balena team be open to a PR that provides an io.balena.features.bootfs entry to mount /mnt/boot into a container?

@shaunco
Author

shaunco commented Sep 3, 2024

Decided I won't try this, as it has way too much possibility of volume corruption:

    devices:
      - "/dev/sda1:/dev/sda1"

I can make a second container that has io.balena.features.balena-socket access, which uses it to monitor for the first container, stop it, add the bind mount, and restart it ... but, also, seems terrible.

@shaunco
Author

shaunco commented Sep 4, 2024

or maybe I just give my container privileged: true and use this script?
https://github.com/balena-os/balena-supervisor/blob/master/mount-partitions.sh

@cywang117
Contributor

@shaunco , mount-partitions.sh is indeed what the Supervisor uses to mount host partitions into the container, and using privileged: true in combination with the methods in the script will achieve what you're looking for. However, this interface isn't public for a reason, as it is not guaranteed to remain the same. Furthermore, modifying network, udev, etc in the host carries a risk of making the device unreachable, which I'm sure you're aware of. With this knowledge, please proceed with caution.

@shaunco
Author

shaunco commented Sep 4, 2024

Thanks @cywang117! The script works, for now, but I'd really prefer something like the PR I submitted ( #2372 ).

I'm aware of the dangers, but it is unrealistic to have customers ship devices back to us, or for us to configure and send brand new images they can write to a thumb drive with Etcher for a full reflash, when they need minor network changes (static<->dynamic, DNS changes, an 802.1x cert needing to be updated, new SSL MITM proxy, disabling 8.8.8.8, redsocks config, wifi settings, etc). With this SSH accessible configuration app, we've given them the ability to reconfigure specific parameters as long as they can get a laptop on the same network. Next up is figuring out how to expose it over tty0 or ttyACM0.

@cywang117
Contributor

Hey @shaunco ! Thanks for taking the time to submit a PR. Unfortunately we have a strict process for evaluating the addition of interfaces to the Supervisor, as Supervisors have to be backwards-compatible with both old and new OS versions. What this means is that each feature added also becomes part of the support burden for future OS versions.

You listed quite a few pieces of functionality. Redsocks config, for example, can be handled through the Supervisor API, so it doesn't require the boot mount. For the others, I see why you'd want access to the boot mount, as we don't have runtime configuration options for them. The NetworkManager configs, though, can be managed from within the container over D-Bus, which we support as a label.

Perhaps if you have a support account, you can raise individual features you'd be interested in that we don't support?

@shaunco
Author

shaunco commented Sep 5, 2024

Seems like a strange position to instruct balena users to modify all these files in the boot partition prior to provisioning/flashing a device, but then give limited to no ability to edit them via console/ssh/API/container. Either we are the admins of the device and should be able to configure these files at will or we are not.

If allowing a bind mount of /mnt/boot via a docker-compose label is not backwards/forward compatible, where /mnt/boot could be any path supervisor wishes, as long as it gets mounted, then I have many many more questions about the future of balenaOS.

We requested these features through our paid support and our customer success lead over a year ago (8/24/2023), along with the request for ANY form of local configuration (SSH, console, etc) that didn't require reflashing the box. The response we got was:

For the in-container solution, our recommendation is to fork the wifi-connect project or this one, and modify it as per your convenience. For user input the container can use the GUI or one of the text terminals, both are available.

... so, that is pretty much what we've done. It just turns out we can't actually edit most of the persistent config that we see break customer installs over and over and over again, so I opened an issue and submitted a 1 line PR that fixes the issue.

As for configuration via Supervisor API / DBus...

Redsocks

Supervisor API exposes an extremely limited set of redsocks config capabilities, and only just added dnsu2t support a week ago. There is still no support for many base block options (user, group, chroot, tcp_keepalive_time, tcp_keepalive_probes, tcp_keepalive_intvl, rlimit_nofile, redsocks_conn_max, connpres_idle_timeout, max_accept_backoff), all of redudp, all of dnstc, or many redsocks block options (on_proxy_fail, disclose_src, splice, listenq).

Network Manager

The DBus API for NetMgr writes to /etc/NetworkManager/system-connections, which is overwritten at each boot with the contents of system-connections on the boot partition (as documented here and in src here - and makes sense, where a field engineer might edit a NetMgr config file in the boot partition and reboot, and it needs to land back where NetMgr expects). Additionally, https://docs.balena.io/reference/OS/network/ has a big yellow box at the top that says:

Note: When editing files for connectivity in the boot partition, make sure you're in the /mnt/boot/system-connections folder so that they persist after reboot.

config.json

Supervisor API has the best support here, but we often attach bash scripts to udev rules:

UDEV_RULE: SUBSYSTEM=="net", ACTION=="add", KERNEL=="eth1", RUN+="/mnt/boot/some_script.sh"

which can't be done through the Supervisor API. We use this method to populate /etc/hosts at boot on boxes deployed to networks that have no DNS server (weirdly, WAY more common than I would have guessed). The script in /mnt/boot contains static host/IP entries used to modify /etc/hosts appropriately if DNS queries are still failing at interface-up. We would have preferred to do this through an /etc/network/if-up.d script, but that is too many steps removed from what we can do inside the boot partition.
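
For illustration, here is a minimal sketch of what a script like `/mnt/boot/some_script.sh` could do. The function names, the entries-file format (one `IP NAME` pair per line), and the example paths are assumptions made for this sketch, not the script actually deployed. It only adds entries that are missing, so running it on every interface-up event should be harmless.

```shell
#!/bin/sh
# Hypothetical sketch: keep static host entries in a hosts file.
# All names and paths here are illustrative assumptions.

# Succeed if NAME ($2) already appears as a host name in hosts file $1.
has_host() {
    awk -v n="$2" '
        /^[[:space:]]*#/ { next }                       # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$1" 2>/dev/null
}

# Append "IP NAME" ($2 $3) to hosts file $1 unless NAME is already there.
add_host_entry() {
    if ! has_host "$1" "$3"; then
        printf '%s %s\n' "$2" "$3" >> "$1"
    fi
}

# Apply every "IP NAME" line from entries file $1 to hosts file $2,
# skipping blank lines and lines that start with '#'.
apply_static_hosts() {
    while read -r _ip _name _rest || [ -n "$_ip" ]; do
        case "$_ip" in ''|'#'*) continue ;; esac
        add_host_entry "$2" "$_ip" "$_name"
    done < "$1"
}
```

A udev-triggered script might call `apply_static_hosts /mnt/boot/static_hosts /etc/hosts` only after a DNS probe has failed; both the entries path and the probe are left out here because they depend on the deployment.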

@pipex
Contributor

pipex commented Sep 6, 2024

Hi @shaunco, thanks again for your feedback and the PR

We understand your frustration in the inability to get access to files in the boot partition for some configurations. While we strive to continuously improve our product with features that provide value to users, we also are very mindful about the balance between utility, compatibility across device types and OS versions and risk.

Writing to the boot partition is risky. It is a FAT filesystem, which is not journaled, so changes to files in that partition that are interrupted by a power cut could lead to corruption and potentially brick devices. Because of this, we have invested good effort in creating mechanisms to write safely to that partition (see our [fatrw project](https://github.com/balena-os/fatrw)), but even with that, we sometimes still see issues with corruption. This is the reason we have chosen not to provide services with an easy way to bind mount to /mnt/boot.

It is in our plans to provide some config variables to modify some of the settings that require modifications to files in the boot partition, but unfortunately we cannot provide an ETA on any of those features as they have yet to be prioritized.

However, in general, configurations in the boot partition are meant to be somewhat static (related to the risk of dynamically changing files in that partition), and there are good alternatives for dynamically changing most of the configurations you mention via containers.

Redsocks

While we are aware there are multiple settings we don't surface in the host-config feature of the supervisor, this is because this interface is meant as a generic, implementation independent mechanism of configuring OS proxy settings. Designing it this way would allow us to change the proxy service if we find a better alternative, without impacting user applications.

For users that need more flexibility in their configuration, there is also the option of running a proxy service within a container and using [network_mode: service](https://docs.docker.com/reference/compose-file/services/#network_mode), for other containers to ensure they communicate via the proxy. Some of our customers chose this option for proxying connections or making devices available on a VPN.

Network Manager
Network configurations in /mnt/boot/system-connections are meant for devices that are in static networking environments. This is the reason the [recommended way of changing network configurations at runtime](https://docs.balena.io/reference/OS/network/#changing-the-network-at-runtime) is via the NetworkManager D-Bus API, as it also provides a very flexible mechanism to link configurations to service environment variables.

As you pointed out, static configurations will overwrite dynamic configurations, but this will only happen if the static versions are available, and only until the service container can start and re-configure the network.

UDEV rules
UDEV rules can also be configured within a privileged container to [work with dynamically plugged devices](https://docs.balena.io/reference/base-images/balena-base-images/#working-with-dynamically-plugged-devices). We also provide some ready-to-use scripts within our base images to make it easier to set up these rules.

Regarding your specific use case for DNS configuration, it might be worth opening up a support ticket for that. The OS includes its own DNS server and will default to communicating with 8.8.8.8 if the local network does not provide a DNS server. Reaching us on support would be useful for us to understand why this feature does not fit your use case.

Hope this helps, again, we appreciate your feedback, and please do reach us on support if there are any use cases you think cannot be covered via these interfaces.

@shaunco
Author

shaunco commented Sep 6, 2024

Feel free to refer back to the tickets I opened in 2022 and 2023 that have detailed explanations of the use cases. Happy to forward them again if you want to email me.

For DNS, it is not a matter of using 8.8.8.8 - that is blocked on 100% of the customer deployments we go into (as is any public DNS service). No enterprise in their right mind would ever allow access to public DNS. The two setups we see are either a local DNS server configured via DHCP/static network config or quite literally NO DNS. That is, there is no local DNS server and all port 53 TCP/UDP communications are blocked. There is NO DNS server, so our only option is to configure the balenaOS hosts file to have the entries needed for balena-cloud (or open-balena) and other services we access.

On the Network Manager case, static IP settings are quite prevalent in OT environments - we rarely find DHCP servers. So, we statically configure entries in the boot partition's system-connections folder. That works great for some amount of time, but now the end customer wants to change our device's IP. The device has internal MMC (not an SD card). What are the options here? Either we send a person to their site somewhere in the world to manually reflash the device or we create a new disk image with the updated config that the customer can write to a thumb drive with Etcher and reflash on the device. Both are very costly and terrible experiences, and both destroy any container volumes that existed on the device prior to the reflash. To solve that, we create a container that has an easy to access (via SSH) configuration tool that allows for easy static address changes ... but it can only (officially) use the D-Bus API to change volatile settings, which get overwritten by old boot partition settings at each boot. That also doesn't work.
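
To make the static-address case concrete, here is a sketch of a helper that prints a NetworkManager keyfile of the kind stored in the boot partition's system-connections folder. The function name and the `<interface>-static` connection id are assumptions for this example; the keys themselves (`method=manual`, `address1=<address>/<prefix>,<gateway>`, `dns=`) are standard keyfile settings. The helper only prints the file. Persisting it still requires write access to the boot partition, which is the point of this issue.

```shell
#!/bin/sh
# Hypothetical helper: print a NetworkManager keyfile for a static IPv4
# Ethernet connection.
# Arguments: interface, address/prefix, gateway, DNS server (optional).
render_static_connection() {
    printf '[connection]\nid=%s-static\ntype=ethernet\ninterface-name=%s\n\n' "$1" "$1"
    printf '[ipv4]\nmethod=manual\naddress1=%s,%s\n' "$2" "$3"
    # Only emit a dns= line when a DNS server was supplied
    if [ -n "${4:-}" ]; then
        printf 'dns=%s;\n' "$4"
    fi
    printf '\n[ipv6]\nmethod=auto\n'
}
```

For example, `render_static_connection eth0 192.168.1.10/24 192.168.1.1` produces a connection with no DNS server, matching the "quite literally NO DNS" sites described above.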

The same issues apply when the customer needs to change proxy settings, MITM certs, and a few other things...

Obviously, we found a way to do this using the mount-partitions.sh script, and I get that FAT volumes are error prone, but it would be great to have an official way to modify the files in here - even if it was a supervisor API to get/put files from the boot volume directly. For example:

  • GET /v2/bootfile?name=/system-connections/eth0
  • PUT /v2/bootfile?name=/system-connections/eth0
  • GET /v2/bootfilelist?path=/system-connections
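
As a sketch of how a container might call such endpoints if they existed: the `/v2/bootfile` route below is only the proposal from this comment and is not part of the Supervisor API today. The sketch assumes the container has the `BALENA_SUPERVISOR_ADDRESS` and `BALENA_SUPERVISOR_API_KEY` environment variables that the Supervisor API label provides, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical client for the PROPOSED /v2/bootfile endpoints.
# These routes do not exist in the Supervisor; this is illustration only.

# Build the request URL for a file path relative to the boot partition.
# Note: the path is not URL-encoded, so it must not contain '&' or spaces.
bootfile_url() {
    printf '%s/v2/bootfile?name=%s&apikey=%s' \
        "${BALENA_SUPERVISOR_ADDRESS}" "$1" "${BALENA_SUPERVISOR_API_KEY}"
}

# Fetch a boot partition file and write it to stdout.
get_bootfile() {
    curl --fail --silent "$(bootfile_url "$1")"
}

# Upload local file $2 as boot partition file $1.
put_bootfile() {
    curl --fail --silent -X PUT --data-binary "@$2" "$(bootfile_url "$1")"
}
```

An API of this shape would let the Supervisor keep control over how the FAT partition is written while still giving containers a supported path to these files.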

@pipex
Contributor

pipex commented Sep 9, 2024

The DNS setting can also be pre-configured via config.json although, to your point, it cannot be modified dynamically.

So, my understanding of your requirement is that you need a way to set some factory network settings, but be able to override them dynamically if the need arises. Is that correct?

This is good insight and I agree we are still lacking some features in that area, we don't yet have a way to preload environment variables, and there are several settings that cannot be modified dynamically via D-Bus, config variable, or the supervisor API (MITM certificate is a good example of that).

While it's unlikely that we'll provide unlimited access to the boot partition, outlining the problem this way provides a good path forward for prioritizing the implementation of these missing features.

For now, using something like the mount-partitions script is your best way forward for accessing the boot partition. Note that this script is very complex as it needs to deal with multiple types of boot partitions, on different device types, including systems with secure boot. If your fleet is generally uniform, you might be able to simplify to something like

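set -ex

# Locate the boot partition by its filesystem label
device="$(blkid | grep resin-boot | awk -F':' '{print $1}')"

# Flush pending writes and unmount; ignore errors if nothing is mounted
function cleanup() {
   (sync && umount "${1}") || true
}

# Unmount on exit or when the script is signalled
trap 'cleanup "${device}"' SIGINT SIGTERM USR1 USR2 EXIT

# Mount the given device on a fresh temporary directory and print its path
function mount_boot() {
    local tmpmnt
    tmpmnt="$(mktemp -d)"
    mount "${1}" "${tmpmnt}"
    echo "${tmpmnt}"
}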
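
One caveat with the device lookup in a simplified script like this: if more than one partition carries the `resin-boot` label, a plain `grep` returns several lines. A small sketch of a stricter lookup follows. The function name is an assumption, and it expects `blkid`-style lines of the form `/dev/xyz: LABEL="..." ...` on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read blkid-style output on stdin and print the first
# device whose label matches $1 (default: resin-boot).
boot_device_from_blkid() {
    awk -F: -v label="${1:-resin-boot}" '
        index($0, "LABEL=\"" label "\"") { print $1; exit }
    '
}
```

It would be used as `device="$(blkid | boot_device_from_blkid)"`, and callers should still check that the result is non-empty before mounting.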

For modifying files within that partition programmatically, we would also recommend taking a look at https://github.com/balena-os/fatrw
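
As a rough illustration of the general idea of writing defensively to a non-journaled filesystem, the sketch below writes new content to a temporary file next to the target, flushes it, and only then replaces the original. This is a generic write-then-rename pattern and is not how fatrw is implemented; the function name is an assumption, and on FAT a rename is not guaranteed to survive a power cut intact, so this narrows the risk window rather than removing it.

```shell
#!/bin/sh
# Hypothetical sketch: replace file $1 with the content read from stdin,
# going through a temporary file in the same directory.
safe_write() {
    _sw_target="$1"
    _sw_tmp="${_sw_target}.tmp.$$"
    # Write the new content first; leave the original untouched on failure
    cat > "$_sw_tmp" || { rm -f "$_sw_tmp"; return 1; }
    sync                                # flush the new data to storage
    mv -f "$_sw_tmp" "$_sw_target"      # swap the new file into place
    sync                                # flush the directory update
}
```

Usage would look like `render_config | safe_write "${bootdir}/config.json"`, where `render_config` and `bootdir` stand in for whatever produces the content and wherever the boot partition is mounted.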

@shaunco
Author

shaunco commented Sep 9, 2024

This isn't just about dynamic modification...

I feel like I must be doing a horrible job of explaining this, so I'll try again. Our balenaOS based devices are deployed to environments that are not under our control. Sometimes those environments use DHCP, but most of the time they do not. Sometimes they have DNS servers, but often they do not. They almost always (95%+) block all external DNS. Sometimes they have proxies, sometimes they have MITM proxies.

No matter what they have on the day we deploy our box, the environment might change 3 months later or 6 months later or even 5 years later because some IT or infosec person or a networking consultant convinced that enterprise they need to redo EVERYTHING. The real world is never truly static. When these changes happen, the IT teams running these environments where our balenaOS appliance is deployed need to be able to log in to something and make changes to the network settings on it. They're happy to do the work if there is a way, but if there is not a way, they are just going to unplug our device and make it our problem... and that means we either need to fly someone across the globe to go reflash the box (losing container volumes) or send seemingly insane directions to some IT person at the place where the box is deployed telling them to use a USB thumbdrive and Etcher to download a hand-crafted image for their env (because it contains a new config.json and system-connections files) and then to go plug it into our device, wait for the power to go out, and then remove it and power the device back on (also losing container volumes). InfoSec on the customer side rightly loses their minds over this, because it is nuts, and then we spend 3 months going back and forth with the customer about "Why isn't there a web portal or ssh server I can log into to change these settings??? Every other appliance on my network has that!" and then eventually they get angry and kick us off their network and we lose a customer.

What we need is a web portal, TTY console, or SSH server built into Balena that allows a local network administrator to reconfigure ALL network settings, but the balena support team told me that would NEVER exist, so we wrote our own... but it currently needs to use some magic script to mount the boot partition and make these changes.

Thank you for the minimized script!

While it's unlikely that we'll provide unlimited access to the boot partition

I'm still unclear on this. We have unlimited access while constructing the image and flashing it. We have unlimited access via the balena-cloud shell... but we don't get programmatic unlimited access because we "might knock the box offline". But here's the thing - in the case I described above, the box is ALREADY offline. That programmatic access is used to save us an 18 hour flight to go manually fix it... and if it doesn't work, ok, then we're back to the 18 hour flight option. Also, it seems much more likely that we'd manually break something via balena-cloud shell than we would programmatically through tested code and a UI that controls what can be changed.

@pipex
Contributor

pipex commented Sep 9, 2024

Thanks for the additional information. Your clarification aligns with my understanding, so it's probably me that is not communicating properly.

To specify my understanding

  1. As a user you want to be able to provision devices with some factory settings to ensure those devices work in the deployment site from first boot.
  2. As a user you also want to provide a way for your customers to modify these settings (ideally self-serve), should the need arise.

Currently our product allows you to set factory settings (1), in config.json, system-connections, etc. It also allows you to programmatically change some (not all, I'm aware) of these settings via containers (2). The problem is that these configuration mechanisms are not very compatible. Factory settings always take precedence so the device will temporarily revert to the factory configuration until the container can run. If the engine has an issue, the device gets locked from the network.

So the way I understand it, is that we are missing some unified OS configuration interface that meets the two requirements listed above and persists changes across reboots. This is compatible with what you say here

What we need is a web portal, TTY console, or SSH server built into Balena that allows a local network administrator to reconfigure ALL network settings, but the balena support team told me that would NEVER exist, so we wrote our own... but it currently needs to use some magic script to mount the boot partition and make these changes.

However, the fact that balenaOS currently uses NetworkManager, redsocks, etc., doesn't mean that will always be the case, so the interface needs to be implementation independent so Balena users can safely update the OS without breaking their apps.

This also relates to providing access to the boot partition: while it is technically possible, as you have seen, it's not a path we would recommend for everyone, as it reduces independence between the app and the OS and could potentially make OS updates harder, which has other implications.

Let me know if you agree with my assessment above. I'll start pushing for some internal discussions on how this interface could be defined, but it might take some time.

@shaunco
Author

shaunco commented Sep 9, 2024

Correct on 1 and 2, and that unified os-config interface would be ideal, although I get this is unlikely anytime soon.

I also understand that balenaOS can move away from NetworkManager, redsocks, etc any time - but for our devices, we are in control of when balenaOS updates and in control of when our config container updates, so making the configuration partition easily accessible (io.balena.features.sysfs) seems like the absolute easiest thing Balena can do. From a forward compatibility standpoint, Balena just broke io.balena.features.kernel-modules with the v6 OS update, and it seems to have gone just fine, as every customer can choose when they upgrade to v6. The sysfs, procfs, kernel-modules, and firmware container labels seem much riskier than a bootfs label.
