`zfs` and `chown` subprocess calls are failing with internal errors in ocluster-worker #248
Comments
These commands are executed once during the initialisation of obuilder. They ensure that the necessary folders are available and have the correct permissions. These errors indicate that the ZFS file system is unavailable, failed or just not ready. Obuilder then exits, and launchctl restarts it after ~15 seconds when, hopefully, the situation has improved. See this larger extract from the log. On the third restart, the ZFS modules were loaded, and the system started correctly.
That is helpful! So this is probably a symptom and not the cause of the problems we are seeing. Still worth cleaning this up, imo, as it will make debugging easier in the future if we don't have to interpret apparent evidence of errors as signs of normal operation :)
After reviewing the code structure, I see that adding proper error handling and reporting (by which I mean that "internal errors" are not surfaced for routine and predictable error cases) would either be ad hoc or require a non-trivial reorganization. Given that this is only symptomatic of the problem we are seeing on the macOS workers, and that I see no way that this will actually help us debug, I don't think it's something to prioritize at the moment.
Noticed in https://github.com/tarides/infrastructure/issues/375#issuecomment-2329726772
I suspect this may be causing, contributing to, or masking the problems that leave the macOS builders unable to build jobs. But proper error handling for these subprocess calls should be put in place regardless.
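To make the request concrete, here is a minimal sketch of the kind of error handling meant above, written in Python purely for illustration (it is not obuilder's code, and obuilder itself is not written in Python). The names `SetupCommandError` and `run_setup_command` are hypothetical, and using exit status 127 for a command that could not be started is just the shell convention borrowed here as an assumption. The idea is that a predictable failure of a setup command is reported as a named, descriptive error rather than surfacing as an opaque internal error.

```python
import subprocess


class SetupCommandError(Exception):
    """A setup command failed in a predictable way (hypothetical name)."""

    def __init__(self, argv, returncode, stderr):
        self.argv = list(argv)
        self.returncode = returncode
        self.stderr = stderr
        super().__init__(
            f"{' '.join(self.argv)} exited with {returncode}: {stderr.strip()}"
        )


def run_setup_command(argv):
    """Run one setup command and return its stdout.

    Raises SetupCommandError for the routine failure cases: the command
    cannot be started, or it exits with a non-zero status.
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError as exc:
        # The binary is missing or not executable. 127 is the shell's
        # "command not found" status, reused here as an assumption.
        raise SetupCommandError(argv, 127, str(exc)) from exc
    if result.returncode != 0:
        # Keep the command's own stderr so the log says what went wrong.
        raise SetupCommandError(argv, result.returncode, result.stderr)
    return result.stdout
```

A caller at start-up could then catch `SetupCommandError` around something like `run_setup_command(["zfs", "list"])` (chosen only as an example command, not necessarily one obuilder runs) and log a message such as "ZFS not ready yet" before exiting, which would make the restart loop described in the comments read as normal operation instead of as a crash.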