Failed dkms status or autoinstall returns code 0 instead of an error one #352
Comments
Hello fellow Arch user. Can you share an idiot-proof, step-by-step reproducer? Yes, I don't think we fixed anything like that with 3.0.12.
I can't reproduce it with a fake module (same error, but return code 4), so I presume a condition is that a module already has to be installed, or some other weird stuff is going on. I can reproduce it by breaking an existing nvidia module by pointing its source symlink to /dev/null:
[0] % cd /var/lib/dkms/nvidia/535.113.01
[0] % sudo rm -f source; sudo ln -sf /usr/src/nvidia-535.113.01 source
[0] % dkms status
nvidia/535.113.01, 6.1.58-1-lts, x86_64: installed
nvidia/535.113.01, 6.5.7-arch1-1, x86_64: installed
[0] % sudo rm -f source; sudo ln -sf /dev/null source
[0] % dkms status
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/nvidia/535.113.01/source/dkms.conf does not exist.
[0] %
The reproducer works for me. The error seems to be coming from the [...]. All the other instances across the codebase are [...]. @anbe42 IIRC you recently silenced [...]. Do you foresee any issues if we promote the error to being fatal?
@scaronni if you have any input, that would be highly appreciated as well. Thanks o/
Thinking about this a little more: autoinstall explicitly aims to soldier on, even when building/installing a specific module fails. So promoting the error to fatal goes in the opposite direction. On the other hand, if dkms.conf is missing then the module is catastrophically broken. @C0rn3j what did you do/what triggered the error on your end? Was it manual tinkering, or something in the OS/packaging that caused it?
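One possible middle ground between soldiering on and failing fatally is to keep processing the remaining modules, but remember that something failed and return a non-zero code at the end. Below is a minimal sketch of that pattern. It is not dkms's actual code: `build_one` and `autoinstall_all` are hypothetical names, and the dkms.conf check merely stands in for a real build.

```shell
# Hypothetical stand-in for a per-module build: succeeds only if the
# given module source directory contains a dkms.conf file.
build_one() {
    [ -f "$1/dkms.conf" ]
}

# Try every module directory passed as an argument. A failure does not
# stop the loop, but it is remembered and reflected in the return code.
autoinstall_all() {
    rc=0
    for module_dir in "$@"; do
        if build_one "$module_dir"; then
            echo "ok: $module_dir"
        else
            echo "Error! $module_dir: could not locate dkms.conf" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

With this shape, healthy modules still get handled, while a caller such as a package hook can tell from the return code that at least one module needs attention.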
I am not sure yet what triggered it. I just had a bunch of broken dkms builds on two machines for non-existent kernel and driver versions. I suspect some weird race condition prodded on by the [...]
Looks like we have two things to fix here:
One way this broken state could have happened: some packaging removed /usr/src/$driver-$oldversion upon some upgrade without calling the corresponding dkms remove hook first ... This should not happen with Debian-packaged *-dkms modules, but I don't know what else is out there in the wild ...
Indeed, splitting this in two makes sense. Recovery would be great, although since the base information (dkms.conf) is missing, I don't know what we can do here. Looking at the latter point, we already exit in all the other instances of a missing dkms.conf. So it's a case of either making those non-fatal and then fixing the almost-impossible-to-test error paths, or flipping the final one. Browsing across the Arch packages:
AFAICT the pacman hook triggering the script is post-transaction for install, and pre-transaction for update/remove, so it cannot be the one causing the issue. Considering there is no obvious way this can happen (in Arch and Debian) outside of user error (it's fine, I'm not trying to blame anyone here), I'm inclined to make it a fatal error. If it turns out there's some valid use case, we can quickly revert it. That said, let's leave this issue open for a while and see how things go.
# 3.0.12
[0] % sudo dkms status
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/nvidia/550.54.14/source/dkms.conf does not exist.
# 3.0.13
[0] % sudo dkms status
nvidia/550.54.14: broken
Error! nvidia/550.54.14: Missing the module source directory or the symbolic link pointing to it.
Manual intervention is required!
nvidia/550.67, 6.6.23-1-lts, x86_64: installed
nvidia/550.67, 6.7.9-arch1-1, x86_64: installed (original_module exists) (WARNING! Missing some built modules!) (WARNING! Missing some built modules!) (WARNING! Missing some built modules!) (WARNING! Missing some built modules!) (WARNING! Missing some built modules!)
nvidia/550.67, 6.8.2-arch2-1, x86_64: installed
Now with the new release, status goes through everything instead of instantly crashing, which will hopefully make this a bit nicer to debug... Still haven't found how or why this happens, but it does keep happening.
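For anyone trying to catch this state early, the broken entries can be spotted without dkms by checking for source/dkms.conf under each module/version directory, using the /var/lib/dkms layout seen in the output above. This is a rough sketch, not part of dkms: `find_broken_dkms` is a hypothetical helper, and skipping symlinked entries is an assumption that dkms may keep other links next to the version directories.

```shell
# Print every module/version directory under a dkms tree whose
# source/dkms.conf is missing. The tree root defaults to /var/lib/dkms.
# Returns 1 if at least one broken entry was found, 0 otherwise.
find_broken_dkms() {
    tree=${1:-/var/lib/dkms}
    broken=0
    for version_dir in "$tree"/*/*/; do
        # Skip the literal pattern when the glob matched nothing.
        [ -d "$version_dir" ] || continue
        # Assumption: symlinked entries are not version directories.
        [ -L "${version_dir%/}" ] && continue
        if [ ! -f "${version_dir}source/dkms.conf" ]; then
            echo "broken: ${version_dir%/}"
            broken=1
        fi
    done
    return "$broken"
}
```

Running `find_broken_dkms` with no arguments checks the default tree. A source link pointing at /dev/null, as in the reproducer, is reported as broken because source/dkms.conf cannot be resolved through it.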
Both of these commands return code 0. They should return a non-zero code, as they have errored.
This is on Arch Linux with dkms 3.0.11.
I skimmed the changelog for the latest 3.0.12, which I did not test with, but it does not look like this issue was fixed there.
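Until the return code is reliable, a script that depends on it could work around the problem by also inspecting the output. The sketch below is only a stopgap under that assumption: `run_checked` is a hypothetical wrapper that treats any output line starting with "Error!" (the prefix seen in the messages above) as a failure.

```shell
# Run a command, print its combined stdout/stderr, and return non-zero
# if the command itself failed or printed a line starting with "Error!".
run_checked() {
    status=0
    output=$("$@" 2>&1) || status=$?
    if [ -n "$output" ]; then
        printf '%s\n' "$output"
    fi
    if [ "$status" -ne 0 ]; then
        return "$status"
    fi
    if printf '%s\n' "$output" | grep -q '^Error!'; then
        return 1
    fi
    return 0
}
```

For example, `run_checked dkms status || echo "dkms reported a problem"` would flag the case described here even though the command itself exits with 0.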