False positive "seccomp filter pointer corruption" on Linux 6.11.0-1-default x86_64 Opensuse tumbleweed #354
Thank you for reporting this @Laitinlok and sorry LKRG isn't working well for you. Please provide more detail - what architecture, what distro, what kernel build (e.g. specific distro package or whether it's your own build), kernel config. Please try loading LKRG with kINT enforcement disabled, e.g with If the problem somehow only shows up when you use the systemd service and enable the service to start at boot, then you can similarly debug this by adding Alternatively, you can try putting |
Thank you for the swift reply, I will try it and report back. |
openSUSE Tumbleweed with kernel-default from zypper. |
We do test on OpenSUSE Tumbleweed here in GitHub Actions, and that test passes. But maybe there's something different in your setup, or maybe it takes longer for the issue to show up. Are you still planning to provide the additional detail I asked for above? Thank you! |
Oh, I see the last time it ran (Sep 24) it used 6.10.11-1. Maybe they've updated to 6.11 since. We'll need to re-run the test. |
I'm sorry, I totally forgot for a moment that it's a build-only test, so it's not supposed to detect this issue. (We do also test boot-up with some other distros.) So we still need more info on this one from you, @Laitinlok. |
Yes, it builds properly on 6.10.11 with the release tarball; for 6.11 you need to use the latest git commit. I have tried lkrg.kint_enforce=1; it does not help. |
Yes, that's as expected.
How exactly did you try it and how exactly does it not help? Does the kernel still panic? Are you able to capture the relevant kernel messages (as appear in Also, please share the output of |
Through sysctl. I have also isolated the issue: it is related to lkrg.pint_enforce=2. |
Linux 6.11.0-1-default x86_64 |
Where does Can you please run your system for a while with |
I have set it to 2 through sysctl |
10月 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 4778, name tracker-extract |
Thank you, this helps. Do I understand correctly that you were previously using "6.10.11 with the release tarball" and it didn't exhibit the issue? |
It started having issues in 6.10.7 I think. |
That's puzzling. When issues started, did you upgrade only the kernel or also LKRG? Were those the same issues (the |
@Adam-pi3 It sounds like your reasoning in #346 could have been flawed. As seen from code snippets in #338, what changed with 38b3b11 for 5.9+ is that previously we increased refcount for Anyway, I am really tempted to do what I had suggested earlier - exclude seccomp checks on 5.9+. I think they're also incomplete anyway, checking only the first out of possible multiple filters. Is this OK with you? We haven't seen real-world exploits that would modify only seccomp and not anything else we track, have we? However, we have seen plenty of issues related to LKRG's seccomp tracking support on 5.9+, where we had to use risky hacks to get around Linux's symbol non-export. So I feel this feature has poor balance of benefit vs. risk as currently implemented, and we do not readily have an obviously better idea. |
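The remark above about "checking only the first out of possible multiple filters" can be illustrated with a small user-space sketch. This is not LKRG or kernel code: `ch_filter`, `ch_snapshot`, `CHAIN_MAX` and the `ch_*` functions are hypothetical names invented for illustration. The only thing borrowed from the kernel is the idea that stacked seccomp filters are linked through a `prev` pointer, with the task pointing at the newest one.

```c
#include <assert.h>
#include <stddef.h>

#define CHAIN_MAX 8 /* arbitrary cap chosen for this model */

/* Hypothetical stand-in for a stacked filter; newest filter links to older ones. */
struct ch_filter {
    struct ch_filter *prev;
};

/* Hypothetical shadow copy: the chain's pointers, newest first. */
struct ch_snapshot {
    size_t n;
    const struct ch_filter *ptrs[CHAIN_MAX];
};

/* Record up to CHAIN_MAX pointers starting at head; returns how many were recorded. */
size_t ch_snapshot_take(struct ch_snapshot *s, const struct ch_filter *head)
{
    s->n = 0;
    for (const struct ch_filter *f = head; f && s->n < CHAIN_MAX; f = f->prev)
        s->ptrs[s->n++] = f;
    return s->n;
}

/* Head-only check (what the thread says is done today): 1 on mismatch, 0 otherwise. */
int ch_cmp_head(const struct ch_snapshot *s, const struct ch_filter *head)
{
    const struct ch_filter *saved = s->n ? s->ptrs[0] : NULL;
    return saved != head;
}

/* Whole-chain check: 1 if any recorded pointer or the chain length differs. */
int ch_cmp_chain(const struct ch_snapshot *s, const struct ch_filter *head)
{
    size_t i = 0;
    const struct ch_filter *f = head;

    for (; f && i < s->n; f = f->prev, i++)
        if (s->ptrs[i] != f)
            return 1;
    if (i < s->n)
        return 1; /* chain became shorter */
    if (f && s->n < CHAIN_MAX)
        return 1; /* chain became longer than what was recorded */
    return 0;
}
```

In this model, redirecting an older link (for example `b.prev`) leaves `ch_cmp_head()` at 0 while `ch_cmp_chain()` reports 1. A real implementation would also have to deal with locking and filter lifetime, which this sketch ignores.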
When I reboot, I see
I can't even find a binary named gmain:
This is the default config on arch:
Please let me know if I should open a separate issue instead. |
Thank you for reporting this @Strykar! Looks like the same issue to me, so let's keep the info in here. @Adam-pi3 I think we need to look for possible seccomp-related changes between 6.10 and 6.11 to see if we possibly miss tracking some new legitimate seccomp filter pointer updates. This issue appears too frequently for it to be likely a race condition. |
I searched commit messages for mentions of seccomp. Didn't find any new legitimate updates, but found this:
Maybe this created or exposed (made more likely) a race condition? |
Yes I also experienced the same problem in the logs every time with different binaries, seems to be a false positive. |
Edit by @solardiz: dropped the over-quoting
https://github.com/openSUSE/kernel/tree/v6.11.2 |
I got alerts about "seccomp filter pointer corruption" recently too. |
Thank you @Kirkezz. The mainline commit I found above is also included in 6.10.10+, so your report does not exclude the potential that the issue is related to that commit. https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.10.10 |
Yes, you are most likely right. I found earlier logs in journalctl with this problem, and the linux version in those logs is 6.10.10. |
I installed the openSUSE Tumbleweed Desktop and Server versions as VMware VMs, and neither of them has the issue you are describing:
I assume there is more to this problem than just the kernel version. Did you compile it yourself? Did you need to do anything specific to see the issue? I browsed the internet with Firefox on the Desktop VM and didn't see any problem with LKRG:
|
Edit by @solardiz: dropped the over-quoting
Are you using dkms? |
No, I didn't use dkms because I fetched the git repo, compiled it, and loaded LKRG after the system was booted. |
@Laitinlok Do you happen to know how I could reproduce the issue? Do you perform any specific action to trigger it? |
sudo systemctl enable --now lkrg, restart 2 times. |
Do you have secure boot and trusted boot enabled? |
Certainly it doesn't repro on my side. @Laitinlok can you try LKRG under newest SUSE kernel
I do not (it's under VM emulating BIOS) |
It has the same issues with the latest kernel. |
@Adam-pi3 What would your next steps be if you were able to reproduce the issue? Maybe we can jump to those right away. |
@Laitinlok can you change the log.level to level 4 (I would like to see the actual values of the pointers)? You can do it via CLI:
|
Sure |
I installed lkrg-dkms-git from AUR (replacing lkrg-dkms with it). The problem still persists in the logs (got one entry this boot: "BLOCK: Task: Killing pid 1424, name HTML5 Parser"). During the previous boot, a problem reappeared that I'd almost forgotten about: occasionally when I boot up, there's a “Temporary failure in name resolution” and
I don't know if this is related to LKRG or to the fact that I recently updated all packages in my system to avoid a partial upgrade. |
Edit by @solardiz: Added triple-backtick quoting.
|
Thanks @Laitinlok, however it looks like log_level is not at least at the WATCH level (number 4). Can you repeat it with log_level=4? @Kirkezz I have no idea how lkrg-dkms-git works. Can you please get it directly from GitHub via:
The problems which you see are not related |
@Laitinlok @Kirkezz @Strykar Can you please try the below patch and let us know if it helps?
+++ b/src/modules/exploit_detection/p_exploit_detection.c
@@ -1414,7 +1414,8 @@ static int p_cmp_tasks(struct p_ed_process *p_orig, struct task_struct *p_curren
          p_ret++;
       }
-      P_CMP_PTR(p_orig->p_ed_task.p_sec.sec.filter, p_current->seccomp.filter, "seccomp filter")
+      if (!(p_current->flags & PF_EXITING))
+         P_CMP_PTR(p_orig->p_ed_task.p_sec.sec.filter, p_current->seccomp.filter, "seccomp filter")
       p_lkrg_seccomp_filter_put(p_current);
    } |
@Adam-pi3 I overcame the laziness and looked at the kernel code in proper context. The issue may actually be quite simple. This commit I found moves the call to
@@ -832,6 +831,8 @@ void __noreturn do_exit(long code)
 	io_uring_files_cancel();
 	exit_signals(tsk);  /* sets PF_EXITING */
+	seccomp_filter_release(tsk);
and here's what
/**
 * seccomp_filter_release - Detach the task from its filter tree,
 *			    drop its reference count, and notify
 *			    about unused filters
 *
 * @tsk: task the filter should be released from.
 *
 * This function should only be called when the task is exiting as
 * it detaches it from its filter tree. PF_EXITING has to be set
 * for the task.
 */
void seccomp_filter_release(struct task_struct *tsk)
{
	struct seccomp_filter *orig;
	if (WARN_ON((tsk->flags & PF_EXITING) == 0))
		return;
	spin_lock_irq(&tsk->sighand->siglock);
	orig = tsk->seccomp.filter;
	/* Detach task from its filter tree. */
	tsk->seccomp.filter = NULL;
	spin_unlock_irq(&tsk->sighand->siglock);
	__seccomp_filter_release(orig);
}
Our incremented refcount probably prevents freeing of the filter in the trailing
The trial patch I posted above should prevent the issue when we're validating the current task. This is the only case now possible with your added check of
In other words, while I suggested a simpler patch for testing here, I actually propose its more elaborate revision (that would unfortunately make paranoid mode less effective on pre-5.9 kernels, albeit not to the extent we already accepted on 5.9+).
+++ b/src/modules/exploit_detection/p_exploit_detection.c
@@ -1414,7 +1414,8 @@ static int p_cmp_tasks(struct p_ed_process *p_orig, struct task_struct *p_curren
          p_ret++;
       }
-      P_CMP_PTR(p_orig->p_ed_task.p_sec.sec.filter, p_current->seccomp.filter, "seccomp filter")
+      if (current == p_current && !(p_current->flags & PF_EXITING))
+         P_CMP_PTR(p_orig->p_ed_task.p_sec.sec.filter, p_current->seccomp.filter, "seccomp filter")
       p_lkrg_seccomp_filter_put(p_current);
    } |
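To make the reasoning above easier to follow, here is a minimal user-space sketch of the exit-time sequence. It is not kernel or LKRG code: `model_task`, `model_shadow`, `MODEL_PF_EXITING` and the `model_*` functions are hypothetical stand-ins. The only behavior taken from the quoted source is that the exiting flag is set first and the filter pointer is then legitimately reset to NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag bit used by this model only; it plays the role of PF_EXITING. */
#define MODEL_PF_EXITING 0x4u

struct model_filter {
    int id;
};

/* Hypothetical stand-in for the task: flags plus task->seccomp.filter. */
struct model_task {
    unsigned int flags;
    struct model_filter *filter;
};

/* Hypothetical stand-in for the shadow copy kept for later comparison. */
struct model_shadow {
    struct model_filter *filter;
};

/* Models exit_signals(): mark the task as exiting. */
void model_exit_signals(struct model_task *t)
{
    t->flags |= MODEL_PF_EXITING;
}

/*
 * Models seccomp_filter_release(): refuses to run unless the task is
 * exiting, then detaches the filter by clearing the pointer.
 */
bool model_filter_release(struct model_task *t)
{
    if (!(t->flags & MODEL_PF_EXITING))
        return false;
    t->filter = NULL;
    return true;
}

/* Unconditional pointer comparison: 1 on mismatch, 0 otherwise. */
int model_cmp_naive(const struct model_shadow *s, const struct model_task *t)
{
    return s->filter != t->filter;
}

/* Comparison guarded as in the trial patch: skipped for exiting tasks. */
int model_cmp_patched(const struct model_shadow *s, const struct model_task *t)
{
    if (t->flags & MODEL_PF_EXITING)
        return 0;
    return s->filter != t->filter;
}
```

After `model_exit_signals()` and `model_filter_release()` have run, `model_cmp_naive()` reports a mismatch even though nothing was tampered with, while `model_cmp_patched()` stays silent. The model deliberately leaves out locking and the `current == p_current` condition of the more elaborate revision.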
The issue may also have been triggered on older kernels, but with negligible probability. Fixes lkrg-org#354
I've just tested this for lack of regressions via our GitHub Actions in my fork of the repo, except for 3 unrelated test failures in our cross-builds (opened new issue for those). |
@Adam-pi3 I looked at the code some more and thought of these problems some more. I don't get why we were doing the get/put filter thing at all. We are only validating the pointer, not filter content, right? Well, get/put does not protect the pointer anyway - it only ensures the actual filter content won't be gone, not that the filter wouldn't get detached from the task. We do validate a few other per-task seccomp things, but they are not part of the filter, right? If the above is correct, then how about the below changes? -
Since we need to make a release soon, maybe let's use my proposed patch above for now, but try 1 and 2 in our development tree afterwards. My guess is you were planning to add validation of the filter itself, which is why you added these get/put - years ago, but we never proceeded to add such validation. If so, we'd need to revisit/re-add these if and when we're ready to add filter validation. Perhaps along with also recognizing and validating potential multiple filters per task. For now, though, we have incomplete functionality that is better dropped. |
@Adam-pi3 Further, Our usage looks different: we're doing the get/put for a moment either to dump or validate the filter (if we were to add its real validation, beyond pointer). If the filter can't be concurrently freed at the time of So what we currently do looks like nonsense to me now, not only for the functionality that we have, but also for further extension. |
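The point made in the last two comments, that a get/put pair keeps the filter content alive but does nothing to keep the task's pointer stable, can be sketched in user space as follows. All names (`rc_filter`, `rc_task`, `rc_get`, `rc_put`, `rc_detach`) are hypothetical and the reference counting is deliberately simplistic; this is an illustration of the argument, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted filter; 'freed' is set instead of calling free(). */
struct rc_filter {
    int refs;
    bool freed;
};

/* Hypothetical task holding one reference through its pointer. */
struct rc_task {
    struct rc_filter *filter;
};

/* Take an extra reference, as a checker might before inspecting the filter. */
void rc_get(struct rc_filter *f)
{
    if (f)
        f->refs++;
}

/* Drop a reference; the filter is considered freed when the count hits zero. */
void rc_put(struct rc_filter *f)
{
    if (f && --f->refs == 0)
        f->freed = true;
}

/* Models a legitimate detach: clear the task's pointer, drop its reference. */
void rc_detach(struct rc_task *t)
{
    struct rc_filter *orig = t->filter;

    t->filter = NULL;
    rc_put(orig);
}
```

If a checker calls `rc_get()` on the pointer it saved and the task is then detached, the filter is not freed until the checker's `rc_put()`, yet `t->filter` is already NULL, so a pointer comparison fails regardless of the extra reference.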
I've just implemented my proposed simplification in my fork of the repo. @Laitinlok @Kirkezz @Strykar Can you please test https://github.com/solardiz/lkrg as of commit 3bdf5c8 and let us know how it works for you?
|
@solardiz I think it boils down to the discussion which we had here: I still think we may want to keep references. And yes, we wanted to add validation of the filter itself. Btw., let's wait for @Laitinlok and others to see if your patches fix the issue |
Yes, but in that discussion neither of us appeared to realize we're not actually accessing the filter.
Why would we? They're references on the filter, which we never access. They do not affect the pointer, which we do access. And we acquire them in a way that would either be unneeded/redundant or unreliable if/when we add filter validation. I'd say that was useless and misleading code that we had, which also caused us portability problems for no reason. Let's drop it for good.
Yes. I hope we'll hear from them soon. |
Everything seems to be working fine. I am no longer getting any warnings in the logs. Installed lkrg from your commit and rebooted twice. |
Yes it is working fine with this commit. |
Thank you for testing the fix @Kirkezz and @Laitinlok! |
I just built and loaded it, and opened a few programs, no issues so far. |
@solardiz Just FYI: Here is what LKRG looks like after running for a day on a daily-driver desktop with this patch:
|
@Adam-pi3 We got the |
Random crashes on boot, constant kernel panics at runtime, and the system cannot be rebooted properly when LKRG is loaded via systemd. The system shows an LKRG error during reboot when LKRG is loaded at runtime.