-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preemption Safety Fixes and Miscellaneous Improvements #76
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Richard Yao <[email protected]>
The changes to this file were never upstreamed and caused the compiler to complain about type issues. Any changes made here should be upstreamed. Signed-off-by: Richard Yao <[email protected]>
We define strlcat ourselves, so we should include its function prototype in strings.h Note that we need a way to handle sane libcs like uclibc that provide strlcat. Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
This does *NOT* play nicely with preemptible kernels. Since we are running as root, this doesn't matter very much right now. Lets disable it until it can be reworked. Signed-off-by: Richard Yao <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
The use of custom code makes debugging difficult and there does not appear any reason for it given that the native Linux functions work on ARM, so lets get rid of it. This resolves potential issues where Linux's memory allocation functions fail to warn that we are in an interrupt context. Memory allocations in such contexts are unsafe and do not happen on any other DTrace platform, so it is a very good thing to ensure that we are warned when this happens. Signed-off-by: Richard Yao <[email protected]>
The custom locking infrastructure in dtrace4linux was designed to operate in an interrupt context and consequently, many locks seem to have been modified to incorporate a change in locking semantics where their use creates an interrupt context on entering a lock and leaves it on exiting a lock. This has multiple problems. 1. The new locking semantics clobber the interrupt masks whenever locking fails to be done in reverse order. For instance, dtrace_module_unloaded() locks variables in dtrace4linux this way: dmutex_enter(&dtrace_provider_lock); dmutex_enter(&mod_lock); dmutex_enter(&dtrace_lock); It then unlocks them this way: dmutex_exit(&dtrace_provider_lock); dmutex_exit(&mod_lock); dmutex_exit(&dtrace_lock); Since we have multiple locks with the new semantics, dtrace_provider_lock will be the one that stores the original interrupt mask while the others store interrupt masks that have interrupts disabled. Since unlocking is not done in reverse, we exit this function with interrupts disabled and consequently, bad things can happen. This is not the only example of this problem and while the lock ordering should be fixed, it does explain some problems specific to dtrace4linux that do not occur on other dtrace ports. 2. The mutexes were replaced with semaphores, but we have no stack on which to store multiple interrupt contexts, so they get clobbered then. 3. Other DTrace platforms, such as Illumos, expect these locks to have mutex semantics that are unsafe inside interrupt contexts. There is no apparent benefit to being different, but plenty of issues stem from being different. The change in semantics risks of plenty of things that should work in routines known to work fine on other platforms can and will fail in non-obvious ways. Signed-off-by: Richard Yao <[email protected]>
Comments suggest that the rationale for using custom code for dtrace_xcall() was that it on_each_cpu did not interact well with the custom locking. Since that code needs to be replaced with more conventional Linux locks, the rationale for a custom xcall implementation is gone. Furthermore, it is not clear that this interacts well with on_each_cpu(), so lets drop it in favor of wrapping on_each_cpu(). This avoids having to worry about strange interactions. Signed-off-by: Richard Yao <[email protected]>
The locking and xcall changes make this unnecessary. It triggers a warning on preemptible kernels, so lets get rid of it: [53134.451543] BUG: using smp_processor_id() in preemptible [00000000] code: dtrace/19160 [53134.451559] caller is dtrace_ecb_destroy+0x523/0x6d0 [dtracedrv] [53134.451563] CPU: 3 PID: 19160 Comm: dtrace Tainted: P O 3.12.6 dtrace4linux#3 [53134.451564] Hardware name: Supermicro X9SRA/X9SRA-3/X9SRA/X9SRA-3, BIOS 3.0a 08/08/2013 [53134.451566] 0000000000000000 ffff880ec889bdb0 ffffffff813ce370 0000000000000003 [53134.451572] ffff880ec889bdc8 ffffffff8120ce93 ffff880ed1daff60 ffff880ec889be20 [53134.451576] ffffffffa1156b83 00000001c889bde8 ffffffffa117808d ffff880ec889bdf8 [53134.451579] Call Trace: [53134.451586] [<ffffffff813ce370>] dump_stack+0x4e/0x82 [53134.451592] [<ffffffff8120ce93>] debug_smp_processor_id+0xd3/0xf0 [53134.451601] [<ffffffffa1156b83>] dtrace_ecb_destroy+0x523/0x6d0 [dtracedrv] [53134.451616] [<ffffffffa117808d>] ? dtrace_xcall+0x2d/0x30 [dtracedrv] [53134.451626] [<ffffffffa115c8c7>] dtrace_state_destroy+0x167/0x860 [dtracedrv] [53134.451636] [<ffffffffa115e16c>] dtrace_close+0x8c/0x150 [dtracedrv] [53134.451645] [<ffffffffa11637f2>] dtracedrv_release+0x12/0x20 [dtracedrv] [53134.451652] [<ffffffff8114631a>] __fput+0xea/0x2c0 [53134.451655] [<ffffffff81146529>] ____fput+0x9/0x10 [53134.451660] [<ffffffff81063c8c>] task_work_run+0x9c/0xd0 [53134.451665] [<ffffffff81002a0a>] do_notify_resume+0x8a/0xa0 [53134.451669] [<ffffffff813dafea>] int_signal+0x12/0x17 Signed-off-by: Richard Yao <[email protected]>
It is unsafe to use mutex functions inside an interrupt context. Signed-off-by: Richard Yao <[email protected]>
This creates a flood of warnings to dmesg on systems with preemptible kernels. These warnings are not of huge importance, so lets disable them until a different mechanism is available for our use. I believe that doing this is safe in >99.9% of cases. Signed-off-by: Richard Yao <[email protected]>
Hello Richard, can you give me more detail on what you are doing here? Eg using native thanks On 10 March 2014 19:16, Richard Yao [email protected] wrote:
|
@dtrace4linux If you look carefully, you will see that dmutex causes plenty of deadlocks. The interrupt flags are being clobbered and the code is not designed to work under such semantics. As for dtrace_xcall, it was a large amount of code and I did not see much reason for it. It is not clear to me that we gain much by being able to trace those functions, but it is clear to me that reasoning about correctness in this custom code is hard. I compared dtrace4linux with other ports and I did not see these things being done. That being said, you do raise the valid point that dtrace4linux probably should be revised to stop permitting that to be traced via fbt as part of this. The same goes for revising it to stop permitting tracing of spinlocks (and possibly also mutexes) via fbt in favor of something more like lockstat. The main purpose of this pull request was feedback. I am not aware of any significant regressions in comparison to what dtrace does on Illumos, but the improvements in terms of making capabilities safe on Linux like on Illumos are rather substantial in my testing. I think there is definite merit to these changes, although they do need more review. Lastly, I bundled some less extreme fixes into this. It was not clear to me breaking this into separate pull requests had much value when I wanted review. Anyway, there are some more clear cut fixes like "Convert dtrace_stack_mutex into a spin-lock", "Add strlcat to strings.h" and the patches to disable kernel preemption in certain key regions. I made an effort to keep things broken into logically separate commits so it would not be a giant monolithic patch. Feel free to cherry-pick the safer things separately so we can whittle this down. :) |
Thanks Richard. Your comments are inciteful. Firstly, if dmutex is molesting the interrupt flags - thats very bad and dtrace_xcall is necessarily "complex" - its based on the solaris code of Yes - you can simplify the code vastly if you accept the tradeoffs, but In your case, you have no need for this - so your changes are good and work The big issue is "reachability" - I have had to manually walk the I checked the Apple/FreeBSD ports - and the FreeBSD port (at least in the I'll re-review your changes and certainly accept some of the utility On 10 March 2014 21:25, Richard Yao [email protected] wrote:
|
The included patches dramatically improve stability on my system when using the fbt provider to analyze ZFSOnLinux. So far, I have found no regressions. A few of these patches are a little rough, but I am opening this pull request for feedback.