live-lock on 4-core SMP platform during the tests/lib/spsc_pbuf/ #62043
Comments
Why can't you just put that delay in arch_spin_relax(), which is there for exactly this purpose?
Also, is there an understanding of what lock is being contended so as to produce the failure? A single core winning every time is not normally the same thing as "live lock" (in fact sometimes that's desirable for performance reasons due to cache residency concerns!). Are we 100% sure this isn't a bug in the test or spsc_pbuf? Is something expecting (or promising) some kind of "fairness" that spinlocks aren't ever really going to provide? Maybe there's a different IPC abstraction that would work better?
Hi @andyross, currently arch_spin_relax() is used for relaxing the CPUs which are spinning, not the CPU which has just released the spinlock, so it will not help out of the box. I made some experiments, and in order to work around this issue I added an additional delay to k_spin_unlock: 15 divisions by zero were enough for me to resolve the live-lock:
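A minimal sketch of the distinction being made here, assuming a simplified CAS-based lock rather than the real Zephyr internals: arch_spin_relax() is called by the waiters, so any extra pause for the CPU that just dropped the lock has to go on the unlock path.

```c
/* Simplified sketch, not the actual Zephyr implementation: shows where
 * arch_spin_relax() runs (waiter side) versus where a post-release delay
 * would have to be added (owner side).
 */
#include <stdatomic.h>

void arch_spin_relax(void);	/* Zephyr's busy-wait relaxation hook */

static atomic_flag lock_word = ATOMIC_FLAG_INIT;

static void sketch_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&lock_word,
						 memory_order_acquire)) {
		arch_spin_relax();	/* only the *spinning* CPUs get here */
	}
}

static void sketch_unlock(void)
{
	atomic_flag_clear_explicit(&lock_word, memory_order_release);
	/* The workaround described above adds a delay at this point, on the
	 * CPU that just released the lock, which arch_spin_relax() cannot
	 * provide.
	 */
}
```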
So, you are right, this may work. But it doesn't look efficient to always relax this way.
Wait.... You need backoff on the locking thread (edit: sorry, should have said "winner thread")? Now you have me really confused. I don't see how you get that without code doing the equivalent of a synchronous "unlock();lock()" (e.g. to "let the other threads run" or whatever). Any non-trivial amount of code (I mean, seriously, like a dozen instructions even) between the two should be letting another CPU in to get the lock. And that's 100% an abuse. Spinlocks aren't semaphores, you don't use them for telling someone else something, they only protect critical state. Let's find this thing. I can almost guarantee that there's a root cause to this bug that doesn't involve the spinlock internals.
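For illustration, a hypothetical sketch (invented names, not real kernel or test code) of the synchronous "unlock();lock()" pattern being described, which only makes progress if the lock happens to be handed over to another CPU:

```c
#include <stdbool.h>
#include <zephyr/kernel.h>
#include <zephyr/spinlock.h>

/* Hypothetical anti-pattern: a loop that briefly drops a spinlock hoping
 * another CPU will grab it and update the shared flag.  With an unfair lock
 * the same CPU can re-acquire it immediately, so the loop never exits: the
 * spinlock is being misused as a hand-off/signaling primitive.
 */
static struct k_spinlock shared_lock;
static bool other_side_done;

void wait_for_other_cpu(void)
{
	bool done = false;

	while (!done) {
		k_spinlock_key_t key = k_spin_lock(&shared_lock);

		done = other_side_done;
		k_spin_unlock(&shared_lock, key);
		/* nothing between unlock and the next lock attempt */
	}
}
```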
Most of the time (about 90% of the time) the sched_spinlock was the source of the live-lock, but sometimes it was a spinlock in the timing subsystem. sched_spinlock is a hot synchronization element in the kernel. I believe the behavior of how multiple CPUs contend for such "hot" spinlocks is HW-specific (CPU pipelines, caches, memory latency, NUMA, etc.).
Do you have backtraces? You can't get a live lock with just the lock itself, you need some code in a loop spinning on it. Neither the scheduler nor the timer behave like that: they get into the critical section, do their work, and get back out, with presumptively-non-trivial app code interspersed. More to the point: neither of these things should be caring about fairness at all. We need to find that loop, because without it this is all voodoo. Among other things: are we sure this isn't a memory ordering bug somewhere else causing a broken or double-locked lock? Did you try stuffing extra barriers into the spinlock paths instead?
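One way to run that experiment, as a debugging sketch only (hypothetical wrappers built on the standard compiler builtin __atomic_thread_fence, not a proposed fix): bracket the suspect lock operations with full fences and see whether the hang disappears once reordering is suppressed.

```c
#include <zephyr/spinlock.h>

/* Debugging-only wrappers: take/release a lock with sequentially-consistent
 * fences around the operations.  If the hang goes away with these in place,
 * a memory-ordering problem becomes the prime suspect.
 */
static inline k_spinlock_key_t debug_lock_fenced(struct k_spinlock *l)
{
	k_spinlock_key_t key;

	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	key = k_spin_lock(l);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	return key;
}

static inline void debug_unlock_fenced(struct k_spinlock *l,
				       k_spinlock_key_t key)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	k_spin_unlock(l, key);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}
```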
Also: what are the platforms that show the issue, so I can try to reproduce? FWIW: the behavior of spinlocks is essentially identical in qemu and other emulators, so this should show up there too. It's possible to hack at the qemu_x86_64 configuration to enable more cores, though I haven't done it in a long time. |
Just to clarify from above: point me to the place(s) in code where you can say "CPU A just released this lock, but can't make progress until some other CPU acquires it". That documents the live lock. But... again I can't find any spots like that in the scheduler or timeout subsystems. It's just not how they behave; they don't have loops with unbounded exit conditions that locking might extend, everything that happens is done on finite work.
The backtraces look more or less the same as the one presented in the bug description. The test just stops during test_stress_0cpy, and when I look at the debugger (after a while; you may take a few-minute nap or check immediately) I always see the same picture: one of the CPUs is owning the sched_spinlock while the others are spinning on it. I will try to get access to the system and reconstruct the conditions under which the issue was reproducing (we have been living with Ticket Spinlocks internally for a few months already) and will provide you with call stacks showing the live-lock on sched_spinlock.

The platform I used was a 4-core RISC-V SMP system with multi-level caches, kind of big.LITTLE (2 big cores, 2 smaller ones); it's not yet upstreamed, but we are working on it. And tests/lib/spsc_pbuf/ is the single test from the entire tests/ folder which shows such behavior. There is nothing special in it; it is focused on a pipe-like IPC, but it applies the "right" kind of pressure to the system.

I'll prepare a spinlock test to verify the fairness of spinlock acquisition. Let's see how it works with different spinlock implementations and draw conclusions from that.

Again, I believe there is no bug in the kernel; this kind of behavior is just expected under some conditions. The most representative example is NUMA, where some CPU simply has much faster/shorter access to the memory and will always win the contention. A lot of this behavior depends on caches and branch prediction, and we can't expect such effects to apply exclusively to NUMA systems.
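Such a fairness check could look roughly like the hypothetical probe below (invented names, not the actual test later posted for review): one instance of this thread is pinned to each CPU with its own counter, and a heavily skewed distribution of wins afterwards indicates unfair acquisition.

```c
#include <zephyr/kernel.h>
#include <zephyr/spinlock.h>

/* Hypothetical fairness probe: every CPU hammers the same spinlock and
 * counts how often it wins.  One CPU owning nearly all acquisitions would
 * demonstrate the unfairness discussed in this thread.
 */
static struct k_spinlock probe_lock;

void fairness_probe(void *counter_arg, void *p2, void *p3)
{
	uint32_t *wins = counter_arg;	/* per-CPU win counter */

	ARG_UNUSED(p2);
	ARG_UNUSED(p3);

	for (int i = 0; i < 100000; i++) {
		k_spinlock_key_t key = k_spin_lock(&probe_lock);

		(*wins)++;		/* this CPU won the contention */
		k_spin_unlock(&probe_lock, key);
	}
}
```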
So... just to be nitpicky: that's a sufficient description of a deadlock, not a live lock. And indeed, changing the order of operations (e.g. with a fairness patch to spinlocks) can be sufficient to resolve either without actually fixing the root cause.
Repeating: that's... not really correct. You can only get a lockup if there's a cycle between two contexts waiting on each other to progress. The kernel should 100% be tolerant of arbitrary unfairness in locking. It doesn't matter if CPU0 "always wins", because it will only ever hammer the lock for finite time before going back to app code (or the idle thread, or whatever). You can only get a lockup if CPU0's work is unbounded, and if we have that in the kernel it's a bug.

FWIW: knowing that this is on new hardware (and complicated hardware), and only on that new hardware, my bet is that we have a memory ordering issue instead that's left a stuck lock from the perspective of one CPU, and that adding extra atomic operations is just pushing it out of the reorder window. Alternatively you've found an equivalent issue with e.g. the bespoke spinlock on the thread.switch_handle field, either within the kernel (see kernel/include/kswap.h) or the arch layer that writes it. Maybe sprinkle memory fences around and see if something turns up?
The deadlock wouldn't be resolved by relaxing the last-owner CPU, right? So in my case "holding" such a CPU out of spinlock acquisition for a while helps the other CPUs to detect the free spinlock before the "lucky" CPU grabs it again.

I understand what you are saying: we shouldn't sweep a potential bug in the kernel under the carpet by using fair spinlocks as a workaround for it. For sure we should investigate and optimize the kernel if some flaw is really there. But at the same time this is not a justification for dismissing spinlock acquisition fairness, because, as you say, the kernel should be tolerant of how the HW behaves with respect to memory access. Let's say there is just a certain level of unfairness the HW can sustain without live-locking (NUMA), and SW can't really control the "portion" of unfairness. Either we have predictable behavior, or there will always be a probability of a live-lock. This is not an invention of mine, this is a known fundamental problem, please see https://lwn.net/Articles/267968/

Bringing in the Ticket Spinlocks doesn't mean we replace the original spinlocks with them, since they are disabled by default. We just provide a solution for users with a specific kind of HW, the way it was done in Linux. The kernel works with basic spinlocks by default, and we are still able to reproduce and debug the potential SW issue.
Again, you keep saying that[1], but that's not correct. Unfairness and "live lock" are different things. Live lock requires a cycle of dependency: CPU A can't finish its work until CPU B does something, but CPU B doesn't get to run because of unfairness. And I'm saying that almost always (and in the kernel, always always!), that kind of dependency in spinning code represents a bug that needs to be fixed[2]; you don't magically make it better with fairness.

[1] And also citing that LWN article, which I promise you I've read. Pretty sure I read it 15 years ago when it was new, too. It's not about live lock conditions, it's about performance. And I 100% agree that there are regimes where spinlock fairness can impact performance. But there are regimes where fair spinlocks are a net loss too, and IMHO Zephyr sits more in the latter than the former. I'm not saying no, I'm saying not like this. (And I'm also saying that the circumstances argue strongly that this is a hardware interaction with e.g. a missing or incorrectly-applied barrier and not a logic bug.)

[2] Because we're an RTOS, obviously. Literally everything the kernel does needs to represent finite/bounded work guaranteed to finish before whatever time metric the app has decided is its latency tolerance.
This is true of Linux as well. Yet, user space apps may experience fairness issues.

Predictability goes a long way towards that. Unfair spinlocks open the way to unpredictable behavior. Quoting a Linux commit log:

Zephyr is not immune to such issues either. And when apps are made into ... The quoted hardware above isn't particularly embedded-like, but such issues are not exclusive to it. So why such resistance towards a generic solution that has proven itself in Linux? You keep mentioning performance issues. Can you substantiate that? And, if I recall correctly, predictability is more important than performance.

As to the size increase... I tried the philosopher sample on qemu_riscv64_smp:
Good to know. We'll simply do the same and recommend it to our customers.
This is going in circles. The putative live lock condition is unsubstantiated, period. Show me where it is and I'll fix it. Don't demand we merge an extremely impactful (and frankly quite risky) feature to fix a bug we can't actually find, and that shows up in only one test on one non-public hardware platform. I'm not saying no to ticket spinlocks. I'm just saying we aren't going to replace the existing spinlocks on what amounts to a hunch.
Sorry, realized I missed one bit I should have replied to:
No, it absolutely could be. Putting that relaxation step in doesn't change the logic of the behavior, but it changes the ordering of operations and the sizes of the various critical windows needed to exercise things like races and deadlocks. It doesn't "fix" the bug, but it might make it a million times (or more) less likely. I really think we have two separate things being conflated here:
Some numbers from today's run:
Hi @cfriedt, could you please clarify: is it with ticket spinlocks, or without? You've just run the fairness_test "as is" from the review branch, right?
That is without ticket spinlocks.
Fixed with #61541 |
Describe the bug
The tests/lib/spsc_pbuf/ test hangs due to a live-lock on a RISC-V 4-core SMP platform.
Most of the time the live-lock happens on sched_spinlock, in the form that CPU0 always wins
the contention for this spinlock while the other cores continue to spin on it.
To work around this issue, an additional delay can be added to k_spin_unlock; 15 divisions by zero were enough for me to resolve the live-lock:
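A sketch of what such a delay might look like (the actual patch is not reproduced here; the helper name and its placement right after the lock release in k_spin_unlock are assumptions, and the count of 15 is the empirical value from this report):

```c
/* Workaround sketch: after the lock word has been released, keep the
 * releasing CPU busy for a short, hardware-dependent time so the spinning
 * CPUs can observe the free lock before this CPU grabs it again.
 * Note: division by zero is undefined behavior in ISO C; this relies on the
 * RISC-V ISA, where integer division by zero does not trap (it returns
 * all-ones).  The volatile operands keep the compiler from folding the loop
 * away.
 */
static inline void post_unlock_delay(void)
{
	volatile unsigned int zero = 0U;
	volatile unsigned int sink;

	for (int i = 0; i < 15; i++) {
		sink = 1U / zero;	/* slow op used purely as a delay */
	}
	(void)sink;
}
```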
The other way to resolve this live-lock is to "halt" and then "resume" the execution via OpenOCD.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
tests/lib/spsc_pbuf/ passes on all platforms; no live-locks happen due to spinlock acquisition unfairness.
Impact
The Zephyr kernel could live-lock on some real-life multi-core SMP platforms due to spinlock acquisition unfairness.
Logs and console output
Environment (please complete the following information):
Additional context
The current Zephyr spinlock implementation is based on a single atomic variable and doesn't guarantee locking fairness across multiple CPUs, so such live-locks are expected under certain conditions. One of the solutions is to use a different spinlock implementation which guarantees locking order and lock acquisition fairness; e.g., in Linux such issues are addressed with Ticket Spinlocks: https://lwn.net/Articles/267968/
The Ticket Spinlocks proposal for Zephyr can be found here: #61541
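For reference, a minimal ticket-spinlock sketch using generic C11 atomics (not the implementation from #61541): each CPU atomically takes a ticket and waits until its number is served, so acquisitions are granted strictly in arrival order.

```c
#include <stdatomic.h>

/* Minimal ticket spinlock sketch: FIFO ordering makes starvation of a
 * waiting CPU impossible, which is the fairness property discussed above.
 */
struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint serving;	/* ticket currently allowed to hold the lock */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned int ticket = atomic_fetch_add_explicit(&l->next, 1U,
							memory_order_relaxed);

	while (atomic_load_explicit(&l->serving,
				    memory_order_acquire) != ticket) {
		/* spin; a real port would relax the CPU here */
	}
}

static void ticket_lock_release(struct ticket_lock *l)
{
	unsigned int cur = atomic_load_explicit(&l->serving,
						memory_order_relaxed);

	atomic_store_explicit(&l->serving, cur + 1U, memory_order_release);
}
```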