
kernel: spinlock: Ticket spinlocks #61541

Merged
3 commits merged on Nov 4, 2023

Conversation

amr-sc
Contributor

@amr-sc amr-sc commented Aug 16, 2023

The basic spinlock implementation is based on a single atomic variable and doesn't guarantee locking fairness across multiple CPUs. It's even possible that a single CPU will win the contention every time, which will result in a live-lock.

Ticket spinlocks provide a FIFO order of lock acquisition, which resolves this unfairness issue at the cost of a slightly increased memory footprint.
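The ticket idea described above can be sketched with C11 atomics. This is an illustrative sketch only, not Zephyr's actual k_spinlock code; the `ticket_lock` names are invented for this example. Each locker takes a ticket from `tail`; the lock is held by whoever's ticket equals `owner`, and unlock advances `owner`, handing the lock over in strict FIFO order.

```c
#include <stdatomic.h>

/* Illustrative ticket lock (hypothetical names, not Zephyr's struct). */
struct ticket_lock {
	atomic_uint owner; /* ticket currently being served */
	atomic_uint tail;  /* next ticket to hand out */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
	/* Atomically take the next ticket; this is the FIFO enqueue. */
	unsigned int my_ticket = atomic_fetch_add(&l->tail, 1);

	/* Spin until our ticket is being served. A real implementation
	 * would use a relax/backoff hint in this loop.
	 */
	while (atomic_load(&l->owner) != my_ticket) {
	}
}

static void ticket_lock_release(struct ticket_lock *l)
{
	/* Advance to the next ticket, waking the next FIFO waiter. */
	atomic_fetch_add(&l->owner, 1);
}
```

The lock is free exactly when `owner == tail`, which is also why the separate `locked` field discussed below carries no additional information.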

@peter-mitsis
Collaborator

Very clever. Am I correct in understanding that in the ticket spinlock model, the locked field serves no functional purpose as the determination of it being locked/unlocked is solely dependent upon the owner and tail fields?

@amr-sc
Contributor Author

amr-sc commented Aug 16, 2023

> Very clever. Am I correct in understanding that in the ticket spinlock model, the locked field serves no functional purpose as the determination of it being locked/unlocked is solely dependent upon the owner and tail fields?

Hi @peter-mitsis ,

Indeed. Currently the "locked" field is something of a compromise to pass the tests: in tests/kernel/spinlock we access this field directly (which is a bad idea, imho) to verify the spinlock state.

P.S.: We probably shouldn't access the spinlock internals directly in tests/kernel/spinlock, and should introduce a k_spin_is_locked() API instead. But I don't have a firm opinion here, since such an API would only be used for the tests. So I just left it this way.

@amr-sc
Contributor Author

amr-sc commented Aug 17, 2023

Hi All,

Currently we have a test failure due to the following spinlock API extension in a module external to Zephyr (modules/audio/sof/zephyr/include/rtos/spinlock.h):

/* not implemented on Zephyr, but used within SOF */
static inline void k_spinlock_init(struct k_spinlock *lock)
{
#ifdef CONFIG_SMP
	atomic_set(&lock->locked, 0);
#endif
}

What's our policy about it? Should we bring this API to Zephyr, or should we extend it in SOF?
@andyross , @galak , @nashif , @peter-mitsis what do you guys think?

Contributor

@andyross andyross left a comment


First off, please check the correctness. This doesn't look right to me, but lockless algorithms are REALLY hard, so it's possible this will just need a round or seven of explanation.

Designwise: No complaints about having this in the stable of synchronization primitives, but absolutely not as the default for everyone. A spinlock is the minimum-overhead synchronization choice for routine mostly-uncontended critical sections. The kernel users don't want fairness, they want performance about 99% of the time.

Can you refactor as a k_fair_spinlock or whatnot?

atomic_t next_ticket;
atomic_val_t next_ticket_val;

atomic_set(&next_ticket, ticket_val);
Contributor


This looks really suspicious to me. You're getting the value of "owner" and then writing it unconditionally to "next_ticket" with no synchronization. The following atomic_inc() is clearly not atomic, because the write is unconditional.

That is: what happens if you have two threads in exactly this code racing against each other? You can draw paths through the execution where next_ticket is incremented either once or twice.

Contributor Author


Hi @andyross,

Here we just estimate what l->owner and l->tail of the free spinlock would look like for the particular l->owner. We don't really change the spinlock state until the CAS; we keep the estimated values on the stack.
The atomic operations before the atomic_cas are done only for the sake of value consistency. We can't just increment ticket_val, since that's not how we change the real l->owner and l->tail in k_spin_lock and k_spin_unlock:
the estimation should follow the same steps as the real algorithm, where we use atomic operations on these fields.

The contending threads indeed may have the same ticket_val and next_ticket_val, but only the fastest thread will acquire the spinlock with

if (!atomic_cas(&l->tail, ticket_val, next_ticket_val)) {
		goto busy;
	}

The second thread will not meet the CAS condition and will return -EBUSY.

Moreover, we are not entirely accurate here: if the spinlock is acquired AFTER we "freeze" ticket_val with atomic_get(&l->owner) and released BEFORE we reach the atomic_cas, we will still identify the spinlock as locked. So sometimes k_spin_trylock will fail with ticket spinlocks in a situation where the original spinlock would succeed. This is a side effect of having two atomic variables instead of one for ticket spinlocks.
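The estimate-then-CAS flow described in this exchange can be sketched as follows. This is an illustrative reconstruction using C11 atomics, not the actual Zephyr patch; the `ticket_lock`/`ticket_trylock` names are invented for the example. We read `owner`, compute on the stack what `tail` would be if the lock were free (i.e. `tail == owner`), and attempt to advance `tail` with a single CAS; if the lock is held, or another contender wins the CAS first, we report busy.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical ticket lock layout for illustration. */
struct ticket_lock {
	atomic_uint owner; /* ticket currently being served */
	atomic_uint tail;  /* next ticket to hand out */
};

static int ticket_trylock(struct ticket_lock *l)
{
	/* "Freeze" the current owner; this is our guess at a free lock. */
	unsigned int ticket_val = atomic_load(&l->owner);
	unsigned int next_ticket_val = ticket_val + 1;

	/* The CAS succeeds only if tail == owner, meaning nobody holds
	 * the lock and nobody is waiting. Only the fastest of several
	 * racing contenders can win; the rest see a changed tail.
	 */
	if (!atomic_compare_exchange_strong(&l->tail, &ticket_val,
					    next_ticket_val)) {
		return -EBUSY;
	}
	return 0;
}
```

Note how two racing threads may compute identical `ticket_val`/`next_ticket_val` pairs, yet only one CAS can succeed, which is the point made above.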

@andyross
Contributor

FWIW: that SOF thing looks like a mistake. Spinlocks don't have an initialization API; they're intended to be zero-filled as initialization so they can be used out of .bss before initialization hooks can be called, etc... But if SOF really wants that, we can provide a K_SPINLOCK_INIT token or something that just expands to a zero-filled struct.

@amr-sc
Contributor Author

amr-sc commented Aug 21, 2023

> First off, please check the correctness. This doesn't look right to me, but lockless algorithms are REALLY hard, so it's possible this will just need a round or seven of explanation.
>
> Designwise: No complaints about having this in the stable of synchronization primitives, but absolutely not as the default for everyone. A spinlock is the minimum-overhead synchronization choice for routine mostly-uncontended critical sections. The kernel users don't want fairness, they want performance about 99% of the time.
>
> Can you refactor as a k_fair_spinlock or whatnot?

Hi @andyross,

  1. Disable ticket spinlocks by default: sure, that's clear, let's disable them in the final version. I suggest, however, that we make the CI work so we ensure that all the tests pass with this feature enabled, and then switch it off (and check that everything is still fine) before the merge. What do you say?
  2. k_fair_spinlock - please clarify what you mean. Do you suggest having a different public API for these spinlocks? That would mean we wouldn't be able to seamlessly switch the kernel between "fairness" and "footprint/performance". And the main motivation behind this change was that I experienced a live-lock in the kernel on a 4-core SMP platform due to the original spinlocks' unfairness.

P.S.: this concept is not my invention; Linux went down this path a long time ago: https://lwn.net/Articles/267968/

@amr-sc
Contributor Author

amr-sc commented Aug 21, 2023

> FWIW: that SOF thing looks like a mistake. Spinlocks don't have an initialization API, they're intended to be zero-filled as initialization so they can be used out of .bss before initialization hooks can be called etc... But if SOF really wants that we can provide a K_SPINLOCK_INIT token or something that just expands to a zero-filled struct.

Thanks! Then let's make a corresponding change to SOF. I will post a link here once I open a review there.

@andyross
Contributor

Yeah, for sure let's merge this as a separate API while it stabilizes. Adding a bunch of macros to do the switch at some point in the future is trivial, but even then I think that most apps that hit these concerns probably do want control over which locks get the treatment (because, even in Linux, the overwhelming majority of locks aren't contended).

Can you submit a bug on that live lock situation you hit? I'm betting it's going to root cause to something closer to a scheduler mistake, or be something that a tweaked arch_spin_relax() can treat more simply. And even if that's not possible, I think I'd be a lot more OK with a patch "replace the scheduler lock with a ticket spinlock" (especially if it came with a bug report and measurements!) than "all spinlocks are now ticket locks".

Basically, we don't run in contention environments anything like what Linux deals with. Zephyr platforms tend to care a lot more about code size and simplicity[1] than about this kind of optimization, and we don't have the depth of community expertise to draw on to maintain this stuff. The barrier to "let's put a novel lockless algorithm[2] right into the middle of the most important primitive in the kernel" is a ton higher for us than for Linux.

I don't have time to puzzle through the explanation about the trylock code, will get to that soon. (Though my first thought on scanning your text is "Oh, it's a guess at what we'll get" -- so maybe call the variable in question something like "guess"?).

[1] e.g. think of all the CI overhead of testing both of these spinlock variants to the same level of coverage!

[2] Note that we just got support for standard memory barrier primitives about two months ago, and I'm willing to bet that the atomics layer is still going to have warts on some platforms (arm64 is the poster child for surprising out-of-order semantics, check there for sure) that would break this. I realize @carlocaione isn't on the review list and should be; barriers are his thing.

@amr-sc
Contributor Author

amr-sc commented Aug 29, 2023

Hi @andyross , @carlocaione ,

What has been done so far:

@andyross , I have some doubts when I think about the separate API for this implementation. Here is what troubles me:
1.) we would need a distinct struct k_fair_spinlock, which would be mostly a duplicate of generic struct k_spinlock
2.) we would need to provide the same functionality as the generic spinlock has, including the features like CONFIG_SPIN_VALIDATE, etc. which would essentially mean a lot of duplicated code to keep in-sync
3.) the distinct API wouldn't fix the issue mentioned in #62043 , since the existing spinlocks like sched_spinlock in Zephyr kernel would not be affected

Could we have a configuration option and keep the ticket spinlocks disabled by default? This seems safe, unless I'm missing something. What do you say?

P.S.: the latest build failure seems not to be related to my changes (error in some device tree?)

devicetree error: 'config' is marked as deprecated in 'properties:' in /__w/zephyr/zephyr/dts/bindings/sensor/ti,ina230.yaml for node /soc/i2c@40005400/ina230@40.

@andyross
Contributor

> need to fix the new test, it fails almost 99% of the time when run locally. not sure how it passed CI...

It may depend on environment. On my box, when unloaded, it's 90% reliable or so. Most likely the problem is that host scheduling is failing to run the two threads simultaneously, and obviously a contention test only works if both CPUs are actually running. Having one halt magically because of the host scheduler is going to look very unfair. This may be hard to tune (e.g.: wait for some kind of "the other side is running" detection, then run for just a short time ("significantly less than CONFIG_HZ on the host kernel"), then repeat. Something like that.

> @andyross does not want to have the code disabled by default in such a key area. Instead, he suggests having a canary platform that acts as a test introductory platform first

Right. More bluntly: none of my complaints rise to a level where I'd refuse it based on the code alone[1]. Prove to me that this runs on all our SMP platforms and doesn't break anything and I'll shut up and +1.

[1] The code itself looks fine, though I do think it wants some attention if we're going to be serious about this long term: rollover protection for sure, probably also 16 bit counters a-la Linux. (Or 15 and roll the locked bit into the same word), etc...
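The Linux-style layout mentioned in that footnote can be sketched like this. This is an illustrative example only, assuming a hypothetical `ticket_pair` layout; it is not a concrete proposal for Zephyr's struct. Packing both counters as 16-bit halves of one 32-bit word keeps the footprint small, and unsigned 16-bit arithmetic wraps modulo 2^16, so the locked/unlocked determination stays correct across rollover as long as fewer than 65536 CPUs ever wait at once.

```c
#include <stdint.h>

/* Hypothetical packed ticket counters, a la Linux (illustrative only). */
union ticket_pair {
	uint32_t raw; /* both counters in one word */
	struct {
		uint16_t owner; /* ticket currently being served */
		uint16_t tail;  /* next ticket to hand out */
	};
};

/* The lock is held whenever a ticket has been handed out but not yet
 * served. uint16_t wraparound keeps this test valid across rollover.
 */
static int ticket_is_locked(union ticket_pair t)
{
	return t.owner != t.tail;
}
```

A single-word layout also lets lock/unlock be implemented with one atomic operation on `raw`, which is part of why Linux adopted it.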

@zephyrbot zephyrbot added the area: RISCV RISCV Architecture (32-bit & 64-bit) label Oct 31, 2023
Basic spinlock implementation is based on single
atomic variable and doesn't guarantee locking fairness
across multiple CPUs. It's even possible that single CPU
will win the contention every time which will result
in a live-lock.

Ticket spinlocks provide a FIFO order of lock acquisition
which resolves such unfairness issue at the cost of slightly
increased memory footprint.

Signed-off-by: Alexander Razinkov <[email protected]>
Added test to verify the spinlock acquisition fairness
in relation to the CPUs contending for the spinlock.
This test is only enabled for Ticket Spinlocks, which
are required to provide this kind of fairness.

Signed-off-by: Alexander Razinkov <[email protected]>
Added test to verify that the value of atomic_t will be the same
in case of overflow whether incremented in an atomic or non-atomic manner

Signed-off-by: Alexander Razinkov <[email protected]>
@amr-sc
Contributor Author

amr-sc commented Oct 31, 2023

Hi @andyross , @cfriedt , @nashif , @evgeniy-paltsev , @npitre , @carlescufi , @carlocaione , as was discussed on our Architecture WG meeting I've enabled CONFIG_TICKET_SPINLOCKS by default for qemu_riscv64_smp platform.
Is this enough or should I enable it for some other SMP platforms as well?

I've also lengthened a busy loop imitating some work within the test, to help thread synchronization. This is critical for QEMU, where a contending thread can be scheduled out at any moment, which results in artificial unfairness. This is not a 100% robust way to avoid the artificial unfairness, but let's see how it goes.

P.S.: QEMU has a more advanced and precise emulation mode, CONFIG_QEMU_ICOUNT, but it is disabled for qemu_riscv64_smp, and when I try to enable it the platform build fails.

@andyross
Contributor

My memory from the last time we went through this was that icount provides deterministic emulation for only one core; in SMP the two CPUs still run in separate host threads, so it wouldn't address the problem.

@povergoing
Member

FYR, just putting some test reports here in case they might help. I tested it on a simulation platform, fvp_baser_aemv8r_smp (Arm64 SMP, 4 cores), which is a 100% deterministic platform where all cores are simulated in only one host thread (at least one host core).
cmd: west build -p always -b fvp_baser_aemv8r_smp -t run -T zephyr/tests/kernel/spinlock/kernel.multiprocessing.spinlock_fairness

With CONFIG_TICKET_SPINLOCKS=y:

===================================================================
START - test_spinlock_fairness
CPU0 acquired spinlock 1000 times, expected 1000
CPU1 acquired spinlock 1000 times, expected 1000
CPU2 acquired spinlock 1000 times, expected 1000
CPU3 acquired spinlock 1000 times, expected 1000
 PASS - test_spinlock_fairness in 1.604 seconds
===================================================================

For comparison, CONFIG_TICKET_SPINLOCKS=n:

===================================================================
START - test_spinlock_fairness
CPU0 acquired spinlock 248 times, expected 1000
CPU1 acquired spinlock 153 times, expected 1000
CPU2 acquired spinlock 3599 times, expected 1000
CPU3 acquired spinlock 0 times, expected 1000

    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/spinlock/src/spinlock_fairness.c:150: spinlock_test_spinlock_fairness: (spinlock_grabbed[core_id] < FAIRNESS_TEST_CYCLES_PER_CORE is true)
CPU0 starved on a spinlock: acquired 248 times, expected 1000

 FAIL - test_spinlock_fairness in 1.610 seconds
===================================================================

However, it's worth noting that FVP simulates all cores on a single host core, which means the host core might simulate e.g. ~1000 instructions per core sequentially (for example, with core 0 ~ core 3, the host core simulates 1000 instructions for core0 first, then for core1, and so on). That's why it's so "unfair" when CONFIG_TICKET_SPINLOCKS=n, as the report shows. I think this exaggerated result will not happen on real hardware, because real hardware executes instructions simultaneously.

@npitre
Collaborator

npitre commented Nov 1, 2023 via email

Contributor

@andyross andyross left a comment


Spent some time today with twister and the six[1] SMP emulation platforms. The upshot is that the spinlock test is much improved, but still failing significantly more than other tests[2] in a loaded CI run. Roughly half the failures in any run are this test:

  • A full twister suite with default kconfig saw 10 failures
  • Two full suite runs with tickets set to =y globally saw 19 the first time and 10 the second

Breaking out the spinlock test specifically and running it on an unloaded system, though, looks much better: one failure in 60 twister iterations (across 15 test binaries each, so 1 in 900). Given that the test is known to be able to fail stochastically in emulation, that seems actually OK to me.

Basically it looks good to me from an emulation perspective. I'm going to put a +1 on it, as it seems extremely unlikely to break anything at the integration level.

I would still love to see someone:

  1. Get some results from a full suite on a real SMP hardware platform
  2. Report some benchmarking numbers (not just the fairness results) on real hardware

[1] twister -p qemu_cortex_a53_smp -p qemu_riscv32_smp -p qemu_riscv64_smp -p qemu_x86_64 -p qemu_x86_64_nokpti, which as of this PR runs 1874 individual test binaries.

[2] Though an interesting second place I hadn't noticed was tests/subsys/zbus/integration. This dies once in almost every run, and twice died successively and required a third --only-failed run. Probably worth checking out, usually this means a test with needlessly tight timing constraints.

@povergoing
Member

> I would still love to see someone:
> Get some results from a full suite on a real SMP hardware platform
> Report some benchmarking numbers (not just the fairness results) on real hardware

IIRC @cocoeoli @lenghonglin are doing some work related to real hardware SMP (Raspberry Pi 4 and ROC-RK3568-PC?). Not sure if you've had a chance to test?

@lenghonglin
Contributor

> I would still love to see someone:
> Get some results from a full suite on a real SMP hardware platform
> Report some benchmarking numbers (not just the fairness results) on real hardware
>
> IIRC @cocoeoli @lenghonglin are doing some work related to real hardware SMP (Raspberry Pi 4 and ROC-RK3568-PC?), no idea if you had a chance to have a test?

Busy these days. I will try to complete SMP on Raspberry Pi 4 B this weekend.

@povergoing
Member

> Busy these days. I will try to complete SMP on Raspberry Pi 4 B this weekend.

Appreciate it, and really no rush, just take your time

@amr-sc amr-sc requested a review from cfriedt November 2, 2023 07:34
@evgeniy-paltsev
Collaborator

> Get some results from a full suite on a real SMP hardware platform

I'll run (well, actually re-run) the test suite on ARC HW SMP boards (HSDK and HSDK4xD) - both have 4 CPU cores.

@evgeniy-paltsev
Collaborator

I've played a bit with the fairness test on real SMP HW.

In the worst case I got these results (with the regular Zephyr spinlock implementation):

HSDK (4 cores)

START - test_spinlock_fairness
CPU0 acquired spinlock 0 times, expected 1000
CPU1 acquired spinlock 3997 times, expected 1000
CPU2 acquired spinlock 1 times, expected 1000
CPU3 acquired spinlock 2 times, expected 1000

HSDK (2 cores)

START - test_spinlock_fairness
CPU0 acquired spinlock 1998 times, expected 1000
CPU1 acquired spinlock 2 times, expected 1000

HSDK4xD (4 cores)

START - test_spinlock_fairness
CPU0 acquired spinlock 1999 times, expected 1000
CPU1 acquired spinlock 0 times, expected 1000
CPU2 acquired spinlock 1996 times, expected 1000
CPU3 acquired spinlock 5 times, expected 1000

That's expected: if a core already has the atomic variable in its local cache (and in owned state), then this core will perform the atomic operation faster than the other cores.

The test passes on all SMP HW when I enable ticket spinlocks.

I've also run the full test suite on all ARC SMP platforms with ticket spinlocks enabled - and I don't see any additional failures.

@nashif nashif merged commit 5725585 into zephyrproject-rtos:main Nov 4, 2023
@amr-sc amr-sc deleted the ticket-spinlocks branch November 7, 2023 06:43
Comment on lines +1201 to +1208
Basic spinlock implementation is based on single
atomic variable and doesn't guarantee locking fairness
across multiple CPUs. It's even possible that single CPU
will win the contention every time which will result
in a live-lock.
Ticket spinlocks provide a FIFO order of lock acquisition
which resolves such unfairness issue at the cost of slightly
increased memory footprint.
Collaborator


In a future update it would be great to reword so that the help starts with what a ticket spinlock is, rather than what it is not. I.e. the last sentence (with minor tweaks, probably) should be first.
