k_pipe: add-ons to the k_pipe API rewrite #84052

Open
wants to merge 6 commits into base: main
Conversation

@npitre (Collaborator) commented Jan 15, 2025

This implements various cleanups and optimizations I saw the potential for
when I reviewed PR #83283. The original author did quite a lot of good work
already, and I didn't want to block the merging of his own PR with these any
longer. Besides, at some point it becomes easier to convey those things
directly in the code.

Then it was pretty easy to add back the ability to do direct writer-to-reader
transfers and no-ring-buffer operations as the deprecated code did.

@zephyrbot (Collaborator) commented Jan 15, 2025

The following west manifest projects have changed revision in this Pull Request:

Name Old Revision New Revision Diff

All manifest checks OK

Note: This message is automatically posted and updated by the Manifest GitHub Action.


	if (K_TIMEOUT_EQ(timeout, K_NO_WAIT)) {
		return -EAGAIN;
	}

	pipe->waiting++;
	SYS_PORT_TRACING_OBJ_FUNC_BLOCKING(k_pipe, read, pipe, timeout);
-	z_pend_curr(&pipe->lock, *key, waitq, timeout);
+	rc = z_pend_curr(&pipe->lock, *key, waitq, timeout);
Contributor commented:

I didn't utilize this because of this discussion with @andyross:
#83247 (comment)

Contributor replied:

To clarify: I'm not a fan of promiscuous use of z_thread_return_value_set, as it is usually pretty confusing, wastes bytes in the thread struct, and tends to just recapitulate a bunch of state that the IPC mechanisms need to keep anyway (i.e. you generally need some sideband to tell you "why you woke up", and this is just a duplicate). My hope in the long term is that pend() ends up just flagging a boolean result: either you were deliberately awoken (currently a return value of zero), or you bounced for some async reason (and need to return -EAGAIN to your caller).

And in fact that's all this is doing: it just does the wakeup with a result code of zero, and checks for the -EAGAIN which is the default value. So while this is using a feature I personally dislike, it's doing it in a way that would port cleanly to the eventual utopia I imagine.

kernel/pipe.c (outdated)
@@ -110,7 +100,7 @@ int z_impl_k_pipe_write(struct k_pipe *pipe, const uint8_t *data, size_t len, k_
	}

	if (pipe_empty(pipe)) {
-		notify_waiter(&pipe->data);
+		z_sched_wake(&pipe->data, 0, NULL);
Contributor commented:

What is the difference between z_sched_wake() and the z_unpend_*() functions?

Contributor replied:

Layering. The former is synchronized and manages the timeout value and some other bits of state you need to worry about when you wake a thread up.

@andyross previously approved these changes Jan 16, 2025 and left a comment:

This looks clean. Preemptive +1 which people can assume is refreshed post-merge of the parent PR.



@peter-mitsis (Collaborator) left a comment:

I may have missed some subtleties (it won't be the first or last time), but pending additional info, I think there are a few issues with this proposed patch set.

  1. Shouldn't use of z_sched_wake() necessitate a call to z_reschedule() or one of its variants? The thread that is being awakened could be of higher priority than the current thread, so I would think that we would want to switch to it as soon as we can safely do so rather than rely upon some schedule point external to the pipe operation.
  2. The way I read the pipe reading code, it looks like we can wind up in a situation where we have a non-full pipe buffer and waiting writers. Imagine a full pipe buffer (say 100 bytes) with two waiting writers (that each want to write 1 byte) and we read exactly 100 bytes. The first writer will wake, the pipe buffer will be read, eventually the awakened writer will write its 1 byte to the pipe buffer, but no one wakes the 2nd writer.

Nicolas Pitre added 4 commits January 17, 2025 16:22
Simplify the logic, avoid repeated conditionals, avoid superfluous
scheduler calls, make the code more efficient and easier to read.

Signed-off-by: Nicolas Pitre <[email protected]>
Dispense with the call to sys_timepoint_expired() by leveraging
swap_retval to distinguish between notifications and timeouts when
z_pend_curr() returns.

Signed-off-by: Nicolas Pitre <[email protected]>
If there are pending readers, it is best to perform a single data copy
directly into their final destination buffer rather than doing one copy
into the ring buffer just to immediately copy the same data out of it.

Incidentally, this allows for supporting pipes with no ring buffer at all.

The pipe implementation being deprecated has a similar capability, so it is
better to have it here too.

Signed-off-by: Nicolas Pitre <[email protected]>
... now that the new pipe implementation supports it.

Signed-off-by: Nicolas Pitre <[email protected]>
@npitre (Collaborator, Author) commented Jan 17, 2025 via email

@peter-mitsis (Collaborator) commented:

I think that we are almost there. One more thing that I noticed is that the pipe_buf_spec is stored on the stack. This can be an issue on SMP platforms that do not have a coherent cache for the stack (xtensa). This is why the old pipes implementation had a copy of the pipe descriptor embedded within the thread structure (and swap_data pointed to it). I have some more thinking to do, but there might be some potential behavioral issues or asymmetries if k_pipe_read() is called from within an ISR and there are multiple writers waiting.

@npitre (Collaborator, Author) commented Jan 18, 2025

> One more thing that I noticed is that the pipe_buf_spec is stored on the
> stack. This can be an issue on SMP platforms that do not have a coherent
> cache for the stack (xtensa).

Can it be made coherent manually? The thread owning the stack is guaranteed
not to execute while other threads/CPUs access it.

This also means Xtensa cannot read from a pipe to a buffer on the stack either.

One possibility is to simply disable the direct copy path on Xtensa.

> I have some more thinking to do, but there might be some potential
> behavioral issues or asymmetries if k_pipe_read() is called from within
> an ISR and there are multiple writers waiting.

k_pipe_read() from within an ISR cannot sleep. It may only consume data from
the ring buffer, if any; if the ring buffer was full, it wakes up any waiting
writers and returns immediately.

@andyross (Contributor) commented:

FWIW it's not Xtensa as a whole, it's specifically intel_adsp's SMP configurations. The architecture actually has a tunable for snoop-enabled cache, but I'm not aware of anyone that's used it.

And yes: it's best to just gate the feature on !defined(CONFIG_KERNEL_COHERENCE). It's possible to manually manage the cache, but very non-trivial. Not only is it incoherent, there's no bytewise tagging: so if one side writes one byte of a 64 byte cache line and ignores the other 63, a future flush can still clobber work other contexts have done. So you have to align and pad everything. And the cache line size changes between cores (the MTK ones I'm working with right now have 128 byte lines!). Not really worth it.

Honestly that's sort of my feeling for all the zero-copy work in our existing IPC primitives. Feels like a lot of code for not a lot of benefit. Apps that care about performance on that level should probably be passing pointers and queuing them with semaphores. Using a pipe almost always means you're willing to deal with some inefficiency in order to get a clean API.

Nicolas Pitre added 2 commits January 18, 2025 13:14
We were waking up threads but failing to let them run if they are
higher priority. Add the missing calls to z_reschedule().

Also wake up all pending writers as we don't know how many there might
be. It is more efficient to wake them all when the ring buffer is full
before reading from it rather than waking them one by one whenever there is
more room in it.

Thanks to Peter Mitsis for noticing those issues.

Signed-off-by: Nicolas Pitre <[email protected]>
Systems that enabled this option don't have their stacks in coherent
memory. Given our pipe_buf_spec is stored on the stack, and readers may
also have their destination buffer on their stack too, it is not worth
going to the trouble of supporting direct-to-readers copy with them.

Signed-off-by: Nicolas Pitre <[email protected]>
@npitre (Collaborator, Author) commented Jan 18, 2025 via email

@Mattemagikern (Contributor) commented:

> add back the ability to do direct writer-to-reader transfers and no-ring-buffer operations as the deprecated code did.

Do we really want to do this?
Traditionally, pipes serve as buffers between producers and consumers, ensuring a clear separation of concerns. A copy-based implementation guarantees consistent semantics regardless of producer/consumer timing or priority, making it the expected behavior of a pipe.

While the NULL buffer option gives users more flexibility, we should ask whether this optimization is truly worth it and widely applicable. Would it see enough real-world use to justify the added complexity? More importantly, are there better alternatives that should be used instead?

By not offering this option, we might actually guide users toward a more suitable data structure for zero-copy communication, rather than complicating the pipe abstraction.

I'm not necessarily opposed to this change; I just want to make sure we've fully considered the implications.
