kernel: k_pipe: Rewrite k_pipe API #83283

Mattemagikern · 2024-12-20T14:49:41Z

The k_pipe_* API has been reworked to provide a more consistent and
intuitive interface. The new API aims to provide a simple to use byte
stream interface that is more in line with the POSIX pipe API.
The previous API has been deprecated and will be removed in a future
release.

Signed-off-by: Måns Ansgariusson [email protected]

andyross

This looks like a pretty solid path to me. Some notes after a very quick read-through, the big ones being "fix the races" and "use ring_buf if possible". You're also going to (obviously) need test coverage for all the cases here, including ideally a stress test that will bang pipes with variant read/write/buffer sizes from multiple threads on each end.

The process and architecture people will want to chime in as to migration strategy. As you'll discover I'm a lot (like... a lot) more tolerant of churn and cowboy evolution than most of the project. My guess is that people are going to demand a more conservative rollout. Maybe give the two APIs distinct names and let them live in the tree simultaneously for a version before flagging the old stuff deprecated. Put each under kconfigs such that they can be independently en/disabled, etc...

Also recognize that treewide deprecation means combing through all the in-tree code to make sure that nothing else is using the old layers, deleting or filtering test cases using the wrong stuff, etc... It's tedious and annoying and everyone will hate you when their precious out-of-tree code stops building; just be prepared. :)

include/zephyr/kernel.h

kernel/pipe.c

andyross · 2024-12-20T15:51:07Z

kernel/pipe.c

+		pipe->tail = 0U;
+		pipe->count = 0U;
+
+		// or -ECONNRESET ?


Heh. You do seem to be trawling pretty deep into errno.h to try to find stuff I haven't seen before. FWIW, I think that's usually more harm than good unless you're extremely careful about documenting what each code means and how a user is expected to react. 99%+ of the time -EINVAL or -EAGAIN is all that's required.

Mattemagikern · 2024-12-23

I’ve made sure to document the behavior carefully to address potential ambiguity.
In this case, the use of this specific error code is intended for a scenario where we want to signal to a consumer thread that the current work order or process is no longer valid, without needing to close the pipe or disrupt the threads that rely on it. I believe this is a valid use case that justifies going beyond the more common -EINVAL or -EAGAIN.

include/zephyr/kernel.h

Mattemagikern · 2024-12-20T16:34:22Z

Thanks for your initial review @andyross!

I see an opportunity here that I'd like to explore. You suggest using the ring_buf implementation instead of the one in this patch. I'm all for it, and since I'd already be re-using existing code, I could re-use k_sem as well.
If we can live with a small overhead in the k_pipe implementation, we could piggyback on the k_sem interface and let it handle the scheduler interaction and locking. But perhaps the overhead is too much?

andyross · 2024-12-20T18:53:12Z

That seems like an eminently reasonable choice too. Though FWIW if you have a pipe based on pre-existing primitives then it should probably live in lib/ somewhere and not kernel/. Whether the overhead is too high is sort of an app question. As it happens k_sem is really pretty lean and optimal, and I'd guess that kind of tool would benchmark faster than the pipe we have already. But surely you could do better at the bottom of the stack.

I love optimization, but don't know that we should demand it.

peter-mitsis · 2024-12-20T19:07:33Z

In addition to the points that Andy raised, there are a couple of implications that are worth pointing out related to the copying of data.

1. Copying always goes from write buffer -> pipe buffer -> read buffer. One implication of this in the current proposal is that transfers cannot exceed the size of the pipe buffer. This is not an unreasonable size limitation, but it should be documented.
2. Data completion and timeouts - part 1. There is a subtle change in how timeouts and data completion are being handled here as compared to the other data passing kernel objects. The others pend until the data is completely sent/received, while this new pipes implementation pends only until there is a chance that data can be sent/received.
3. Data completion and timeouts - part 2. An indeterminate amount of time may elapse between the point at which the reader/writer is woken up and the time at which it executes to copy data into/out of the pipe buffer. This unknown amount of time may exceed the timeout. This subsequently raises the question ... should it copy the data? This could be further complicated should someone want to copy more data than will fit into the pipe buffer as both sender and receiver get swapped in and out to complete the whole transfer.

Mattemagikern · 2024-12-20T21:42:39Z

@andyross

> But surely you could do better at the bottom of the stack.

I’ve really enjoyed the learning experience of working with the scheduler, particularly with the wait_q, pend, and unpend functions. I’m also happy to put in the effort to make it as lean and efficient as possible. A small amount of effort here will pay dividends if the structure becomes useful to others.

@peter-mitsis

Agreed, it should be documented.
I started trying to answer 2. and 3. but my brain is fried atm. I'll sleep on the not so simple questions/concerns ^^

npitre

A few comments:

This was mentioned already, but you should consider using the ring_buffer
facility rather than open coding your own. This will save some binary
footprint not to have 2 different implementations linked in. Furthermore,
the most costly part in yours is by far the modulus operator on some targets,
which the generic code dispenses with.
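
To illustrate the modulus point: when an in-range index is advanced by no more than one buffer length, the wrap-around can be done with a compare and a subtract instead of `%`. The sketch below is a standalone illustration with hypothetical helper names (`demo_wrap_mod`, `demo_wrap_sub`); it is an assumption about the general technique and is not copied from ring_buffer.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of Zephyr. */

/* Wrap with the modulus operator. On targets without a hardware
 * divider this can turn into a comparatively slow library call. */
uint32_t demo_wrap_mod(uint32_t idx, uint32_t size)
{
	assert(size > 0);
	return idx % size;
}

/* Wrap with compare-and-subtract. Precondition (not checked here):
 * idx < 2 * size, which holds when an index that was already in
 * range has been advanced by at most size. */
uint32_t demo_wrap_sub(uint32_t idx, uint32_t size)
{
	assert(size > 0);
	return (idx >= size) ? idx - size : idx;
}
```

Both helpers agree for every index below twice the buffer size, which is the only range a put or get index can reach after a single bounded advance.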

You say:

> I looked into the ring_buffer.c and it looks like it is two things in
> one? Both an object queue and a ring_buffer.c for bytes.

Not sure what you mean here, however if it is the ring_buf_item_* stuff
that looks confusing then you may safely ignore it. And better look in
ring_buffer.h first instead.

> The size of the ring_buffer struct alone would almost double the size of
> the k_pipe object so I opted to not use it.

Given those fields you will no longer need, the k_pipe object would grow by
only 16 bytes. Unless you are using a gazillion pipes, the tradeoff is in
favor of code reuse both in terms of binary size and maintainability.

> (Since I almost had all the logic in place for the ring-buffer already).

Sorry but such arguments don't count. ;-)

Next, the pipe implementation itself is lacking. If you want to write
10 bytes but there is room for only 5 bytes then you should write those 5
bytes, flag any waiters, and wait up to the timeout to write the remaining 5
bytes. If suddenly you're woken up and there is only room for 3 bytes then
you should write those 3 bytes, flag potential waiters, readjust the timeout
for what remains of the original one, and wait again for the last 2 bytes.
Same logic goes for the reader. Currently you return early when there is not
enough room despite the timeout not being reached.

Timeout readjusting is tricky. You may look at sys_timepoint_timeout() usage
in kheap.c for an example of how to do it simply.
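
The write loop described above can be sketched as follows. This is a standalone simulation, not kernel code: `demo_pipe`, `demo_env`, `demo_put`, `demo_wait_for_room` and `demo_write` are all hypothetical names, time is a plain tick counter, and the "reader" is simulated by draining a fixed number of bytes on each wait. In Zephyr the deadline would presumably be handled with the timepoint helpers mentioned above rather than raw integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CAP 8

/* All names below are hypothetical stand-ins, not Zephyr APIs. */
struct demo_pipe {
	uint8_t buf[DEMO_CAP];
	size_t head;  /* index of the oldest byte */
	size_t count; /* bytes currently stored */
};

struct demo_env {
	uint64_t now;            /* simulated clock, in ticks */
	struct demo_pipe *pipe;
	size_t drain_per_wait;   /* bytes a simulated reader consumes per wait */
	uint64_t ticks_per_wait; /* simulated time until the reader runs */
};

/* Copy as many bytes as currently fit; return how many were stored. */
size_t demo_put(struct demo_pipe *p, const uint8_t *src, size_t len)
{
	size_t room = DEMO_CAP - p->count;
	size_t n = (len < room) ? len : room;

	for (size_t i = 0; i < n; i++) {
		p->buf[(p->head + p->count + i) % DEMO_CAP] = src[i];
	}
	p->count += n;
	return n;
}

/* Simulated "pend until there is room"; returns false on timeout. */
bool demo_wait_for_room(struct demo_env *env, uint64_t remaining)
{
	if (remaining < env->ticks_per_wait) {
		env->now += remaining; /* slept to the deadline, nothing drained */
		return false;
	}
	env->now += env->ticks_per_wait;

	size_t n = env->drain_per_wait;

	if (n > env->pipe->count) {
		n = env->pipe->count;
	}
	env->pipe->head = (env->pipe->head + n) % DEMO_CAP;
	env->pipe->count -= n;
	return true;
}

/* Write up to len bytes, waiting at most timeout ticks in total. */
size_t demo_write(struct demo_env *env, const uint8_t *data, size_t len,
		  uint64_t timeout)
{
	uint64_t deadline = env->now + timeout; /* fixed once, up front */
	size_t written = 0;

	for (;;) {
		/* Write whatever fits right now, even if it is only part. */
		written += demo_put(env->pipe, data + written, len - written);
		if (written == len) {
			break;
		}
		/* Re-derive the remaining budget from the fixed deadline. */
		uint64_t remaining =
			(deadline > env->now) ? deadline - env->now : 0;

		if (remaining == 0 || !demo_wait_for_room(env, remaining)) {
			break; /* timed out: report the partial count */
		}
	}
	return written;
}
```

With 3 bytes already in the 8-byte buffer and a reader that drains 3 bytes per wait, a 10-byte write goes out as 5 + 3 + 2 bytes, matching the sequence in the description. With a budget shorter than two waits, the same call returns 8 and leaves the clock exactly at the deadline.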

Mattemagikern · 2024-12-21T13:09:03Z

@peter-mitsis

> Data completion and timeouts - part 1. There is a subtle change in how timeouts and data completion are being handled here as compared to the other data passing kernel objects. The others pend until the data is completely sent/received, while this new pipes implementation pends only until there is a chance that data can be sent/received.

The distinction you’ve pointed out between this implementation and other kernel objects is intentional. By waking up as soon as there’s a chance to send or receive data, it aims to maximize concurrency, reduce blocking, and decrease complexity. I understand that this behavior deviates from existing patterns, which might cause confusion or misaligned expectations. However, this new behavior aligns closely with the concept of a byte-stream, which I believe makes it more intuitive for certain use cases.

> Data completion and timeouts - part 2. An indeterminate amount of time may elapse between the point at which the reader/writer is woken up and the time at which it executes to copy data into/out of the pipe buffer. This unknown amount of time may exceed the timeout. This subsequently raises the question ... should it copy the data? This could be further complicated should someone want to copy more data than will fit into the pipe buffer as both sender and receiver get swapped in and out to complete the whole transfer.

The issue of indeterminate time elapsing between wakeup and execution is indeed a valid concern, especially when considering scenarios where the timeout might expire during this window. The current design assumes that the timeout applies to the operation's initiation, not its completion. Achieving precise timing for both initiation and completion would be challenging. A possible middle ground might be to document the timeout behavior explicitly in the API to set clear expectations for users.

Regarding data transfers larger than the pipe buffer, this scenario is effectively handled by allowing operations to perform partial reads and writes. This mechanism ensures that even when the data size exceeds the buffer capacity, the transfer can proceed incrementally until completion.

Mattemagikern · 2024-12-21T13:31:44Z

@npitre

> Sorry but such arguments don't count. ;-)

Ahaha Fair! I’ve taken another look at the implementation and made revisions to utilize the ring_buffer as suggested. While I initially hesitated due to the additional overhead, I agree that the tradeoff in terms of binary size and maintainability justifies the change. Reusing existing infrastructure is the better long-term choice.

Regarding the ring_buf_item_* functions, I see how they could be perceived as adding unnecessary complexity, given their similarity to a message queue without synchronization. If they were refactored into their own type or removed, it might streamline the codebase further. However, this isn’t directly within the scope of this patch, so for now, I’ll focus on adapting the ring_buffer as it stands. :)

> Next, the pipe implementation itself is lacking. If you want to write
> 10 bytes but there is room for only 5 bytes then you should write those 5
> bytes, flag any waiters, and wait up to the timeout to write the remaining 5
> bytes. If suddenly you're woken up and there is only room for 3 bytes then
> you should write those 3 bytes, flag potential waiters, readjust the timeout
> for what remains of the original one, and wait again for the last 2 bytes.
> Same logic goes for the reader. Currently you return early when there is not
> enough room despite the timeout not being reached.

While I understand the reasoning behind partial writes and adjusting the timeout dynamically, I believe that returning early when there is insufficient space aligns better with the design goals of simplicity and efficiency. Continuously retrying with adjusted timeouts adds complexity (and possibly overhead), which might not be necessary for most use cases. Instead, handling partial writes and managing timeouts at a higher level can keep the implementation cleaner, more predictable and easier to test.

> Timeout readjusting is tricky. You may look at sys_timepoint_timeout() usage
> in kheap.c for an example of how to do it simply.

Thank you for the advice, I'll keep it in mind depending on the outcome of this discussion :)

Mattemagikern · 2024-12-21T20:19:35Z

Something that struck me when I made the move to ring_buffer:

The ring_buffer implementation is not enabled by default. To use it, you need to set the Kconfig option: CONFIG_RING_BUFFER. Since kernel objects are expected to be enabled by default, I see the following options:

1. Keep k_pipe in the kernel: Enable the ring_buffer implementation by default.
2. Move k_pipe to lib/: Leave the ring_buffer configuration unchanged, with Kconfig being implicitly set.
3. Keep k_pipe in the kernel: Implement the ring_buffer functionality directly within k_pipe, or create a shared ring_buffer for kernel objects, with k_msgq and k_pipe as potential use cases.

I recommend keeping k_pipe in the kernel, as it is the only byte-stream implementation available. Depending on the optimization requirements, we can either:

- Keep the ring_buffer implementation, which would introduce 16 bytes of overhead to k_pipe, or
- Revert to the previous implementation and integrate the ring_buffer functionality directly into k_pipe, potentially duplicating a subset of the current ring_buffer logic.

What are your thoughts on this?

andyross · 2024-12-21T21:31:07Z

There's already a CONFIG_PIPES that gets the pipe code, just have that "select RING_BUFFER". It's true that our kconfigs are sometimes finer grained than they should be, but there's no shame in having the kernel depend on code in lib/; that's what it's there for.

npitre · 2024-12-21T21:34:16Z

> What are your thoughts on this?

Many things rely on ring_buffer.c already, so I'm rather surprised it is not always compiled. I'd say it is light and quick to compile so no real gain in leaving it out, given that bitarray.c is even bigger and always built.

Mattemagikern · 2024-12-22T18:36:04Z

I propose removing the k_pipe_write_avail and k_pipe_read_avail functions from the new k_pipe interface. These functions are problematic because their results are not guaranteed to remain valid, as the calling process does not hold a lock on the k_pipe struct. This lack of synchronization introduces potential race conditions, making the returned values unreliable in concurrent scenarios.

Including these functions in the interface suggests that their results can be relied upon for decision-making, which risks subtle bugs or misuse. If their intended purpose is to provide a "snapshot" for non-critical use cases, their utility seems limited.

For a more robust and intuitive interface, I recommend removing these functions entirely, as doing so would simplify the API and reduce potential confusion for users.

That said, these functions are currently used in subsys/net/lib/sockets/socketpair.c and kernel/poll.c. The socketpair implementation should naturally adapt to the new read/write functions. However, the implications for poll.c are less clear and may require further consideration.
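
To make the concern concrete, here is a small single-threaded model of the check-then-act problem. The names (`demo_pipe`, `demo_read_avail`, `demo_read`, `demo_snapshot_shortfall`) are hypothetical and only mimic the shape of the functions discussed; the interleaving with a second reader is simulated by an explicit call rather than by real threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model; names are not Zephyr APIs. */
struct demo_pipe {
	size_t used; /* bytes currently buffered */
};

/* Snapshot query: correct only at the instant it is taken. */
size_t demo_read_avail(const struct demo_pipe *p)
{
	return p->used;
}

/* Read up to want bytes; the return value is the only reliable count. */
size_t demo_read(struct demo_pipe *p, size_t want)
{
	size_t n = (want < p->used) ? want : p->used;

	p->used -= n;
	return n;
}

/* Returns how many bytes the snapshot over-promised once another
 * reader has consumed `stolen` bytes between the check and the read. */
size_t demo_snapshot_shortfall(struct demo_pipe *p, size_t stolen)
{
	size_t snapshot = demo_read_avail(p); /* check ... */

	(void)demo_read(p, stolen);           /* ... another reader runs ... */

	size_t got = demo_read(p, snapshot);  /* ... act on the stale value */

	return snapshot - got;
}
```

With 8 bytes buffered and a competing reader that takes 6 of them, the snapshot says 8 but the subsequent read delivers only 2. Code that relies on the count returned by the read itself is unaffected by this gap.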

andyross

Took another pass through. Looking cleaner now. The ring_buf usage looks correct and straightforward. To repeat though: I'm absolutely not above appreciating a rewritten data structure, but if you do it you need to show clear and obvious advantages. :)

As others are pointing out, the semantics of short counts are a little confusing. Zephyr IPC almost always takes timeout arguments, so ideally this code should sit in a loop trying to iteratively read/write until the timeout expires instead of returning on the first byte.

Code that actually does want to see the partial results (which is usually a rare edge case kind of thing, the whole idea of "file descriptor" style I/O is that it synthesizes a synchronous interface over async hardware behavior) can always try to read one byte as a sentinel and follow it with a call using K_NO_WAIT to grab the rest.
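
The sentinel pattern described above might look like the following. This is a sketch against a hypothetical mock (`demo_pipe`, `demo_read`, `demo_read_available`) that models a read which tries to fill the whole buffer when it is allowed to wait; a stall counter stands in for the sleep a real pipe would perform. With the reworked API the two calls would presumably be a read with a finite timeout followed by one with K_NO_WAIT, but that mapping is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mock; names are not Zephyr APIs. */
struct demo_pipe {
	const uint8_t *data; /* next byte to hand out */
	size_t avail;        /* bytes left in the mock */
	unsigned int stalls; /* times a real pipe would have slept */
};

/* Mock of a "fill the buffer or time out" read. */
int demo_read(struct demo_pipe *p, uint8_t *dst, size_t len, bool may_wait)
{
	if (may_wait && p->avail < len) {
		p->stalls++; /* a real pipe would block here for more data */
	}

	size_t n = (len < p->avail) ? len : p->avail;

	if (n == 0) {
		return -EAGAIN;
	}
	memcpy(dst, p->data, n);
	p->data += n;
	p->avail -= n;
	return (int)n;
}

/* Block for one sentinel byte, then take whatever else is there. */
int demo_read_available(struct demo_pipe *p, uint8_t *dst, size_t len)
{
	assert(len > 0);

	int first = demo_read(p, dst, 1, true);

	if (first < 0) {
		return first;
	}

	int rest = demo_read(p, dst + 1, len - 1, false);

	/* An empty pipe on the second call is not an error here. */
	return (rest < 0) ? first : first + rest;
}
```

Against a pipe holding 4 bytes, `demo_read_available()` with a 16-byte buffer returns 4 without stalling, whereas a single waiting read of 16 bytes records one stall before returning the same 4 bytes.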

kernel/pipe.c

kernel/pipes.c

zephyrbot · 2025-01-15T14:50:04Z

The following west manifest projects have changed revision in this Pull Request:

| Name | Old Revision | New Revision | Diff |
| --- | --- | --- | --- |
| percepio | zephyrproject-rtos/percepio@`0d44033` (`zephyr`) | zephyrproject-rtos/percepio@`388c293` (`zephyr_v4_10_2_hotfix1`) | zephyrproject-rtos/[email protected] |

✅ All manifest checks OK

Note: This message is automatically posted and updated by the Manifest GitHub Action.

Add tracing support for the reworked k_pipe API. Signed-off-by: Måns Ansgariusson <[email protected]>

The struct ring_buf is renamed to struct ring_buffer to be able to coexist with the sys/ring_buffer.h header file. Signed-off-by: Måns Ansgariusson <[email protected]>

The smp_shell module uses k_fifo, but does not include the kernel header file that defines it. This commit adds the missing include. Signed-off-by: Måns Ansgariusson <[email protected]>

Replaced the k_pipe-based implementation in sockpair with ring_buffer based implementation instead. The move to ring_buffer is done to avoid overhead of k_pipe and to align with the new k_pipe API. This does not pose any added risk to concurrency as the read and write functions are protected by semaphores for both spairs. Signed-off-by: Måns Ansgariusson <[email protected]>

Update tests to use the reworked k_pipe API. Signed-off-by: Måns Ansgariusson <[email protected]>

Mattemagikern · 2025-01-15T15:04:35Z

Good evening, everyone!

Thanks to @eriktamlin at Percepio, the integration tests for the modules/debug/percepio trace functionality should now pass. Unless something unexpected arises during this test run, I anticipate all tests will complete successfully.
Please also note the change to the west.yml file in the tracing commit, where Zephyr now points to the branch containing the updates made by @eriktamlin.

I believe all feedback has been addressed at this stage. I’m switching the PR from Draft to Ready for Review. Please let me know if there’s anything further that needs attention.

Thank you for your time and input!

kartben · 2025-01-17T18:45:01Z

@Mattemagikern please update #80539 to track deprecation
Also, please follow-up with a PR updating the release notes to call out this deprecation
Thanks!

peter-mitsis · 2025-01-17T20:53:58Z

I approved too soon I think. I realized something after reviewing the extension to this by @npitre (see #84052). At some point after the waiter is notified, we need to invoke z_reschedule() to force a schedule point. Otherwise we are relying upon some external action to force a reschedule (such as an interrupt, pending on an object, ...)

npitre · 2025-01-17T21:11:20Z

I'll add the `z_reschedule()` to my follow-up patches.

andyross reviewed Dec 20, 2024

Mattemagikern force-pushed the k_pipe branch from e46dad2 to a7b74f2 Compare December 20, 2024 16:11

andyross requested review from npitre, peter-mitsis, nashif, carlocaione, henrikbrixandersen, carlescufi, dcpleung, fabiobaltieri and yperess December 20, 2024 16:18

Mattemagikern force-pushed the k_pipe branch from a7b74f2 to 17a7e7e Compare December 20, 2024 20:39

npitre reviewed Dec 20, 2024

Mattemagikern force-pushed the k_pipe branch from 17a7e7e to 1357469 Compare December 21, 2024 12:41

Mattemagikern force-pushed the k_pipe branch 4 times, most recently from aa754c0 to 814edfd Compare December 23, 2024 13:23

andyross reviewed Dec 23, 2024

Mattemagikern force-pushed the k_pipe branch from 814edfd to c1f63fa Compare December 23, 2024 14:16

Mattemagikern force-pushed the k_pipe branch from 3a81c01 to 8b4ec07 Compare January 15, 2025 14:49

zephyrbot added manifest manifest-percepio labels Jan 15, 2025

Mattemagikern added 6 commits January 15, 2025 15:55

tracing: k_pipe: Add tracing support for reworked k_pipe API

2a34277

drivers: ethernet: Rename struct ring_buf -> struct ring_buffer

bb3a78f

drivers: i2s: Rename struct ring_buf -> struct ring_buffer

505fac0

smp_shell: Add missing include for k_fifo

8d76265

tests: Update tests to use new k_pipe API

8b4ec07

Mattemagikern marked this pull request as ready for review January 15, 2025 15:04

Mattemagikern requested review from cfriedt, peter-mitsis, andyross, npitre and nashif January 15, 2025 15:05

zephyrbot requested review from ceolin, TaiJuWu and teburd January 15, 2025 15:05

zephyrbot assigned andyross and peter-mitsis Jan 15, 2025

npitre approved these changes Jan 15, 2025

npitre mentioned this pull request Jan 15, 2025

k_pipe: add-ons to the k_pipe API rewrite #84052

Open

cfriedt approved these changes Jan 17, 2025

peter-mitsis approved these changes Jan 17, 2025

kartben merged commit 0572f1f into zephyrproject-rtos:main Jan 17, 2025
38 checks passed
