kernel: k_pipe: Rewrite k_pipe API #83283

Merged
11 commits merged on Jan 17, 2025

Mattemagikern
Contributor

The k_pipe_* API has been reworked to provide a more consistent and
intuitive interface. The new API aims to provide a simple to use byte
stream interface that is more in line with the POSIX pipe API.
The previous API has been deprecated and will be removed in a future
release.
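To make the byte-stream semantics concrete, here is a minimal, hypothetical model of the non-blocking case. A write copies as many bytes as fit and returns that count, a read returns however many bytes are buffered, and -EAGAIN signals that nothing could be transferred. The names (`toy_pipe`, `toy_pipe_write`, `toy_pipe_read`) and the -EAGAIN convention are assumptions for illustration. This is not the k_pipe implementation, and it ignores blocking, timeouts and locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of a byte-stream pipe, NOT Zephyr's k_pipe.
 * It only mimics the non-blocking (K_NO_WAIT-like) case. */
#define TOY_PIPE_CAP 8

struct toy_pipe {
	uint8_t buf[TOY_PIPE_CAP];
	size_t head;  /* index of the next byte to read */
	size_t count; /* bytes currently stored */
};

/* Copy up to len bytes in; return bytes written, or -EAGAIN if full. */
static int toy_pipe_write(struct toy_pipe *p, const uint8_t *data, size_t len)
{
	size_t n = 0;

	assert(p->count <= TOY_PIPE_CAP);
	if (p->count == TOY_PIPE_CAP) {
		return -EAGAIN;
	}
	while (n < len && p->count < TOY_PIPE_CAP) {
		size_t tail = (p->head + p->count) % TOY_PIPE_CAP;

		p->buf[tail] = data[n++];
		p->count++;
	}
	return (int)n; /* may be a short count */
}

/* Copy up to len bytes out; return bytes read, or -EAGAIN if empty. */
static int toy_pipe_read(struct toy_pipe *p, uint8_t *data, size_t len)
{
	size_t n = 0;

	if (p->count == 0) {
		return -EAGAIN;
	}
	while (n < len && p->count > 0) {
		data[n++] = p->buf[p->head];
		p->head = (p->head + 1) % TOY_PIPE_CAP;
		p->count--;
	}
	return (int)n;
}
```

Writing 10 bytes into this 8-byte pipe returns 8. The caller learns exactly how much went through, as with a POSIX `write()` on a non-blocking pipe. The `%` wrap is used only for brevity.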

Signed-off-by: Måns Ansgariusson [email protected]

Contributor

@andyross andyross left a comment

This looks like a pretty solid path to me. Some notes after a very quick read-through, the big ones being "fix the races" and "use ring_buf if possible". You're also (obviously) going to need test coverage for all the cases here, ideally including a stress test that bangs on pipes with varied read/write/buffer sizes from multiple threads on each end.

The process and architecture people will want to chime in as to migration strategy. As you'll discover I'm a lot (like... a lot) more tolerant of churn and cowboy evolution than most of the project. My guess is that people are going to demand a more conservative rollout. Maybe give the two APIs distinct names and let them live in the tree simultaneously for a version before flagging the old stuff deprecated. Put each under kconfigs such that they can be independently en/disabled, etc...

Also recognize that treewide deprecation means combing through all the in-tree code to make sure that nothing else is using the old layers, deleting or filtering test cases using the wrong stuff, etc... It's tedious and annoying and everyone will hate you when their precious out-of-tree code stops building; just be prepared. :)

include/zephyr/kernel.h: 3 resolved review threads (outdated)
kernel/pipe.c: 5 resolved review threads (outdated)
kernel/pipe.c (outdated diff):
pipe->tail = 0U;
pipe->count = 0U;

// or -ECONNRESET ?
Contributor

Heh. You do seem to be trawling pretty deep into errno.h to try to find stuff I haven't seen before. FWIW, I think that's usually more harm than good unless you're extremely careful about documenting what each code means and how a user is expected to react. 99%+ of the time -EINVAL or -EAGAIN is all that's required.

Contributor Author

I’ve made sure to document the behavior carefully to address potential ambiguity.
In this case, the use of this specific error code is intended for a scenario where we want to signal to a consumer thread that the current work order or process is no longer valid, without needing to close the pipe or disrupt the threads that rely on it. I believe this is a valid use case that justifies going beyond the more common -EINVAL or -EAGAIN.
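Here is a sketch of how a consumer might react to that case. It assumes a stream of fixed-size work orders (`ORDER_LEN` is hypothetical), and it assumes a reset reaches the reader as a -ECONNRESET return from its read call. The consumer drops its half-assembled order and keeps using the same pipe. No real pipe is involved: read results are fed in directly, and `order_rx` and `order_rx_feed` are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ORDER_LEN 4 /* hypothetical fixed work-order size */

struct order_rx {
	uint8_t partial[ORDER_LEN];
	size_t have;      /* bytes of the current order received so far */
	size_t completed; /* whole orders handed off */
	size_t discarded; /* orders abandoned because of a reset */
};

/* Feed the result of one pipe read into the consumer.
 * rc > 0: rc bytes are in data; rc == -ECONNRESET: producer reset the pipe. */
static void order_rx_feed(struct order_rx *rx, int rc, const uint8_t *data)
{
	if (rc == -ECONNRESET) {
		if (rx->have > 0) {
			rx->discarded++;
		}
		rx->have = 0; /* stale work order: drop it, keep using the pipe */
		return;
	}
	for (int i = 0; i < rc; i++) {
		assert(rx->have < ORDER_LEN);
		rx->partial[rx->have++] = data[i];
		if (rx->have == ORDER_LEN) {
			rx->completed++; /* a real consumer would process partial[] here */
			rx->have = 0;
		}
	}
}
```

The point of a distinct code is visible here: the reader neither treats the reset as end-of-stream nor as a retryable timeout. It only resynchronizes.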

include/zephyr/kernel.h: 1 resolved review thread (outdated)
@Mattemagikern
Contributor Author

Thanks for your initial review @andyross!

I see an opportunity here I'd like to explore. You suggest using the ring_buf implementation instead of the one I have in this patch. I'm all for it, but I'm already reusing k_sem.
If we can live with a small overhead in the k_pipe implementation, we could piggyback on the k_sem interface and let it handle scheduling and locking. But perhaps the overhead is too much?

@andyross
Contributor

That seems like an eminently reasonable choice too. Though FWIW, if you have a pipe based on pre-existing primitives, then it should probably live in lib/ somewhere and not kernel/. Whether the overhead is too high is sort of an app question. As it happens, k_sem is really pretty lean and optimal, and I'd guess that kind of tool would benchmark faster than the pipe we have already. But surely you could do better at the bottom of the stack.

I love optimization, but don't know that we should demand it.

@peter-mitsis
Collaborator

In addition to the points that Andy raised, there are a couple of implications that are worth pointing out related to the copying of data.

  1. Copying always goes from write buffer -> pipe buffer -> read buffer. One implication of this in the current proposal is that transfers cannot exceed the size of the pipe buffer. This is not an unreasonable size limitation, but it should be documented.
  2. Data completion and timeouts - part 1. There is a subtle change in how timeouts and data completion are handled here compared to the other data-passing kernel objects. The others pend until the data is completely sent/received, while this new pipe implementation pends only until there is a chance that data can be sent/received.
  3. Data completion and timeouts - part 2. An indeterminate amount of time may elapse between the point at which the reader/writer is woken up and the time at which it executes to copy data into/out of the pipe buffer. This unknown amount of time may exceed the timeout, which raises the question: should it still copy the data? This could be further complicated should someone want to copy more data than will fit into the pipe buffer, as both sender and receiver get swapped in and out to complete the whole transfer.

@Mattemagikern
Contributor Author

@andyross

But surely you could do better at the bottom of the stack.

I’ve really enjoyed the learning experience of working with the scheduler, particularly with the wait_q, pend, and unpend functions. I’m also happy to put in the effort to make it as lean and efficient as possible. A small amount of effort here will pay dividends if the structure becomes useful to others.

@peter-mitsis

  1. Agreed, it should be documented.

I started trying to answer 2 and 3, but my brain is fried atm. I'll sleep on the not-so-simple questions/concerns ^^

Collaborator

@npitre npitre left a comment

A few comments:

This was mentioned already, but you should consider using the ring_buffer
facility rather than open-coding your own. This will save some binary
footprint by not having 2 different implementations linked in. Furthermore,
the most costly part of yours is by far the modulus operator, which is slow
on some targets and which the generic code dispenses with.
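For reference on the modulus point, the sketch below contrasts a `%`-based index advance with a compare-and-subtract wrap. Both give the same result as long as the index stays below the capacity and a single advance never exceeds it. It illustrates the general technique only and is not copied from Zephyr's ring_buf code. Power-of-two capacities with a bit mask are another common option. On cores without a hardware divider, the `%` typically turns into a call to a compiler runtime helper.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only; function names are hypothetical. */

/* Works for any capacity, but needs a division on every advance. */
static uint32_t advance_mod(uint32_t idx, uint32_t n, uint32_t size)
{
	return (idx + n) % size;
}

/* Same result using one compare and one subtract. Valid while idx < size
 * and n <= size, because then idx + n < 2 * size. */
static uint32_t advance_wrap(uint32_t idx, uint32_t n, uint32_t size)
{
	assert(idx < size && n <= size);
	idx += n;
	if (idx >= size) {
		idx -= size;
	}
	return idx;
}
```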

You say:

I looked into the ring_buffer.c and it looks like it is two things in
one? Both an object queue and a ring_buffer.c for bytes.

Not sure what you mean here; however, if it is the ring_buf_item_* stuff
that looks confusing, then you may safely ignore it. Better to look in
ring_buffer.h first.

The size of the ring_buffer struct alone would almost double the size of
the k_pipe object so I opted to not use it.

Given those fields you will no longer need, the k_pipe object would grow by
only 16 bytes. Unless you are using a gazillion pipes, the tradeoff is in
favor of code reuse, both in terms of binary size and maintainability.

(Since I almost had all the logic in place for the ring-buffer already).

Sorry but such arguments don't count. ;-)

Next, the pipe implementation itself is lacking. If you want to write
10 bytes but there is room for only 5 bytes then you should write those 5
bytes, flag any waiters, and wait up to the timeout to write the remaining 5
bytes. If suddenly you're woken up and there is only room for 3 bytes then
you should write those 3 bytes, flag potential waiters, readjust the timeout
for what remains of the original one, and wait again for the last 2 bytes.
Same logic goes for the reader. Currently you return early when there is not
enough room despite the timeout not being reached.

Timeout readjusting is tricky. You may look at sys_timepoint_timeout() usage
in kheap.c for an example of how to do it simply.
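Here is a single-threaded sketch of the write loop described above. It copies what fits, waits for room, and recomputes the remaining wait from one fixed deadline, so repeated wakeups never stretch the caller's original timeout. Time is a simulated tick counter, and `sim_wait()` is a hypothetical stand-in for pending on the pipe's wait queue. The comments mention `sys_timepoint_calc()`/`sys_timepoint_timeout()` only as the Zephyr helpers that play the analogous role. Returning -EAGAIN when nothing at all was written is an assumed convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulation, not kernel code. */
struct sim_pipe {
	size_t space;        /* free bytes in the pipe buffer right now */
	int64_t now;         /* simulated clock, in ticks */
	const size_t *drain; /* bytes a reader frees on each successive wakeup */
	size_t ndrain;
	size_t next;
};

/* Wait at most `ticks`. Returns 0 if a reader freed space (costs one tick),
 * or -EAGAIN after the whole wait elapsed with no reader activity. */
static int sim_wait(struct sim_pipe *p, int64_t ticks)
{
	if (p->next == p->ndrain) {
		p->now += ticks;
		return -EAGAIN;
	}
	p->now += 1;
	p->space += p->drain[p->next++];
	return 0;
}

/* Write what fits, wait for more room, and derive each wait from a fixed
 * deadline instead of restarting the full timeout. */
static int write_with_deadline(struct sim_pipe *p, size_t len, int64_t timeout)
{
	const int64_t deadline = p->now + timeout; /* cf. sys_timepoint_calc() */
	size_t done = 0;

	assert(timeout >= 0);
	for (;;) {
		size_t chunk = len - done < p->space ? len - done : p->space;

		p->space -= chunk; /* "copy"; a real pipe would also wake readers here */
		done += chunk;
		if (done == len) {
			break;
		}
		int64_t remaining = deadline - p->now; /* cf. sys_timepoint_timeout() */

		if (remaining <= 0 || sim_wait(p, remaining) != 0) {
			break;
		}
	}
	return done > 0 ? (int)done : -EAGAIN;
}
```

With 5 bytes free and readers later freeing 3 and then 2, a 10-byte write completes after two wakeups. If the timeout runs out after the first wakeup, the call returns the 8 bytes it managed to write.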

@Mattemagikern
Contributor Author

@peter-mitsis

  2. Data completion and timeouts - part 1. There is a subtle change in how timeouts and data completion are handled here compared to the other data-passing kernel objects. The others pend until the data is completely sent/received, while this new pipe implementation pends only until there is a chance that data can be sent/received.

The distinction you’ve pointed out between this implementation and other kernel objects is intentional. By waking up as soon as there’s a chance to send or receive data, it aims to maximize concurrency, reduce blocking, and decrease complexity. I understand that this behavior deviates from existing patterns, which might cause confusion or misaligned expectations. However, this new behavior aligns closely with the concept of a byte-stream, which I believe makes it more intuitive for certain use cases.

  3. Data completion and timeouts - part 2. An indeterminate amount of time may elapse between the point at which the reader/writer is woken up and the time at which it executes to copy data into/out of the pipe buffer. This unknown amount of time may exceed the timeout, which raises the question: should it still copy the data? This could be further complicated should someone want to copy more data than will fit into the pipe buffer, as both sender and receiver get swapped in and out to complete the whole transfer.

The issue of indeterminate time elapsing between wakeup and execution is indeed a valid concern, especially when considering scenarios where the timeout might expire during this window. The current design assumes that the timeout applies to the operation's initiation, not its completion. Achieving precise timing for both initiation and completion would be challenging. A possible middle ground might be to document the timeout behavior explicitly in the API to set clear expectations for users.

Regarding data transfers larger than the pipe buffer, this scenario is effectively handled by allowing operations to perform partial reads and writes. This mechanism ensures that even when the data size exceeds the buffer capacity, the transfer can proceed incrementally until completion.
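If partial writes are the contract, a caller moving more data than the pipe buffer holds can loop on the short counts itself. The sketch below assumes a write callback that returns the number of bytes accepted or a negative errno. `stream_write_fn`, `send_all` and the `fake_sink` stub, which accepts at most a few bytes per call, are hypothetical names used only to exercise the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical write primitive: returns bytes accepted (possibly fewer than
 * len) or a negative errno. ctx stands in for the pipe object. */
typedef int (*stream_write_fn)(void *ctx, const uint8_t *data, size_t len);

/* Push all of buf through a stream whose buffer may be smaller than len.
 * Returns len on success, or the first negative error code. */
static int send_all(stream_write_fn wr, void *ctx, const uint8_t *buf, size_t len)
{
	size_t off = 0;

	assert(wr != NULL);
	while (off < len) {
		int rc = wr(ctx, buf + off, len - off);

		if (rc < 0) {
			return rc; /* e.g. -EAGAIN on timeout; the caller decides */
		}
		if (rc == 0) {
			return -EAGAIN; /* no progress: avoid spinning forever */
		}
		off += (size_t)rc;
	}
	return (int)len;
}

/* Test stub: a "pipe" that accepts at most max_per_call bytes per write. */
struct fake_sink {
	uint8_t got[32];
	size_t n;
	size_t max_per_call;
	size_t calls;
};

static int fake_write(void *ctx, const uint8_t *data, size_t len)
{
	struct fake_sink *s = ctx;
	size_t n = len < s->max_per_call ? len : s->max_per_call;

	s->calls++;
	if (n > sizeof(s->got) - s->n) {
		n = sizeof(s->got) - s->n;
	}
	if (n == 0) {
		return -EAGAIN; /* behaves like a full pipe that timed out */
	}
	memcpy(s->got + s->n, data, n);
	s->n += n;
	return (int)n;
}
```

This is the "handle it at a higher level" option: the pipe stays simple, and each caller pays for the loop only if it needs it.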

@Mattemagikern
Contributor Author

Mattemagikern commented Dec 21, 2024

@npitre

Sorry but such arguments don't count. ;-)

Ahaha, fair! I’ve taken another look at the implementation and revised it to use the ring_buffer as suggested. While I initially hesitated due to the additional overhead, I agree that the tradeoff in terms of binary size and maintainability justifies the change. Reusing existing infrastructure is the better long-term choice.

Regarding the ring_buf_item_* functions, I see how they could be perceived as adding unnecessary complexity, given their similarity to a message queue without synchronization. If they were refactored into their own type or removed, that might streamline the codebase further. However, this isn’t within the scope of this patch, so for now I’ll focus on adapting to the ring_buffer as it stands. :)

Next, the pipe implementation itself is lacking. If you want to write
10 bytes but there is room for only 5 bytes then you should write those 5
bytes, flag any waiters, and wait up to the timeout to write the remaining 5
bytes. If suddenly you're woken up and there is only room for 3 bytes then
you should write those 3 bytes, flag potential waiters, readjust the timeout
for what remains of the original one, and wait again for the last 2 bytes.
Same logic goes for the reader. Currently you return early when there is not
enough room despite the timeout not being reached.

While I understand the reasoning behind partial writes and adjusting the timeout dynamically, I believe that returning early when there is insufficient space aligns better with the design goals of simplicity and efficiency. Continuously retrying with adjusted timeouts adds complexity (and possibly overhead), which might not be necessary for most use cases. Instead, handling partial writes and managing timeouts at a higher level can keep the implementation cleaner, more predictable and easier to test.

Timeout readjusting is tricky. You may look at sys_timepoint_timeout() usage
in kheap.c for an example of how to do it simply.

Thank you for the advice, I'll keep it in mind depending on the outcome of this discussion :)

@Mattemagikern
Contributor Author

Something that struck me when I made the move to ring_buffer:

The ring_buffer implementation is not enabled by default. To use it, you need to set the Kconfig option: CONFIG_RING_BUFFER. Since kernel objects are expected to be enabled by default, I see the following options:

  • Keep k_pipe in the kernel: Enable the ring_buffer implementation by default.
  • Move k_pipe to lib/: Leave the ring_buffer configuration unchanged and have the pipe's Kconfig option select it implicitly.
  • Keep k_pipe in the kernel: Implement the ring_buffer functionality directly within k_pipe, or create a shared ring_buffer for kernel objects, with k_msg_q and k_pipe as potential use cases.

I recommend keeping k_pipe in the kernel, as it is the only byte-stream implementation available. Depending on the optimization requirements, we can either:

  • Keep the ring_buffer implementation, which would introduce 16 bytes of overhead to k_pipe, or
  • Revert to the previous implementation and integrate the ring_buffer functionality directly into k_pipe, potentially duplicating a subset of the current ring_buffer logic.

What are your thoughts on this?

@andyross
Contributor

There's already a CONFIG_PIPES that pulls in the pipe code; just have that "select RING_BUFFER". It's true that our kconfigs are sometimes finer-grained than they should be, but there's no shame in having the kernel depend on code in lib/. That's what it's there for.

@npitre
Collaborator

npitre commented Dec 21, 2024 via email

@Mattemagikern
Contributor Author

I propose removing the k_pipe_write_avail and k_pipe_read_avail functions from the new k_pipe interface. These functions are problematic because their results are not guaranteed to remain valid, as the calling process does not hold a lock on the k_pipe struct. This lack of synchronization introduces potential race conditions, making the returned values unreliable in concurrent scenarios.

Including these functions in the interface suggests that their results can be relied upon for decision-making, which risks subtle bugs or misuse. If their intended purpose is to provide a "snapshot" for non-critical use cases, their utility seems limited.

For a more robust and intuitive interface, I recommend removing these functions entirely, as doing so would simplify the API and reduce potential confusion for users.

That said, these functions are currently used in subsys/net/lib/sockets/socketpair.c and kernel/poll.c. The socketpair implementation should naturally adapt to the new read/write functions. However, the implications for poll.c are less clear and may require further consideration.
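Here is a minimal, single-threaded illustration of that race. A free-space snapshot taken by one thread can be invalidated before the thread acts on it. The "other thread" is simulated by a call in between. `space_model`, `write_avail()`, `try_write()` and `stale_snapshot_demo()` are hypothetical and stand in for the pipe's bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pipe's free-space accounting. */
struct space_model {
	size_t free_bytes;
};

/* Snapshot query: correct when taken, possibly stale when used. */
static size_t write_avail(const struct space_model *m)
{
	return m->free_bytes;
}

/* Write as much as fits; the return value is the only reliable answer. */
static size_t try_write(struct space_model *m, size_t len)
{
	size_t n = len < m->free_bytes ? len : m->free_bytes;

	m->free_bytes -= n;
	return n;
}

/* Thread A checks, thread B writes, thread A acts on the stale value. */
static size_t stale_snapshot_demo(void)
{
	struct space_model m = { .free_bytes = 8 };
	size_t snapshot = write_avail(&m); /* A: "8 bytes free" */

	(void)try_write(&m, 6);            /* B runs in between */
	return try_write(&m, snapshot);    /* A expected 8, gets 2 */
}
```

Code that relies on the write's own return value copes with this interleaving. Code that trusted the snapshot does not.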

Mattemagikern force-pushed the k_pipe branch 4 times, most recently from aa754c0 to 814edfd (December 23, 2024 13:23)
Contributor

@andyross andyross left a comment

Took another pass through. Looking cleaner now. The ring_buf usage looks correct and straightforward. To repeat though: I'm absolutely not above appreciating a rewritten data structure, but if you do it you need to show clear and obvious advantages. :)

As others are pointing out, the semantics of short counts are a little confusing. Zephyr IPC almost always takes timeout arguments, so ideally this code should sit in a loop trying to iteratively read/write until the timeout expires instead of returning on the first byte.

Code that actually does want to see the partial results (which is usually a rare edge case kind of thing, the whole idea of "file descriptor" style I/O is that it synthesizes a synchronous interface over async hardware behavior) can always try to read one byte as a sentinel and follow it with a call using K_NO_WAIT to grab the rest.
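Here is a sketch of that sentinel pattern, assuming the looping semantics proposed above. A read with a timeout waits until the full length arrives or the timeout expires, and a zero timeout (standing in for K_NO_WAIT) returns whatever is buffered. In this single-threaded toy model nothing arrives later, so a short blocking read always burns its full timeout. `toy_pipe`, `toy_read`, `read_some` and the tick counter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy pipe: avail bytes are buffered starting at buf[off],
 * and now counts simulated ticks spent waiting. */
struct toy_pipe {
	uint8_t buf[64];
	size_t off;
	size_t avail;
	long now;
};

/* Looping-read model: wait up to timeout ticks for all len bytes. Nothing
 * new ever arrives here, so a short read burns the whole timeout.
 * timeout == 0 plays the role of K_NO_WAIT. */
static int toy_read(struct toy_pipe *p, uint8_t *out, size_t len, long timeout)
{
	size_t n = len < p->avail ? len : p->avail;

	if (n < len) {
		p->now += timeout;
	}
	if (n == 0) {
		return -EAGAIN;
	}
	memcpy(out, p->buf + p->off, n);
	p->off += n;
	p->avail -= n;
	return (int)n;
}

/* Sentinel pattern: block for one byte, then take the rest without waiting. */
static int read_some(struct toy_pipe *p, uint8_t *out, size_t len, long timeout)
{
	if (len == 0) {
		return 0;
	}
	int rc = toy_read(p, out, 1, timeout);

	if (rc < 0 || len == 1) {
		return rc;
	}
	int more = toy_read(p, out + 1, len - 1, 0);

	return more > 0 ? rc + more : rc;
}
```

With 5 bytes buffered, a plain 64-byte read with a 100-tick timeout waits the full 100 ticks before returning 5. `read_some()` returns the same 5 bytes immediately.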

kernel/pipe.c: 2 resolved review threads (outdated)
kernel/pipes.c: 1 resolved review thread (outdated)
@zephyrbot
Collaborator

The following west manifest projects have changed revision in this Pull Request:

Name: percepio
Old revision: zephyrproject-rtos/percepio@0d44033 (zephyr)
New revision: zephyrproject-rtos/percepio@388c293 (zephyr_v4_10_2_hotfix1)
Diff: zephyrproject-rtos/[email protected]

All manifest checks OK

Note: This message is automatically posted and updated by the Manifest GitHub Action.

Add tracing support for the reworked k_pipe API.

Signed-off-by: Måns Ansgariusson <[email protected]>
The struct ring_buf is renamed to struct ring_buffer to be able to coexist
with the sys/ring_buffer.h header file.

Signed-off-by: Måns Ansgariusson <[email protected]>
The smp_shell module uses k_fifo, but does not include the kernel header
file that defines it. This commit adds the missing include.

Signed-off-by: Måns Ansgariusson <[email protected]>
Replaced the k_pipe-based implementation in sockpair with a ring_buffer-based
implementation.
The move to ring_buffer avoids the overhead of k_pipe and aligns
with the new k_pipe API.
This does not pose any added concurrency risk, as the read and write
functions are protected by semaphores for both spairs.

Signed-off-by: Måns Ansgariusson <[email protected]>
Update tests to use the reworked k_pipe API.

Signed-off-by: Måns Ansgariusson <[email protected]>
@Mattemagikern
Contributor Author

Mattemagikern commented Jan 15, 2025

Good evening, everyone!

Thanks to @eriktamlin at Percepio, the integration tests for the modules/debug/percepio trace functionality should now pass. Unless something unexpected arises during this test run, I anticipate all tests will complete successfully.
Please also note the change to the west.yml file in the tracing commit, where Zephyr now points to the branch containing the updates made by @eriktamlin.

I believe all feedback has been addressed at this stage. I’m switching the PR from Draft to Ready for Review. Please let me know if there’s anything further that needs attention.

Thank you for your time and input!

@kartben kartben merged commit 0572f1f into zephyrproject-rtos:main Jan 17, 2025
38 checks passed
@kartben
Collaborator

kartben commented Jan 17, 2025

@Mattemagikern please update #80539 to track deprecation
Also, please follow-up with a PR updating the release notes to call out this deprecation
Thanks!

@peter-mitsis
Collaborator

I think I approved too soon. I realized something after reviewing the extension to this by @npitre (see #84052): at some point after the waiter is notified, we need to invoke z_reschedule() to force a schedule point. Otherwise we are relying on some external action (such as an interrupt or pending on an object) to force a reschedule.

@npitre
Collaborator

npitre commented Jan 17, 2025 via email


8 participants