Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce socket service API #66758

Merged
merged 6 commits into from
Jan 16, 2024

Conversation

jukkar
Copy link
Member

@jukkar jukkar commented Dec 21, 2023

This PR introduces a socket service API. It can be used to create one socket listener that calls a callback if a set of sockets have any activity. This saves memory because instead of creating multiple threads for a set of sockets, one main thread can listen all needed sockets. This is similar concept how inetd works in Linux, although it is implemented differently.

@jukkar
Copy link
Member Author

jukkar commented Dec 21, 2023

This is still a draft, we can change the APIs etc, or discard the PR if it is not needed. Anyway, please take a look.

@jukkar jukkar changed the title Introduce Socket service API Introduce socket service API Dec 21, 2023
subsys/net/lib/sockets/sockets_service.c Outdated Show resolved Hide resolved
subsys/net/lib/sockets/sockets_service.c Outdated Show resolved Hide resolved
include/zephyr/net/socket_service.h Show resolved Hide resolved
* User should create needed sockets and then setup the poll struct and
* then register the sockets to be monitored at runtime.
*/
struct net_socket_service_desc {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we really need this static configuration for socket service? It seems a bit problematic, at least for my use case - DHCP server. Here, ideally, I'd need to have a single socket for each interface that wants to support the DHCP server functionality. But I don't think we have a way to calculate the number of interfaces in the system at build time to supply the static configuration macros. Purely runtime registration would be less cumbersome in such case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we really need this static configuration for socket service? It seems a bit problematic, at least for my use case - DHCP server. Here, ideally, I'd need to have a single socket for each interface that wants to support the DHCP server functionality. But I don't think we have a way to calculate the number of interfaces in the system at build time to supply the static configuration macros. Purely runtime registration would be less cumbersome in such case.

I originally did some experimentation with runtime configuration, but as the API requires some housekeeping activities, allocating them at runtime requires malloc etc. There is a call to malloc at this PR too but hopefully we could get rid of it.

We could place struct zsock_pollfd into DHCP config in network interface, that way the information this API needs, would already be part of network interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devicetree could help here, just sayin 🤷🏼‍♂️

return -ENOENT;
}

if (svc->work_q != NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we call the registred callback directly from the services thread? I see a potential race here - we do submit the work to execute, but don't read the data (in case POLLIN was monitored) from the socket until the work executes. If the workqueue thread has lower prio than the services thread, we may end up busy looping (poll() will keep reporting POLLIN as the work item didn't have a chance to execute).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we call the registred callback directly from the services thread?

I originally planned to do that, but it felt more natural to use the existing APIs and resources in the system. So using a work queue does the same thing. But we can certainly change this, it is just to change one function call.

When using workqueue(s), we can serve (at least theoretically) multiple socket services at the "same" time and the priority of the thread using the services api determines which one gets called first. If we have a callback directly from services thread, then there is no such thing available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a potential race here - we do submit the work to execute, but don't read the data (in case POLLIN was monitored) from the socket until the work executes.

Note that k_work_submit() does nothing if the work is already pending. It is indeed possible that we might busyloop in the service thread if the service user does not "run" fast enough. I tried to mitigate this problem by setting the service thread priority to lowest application thread priority.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to an issue when calling the "callback" via the workqueue as it might be possible (depending on different thread priorities) that we start to loop through poll() if the callback does not get run properly. So it might be better if we call the callback directly (synchronously) without using the workqueue. I will investigate this more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to an issue when calling the "callback" via the workqueue as it might be possible (depending on different thread priorities) that we start to loop through poll() if the callback does not get run properly. So it might be better if we call the callback directly (synchronously) without using the workqueue. I will investigate this more.

Left the async call but fixed the code so that the callback work is only called once. Fixed also the tests that were not working properly with native_sim.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I'm not personally convinced about this async approach. Having multiple work queues kind of busts the idea the socket services were created for (saving resources), so most of the handlers will likely end up on a system workqueue anyway. Plus, having the handlers to be called directly from the services thread allows to reuse the services stack and kind of finetune its size for the worst case scenario (I imagine all of the registered services would need to allocate buffers for socket send/recv calls and a common stack area seemed like a good place for this instead of using static/heap/custom stack memory).

But it's just my personal opinion, no strong push from my side to change it, if that's the preferred way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I prefer the async way of doing things although it certainly complicates things. But I can see that sync way is also possible here. I think we can support both methods quite easily and let the user select how the callback should be called. I will propose a new version that supports both methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - sync should be a relatively trivial wrapper around async ... or at least I thought it was... 🤔

@cfriedt
Copy link
Member

cfriedt commented Dec 21, 2023

Looks reasonable. It would be great to get a compile-time const number for network interfaces. Devicetree can fix that 😅

@jukkar
Copy link
Member Author

jukkar commented Dec 22, 2023

It would be great to get a compile-time const number for network interfaces. Devicetree can fix that

Indeed, that info would help a lot in various part of the network stack. Although this PR should not depend on number of network interfaces.

@jukkar
Copy link
Member Author

jukkar commented Dec 22, 2023

Updates:

  • removed malloc and allocated static array of pollfd in the size of CONFIG_NET_SOCKETS_POLL_MAX
  • tweak the tests so that they are not run in native_sim board (because eventfd does not work there)

@jukkar jukkar force-pushed the devel/socket-service branch from 358fc23 to e967a88 Compare December 22, 2023 10:52
@cfriedt
Copy link
Member

cfriedt commented Dec 23, 2023

  • tweak the tests so that they are not run in native_sim board (because eventfd does not work there)

Ugh.. that's annoying. I kept getting reassurances that posix arch and posix api were not mutually exclusive, but I often see that is still the case.

@jukkar jukkar force-pushed the devel/socket-service branch from e967a88 to 75147f3 Compare December 27, 2023 13:41
@jukkar
Copy link
Member Author

jukkar commented Dec 27, 2023

Ugh.. that's annoying. I kept getting reassurances that posix arch and posix api were not mutually exclusive, but I often see that is still the case.

Indeed, that is bad. If I try to run the PR sample or the tests in native_sim/posix boards, they just "hang" and do not proceed. Pressing ctrl-c does not terminate the zephyr.exe even though the control is returned to user and the process looks like it terminated.

@jukkar jukkar force-pushed the devel/socket-service branch from 75147f3 to 236e10d Compare January 3, 2024 12:18
@jukkar
Copy link
Member Author

jukkar commented Jan 3, 2024

  • Changed the sample to let CI pass.
  • Rebased on top of latest main

@jukkar jukkar marked this pull request as ready for review January 3, 2024 12:19
@jukkar
Copy link
Member Author

jukkar commented Jan 3, 2024

  • re-enabled native_sim after finding out why the test was hanging

@jukkar jukkar force-pushed the devel/socket-service branch from de3065a to 37ab5ef Compare January 4, 2024 09:25
kartben
kartben previously approved these changes Jan 15, 2024
Copy link
Collaborator

@kartben kartben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from docs standpoint, thanks!

samples/net/sockets/echo_service/prj.conf Show resolved Hide resolved
subsys/net/lib/sockets/sockets_service.c Outdated Show resolved Hide resolved
@jukkar jukkar dismissed stale reviews from kartben and rlubos via 543bf54 January 15, 2024 13:19
@jukkar jukkar force-pushed the devel/socket-service branch from 95884e4 to 543bf54 Compare January 15, 2024 13:19
@jukkar jukkar requested review from pdgendt, kartben and rlubos January 15, 2024 13:38
pdgendt
pdgendt previously approved these changes Jan 15, 2024
kartben
kartben previously approved these changes Jan 15, 2024
Copy link
Contributor

@rlubos rlubos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue spotted when testing.

subsys/net/lib/sockets/sockets_service.c Show resolved Hide resolved
The socket service provides a similar functionality as what
initd provides in Linux. It listens user registered sockets
for any activity and then launches a k_work for it. This way
each application does not need to create a thread to listen
a blocking socket.

Signed-off-by: Jukka Rissanen <[email protected]>
Simple tests that verify that the socket service API works
as expected.

Signed-off-by: Jukka Rissanen <[email protected]>
The echo-service sample demostrates how to use the socket
service API.

Signed-off-by: Jukka Rissanen <[email protected]>
The socket services users to "net sockets" command.

Signed-off-by: Jukka Rissanen <[email protected]>
If CONFIG_POSIX_API is enabled, then the socket.h is found under
zephyr/posix/sys/socket.h etc. This allows one to compile the
socket test applications without error prints.

Signed-off-by: Jukka Rissanen <[email protected]>
gcc prints this warning message

'strncat' specified bound 1 equals source length [-Wstringop-overflow=]
   58 |                 strncat(fd, "C", 1);

There was no error in the code but avoid the warning by not using
strncat().

Signed-off-by: Jukka Rissanen <[email protected]>
@jukkar jukkar dismissed stale reviews from kartben and pdgendt via c6684c5 January 15, 2024 16:55
@jukkar jukkar force-pushed the devel/socket-service branch from 543bf54 to c6684c5 Compare January 15, 2024 16:55
@jukkar
Copy link
Member Author

jukkar commented Jan 15, 2024

New version fixes the issue that Robert was seeing.

@jukkar jukkar requested review from rlubos, pdgendt and kartben January 15, 2024 16:56
Copy link
Contributor

@rlubos rlubos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@carlescufi carlescufi merged commit 6033161 into zephyrproject-rtos:main Jan 16, 2024
22 checks passed
@jukkar jukkar deleted the devel/socket-service branch January 16, 2024 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants