Introduce socket service API #66758
Conversation
This is still a draft, we can change the APIs etc., or discard the PR if it is not needed. Anyway, please take a look.
 * User should create needed sockets and then setup the poll struct and
 * then register the sockets to be monitored at runtime.
 */
struct net_socket_service_desc {
I wonder if we really need this static configuration for the socket service? It seems a bit problematic, at least for my use case, the DHCP server. Here, ideally, I'd need a single socket for each interface that wants to support the DHCP server functionality. But I don't think we have a way to calculate the number of interfaces in the system at build time to supply to the static configuration macros. Purely runtime registration would be less cumbersome in such a case.
I originally did some experimentation with runtime configuration, but as the API requires some housekeeping activities, allocating them at runtime requires malloc etc. There is a call to malloc in this PR too, but hopefully we could get rid of it.
We could place struct zsock_pollfd into the DHCP config in the network interface; that way the information this API needs would already be part of the network interface.
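To make that suggestion concrete, here is a minimal sketch (purely illustrative; the struct and field names are hypothetical and not the PR's actual code) of how the pollfd could live inside per-interface DHCP server state, so no compile-time interface count and no malloc are needed:

```c
#include <zephyr/net/socket.h>

/* Hypothetical per-interface DHCP server context; the real structure in the
 * network interface / DHCP code may look different. */
struct dhcp_server_ctx {
	struct zsock_pollfd pollfd; /* socket monitored by the socket service */
	/* ... other per-interface DHCP server state ... */
};

/* At runtime, each interface that enables the DHCP server fills in its own
 * pollfd and passes it to a (runtime) registration call, so the bookkeeping
 * memory already lives in the interface data. */
```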
Devicetree could help here, just sayin 🤷🏼♂️
	return -ENOENT;
}

if (svc->work_q != NULL) {
Shouldn't we call the registered callback directly from the services thread? I see a potential race here: we submit the work to execute, but don't read the data (in case POLLIN was monitored) from the socket until the work executes. If the workqueue thread has lower priority than the services thread, we may end up busy looping (poll() will keep reporting POLLIN because the work item didn't get a chance to execute).
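For clarity, a minimal sketch of the pattern being discussed (not the PR's actual source; the names are invented for illustration): the service thread polls and submits a work item, but the socket is only drained when that work item runs, so a lower-priority workqueue thread can leave poll() firing repeatedly on the same unread data.

```c
#include <zephyr/kernel.h>
#include <zephyr/net/socket.h>

static struct zsock_pollfd fds[1];

static void svc_work_handler(struct k_work *work)
{
	char buf[64];

	ARG_UNUSED(work);

	/* The socket is only drained here, possibly much later than the
	 * poll() call that reported POLLIN. */
	zsock_recv(fds[0].fd, buf, sizeof(buf), 0);
}

K_WORK_DEFINE(svc_work, svc_work_handler);

static void service_loop(void)
{
	while (1) {
		if (zsock_poll(fds, 1, -1) > 0 &&
		    (fds[0].revents & ZSOCK_POLLIN)) {
			/* If the workqueue thread has lower priority than this
			 * thread, svc_work may not run before the next poll(),
			 * which then reports POLLIN again for the same unread
			 * data: a busy loop. */
			k_work_submit(&svc_work);
		}
	}
}
```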
Shouldn't we call the registered callback directly from the services thread?
I originally planned to do that, but it felt more natural to use the existing APIs and resources in the system, and using a work queue does the same thing. But we can certainly change this; it is just a matter of changing one function call.
When using work queue(s), we can serve (at least theoretically) multiple socket services at the "same" time, and the priority of the thread using the services API determines which one gets called first. If the callback is invoked directly from the services thread, that is not possible.
I see a potential race here - we do submit the work to execute, but don't read the data (in case POLLIN was monitored) from the socket until the work executes.
Note that k_work_submit() does nothing if the work is already pending. It is indeed possible that we might busy-loop in the service thread if the service user does not "run" fast enough. I tried to mitigate this problem by setting the service thread priority to the lowest application thread priority.
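Continuing the illustrative sketch from earlier (same hypothetical svc_work item), the already-pending case can at least be detected from k_work_submit()'s return value, and the loop could back off instead of immediately re-polling the same data; this is just one possible mitigation, not what the PR does.

```c
int ret = k_work_submit(&svc_work);

if (ret == 0) {
	/* 0 means the work item was already pending, i.e. the previous
	 * POLLIN has not been handled yet. Sleeping briefly lets a
	 * lower-priority workqueue thread run before we poll() again. */
	k_msleep(1);
}
```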
There seems to be an issue when calling the "callback" via the workqueue, as it is possible (depending on the different thread priorities) that we start to loop through poll() if the callback does not get run properly. So it might be better to call the callback directly (synchronously) without using the workqueue. I will investigate this more.
Kept the async call but fixed the code so that the callback work is only called once. Also fixed the tests that were not working properly with native_sim.
To be honest, I'm not personally convinced about this async approach. Having multiple work queues kind of busts the idea the socket services were created for (saving resources), so most of the handlers will likely end up on the system workqueue anyway. Plus, having the handlers called directly from the services thread allows reusing the services stack and fine-tuning its size for the worst-case scenario (I imagine all of the registered services would need to allocate buffers for socket send/recv calls, and a common stack area seems like a good place for this instead of using static/heap/custom stack memory).
But it's just my personal opinion, no strong push from my side to change it, if that's the preferred way.
Personally I prefer the async way of doing things, although it certainly complicates matters. But I can see that the sync way is also possible here. I think we can support both methods quite easily and let the user select how the callback should be called. I will propose a new version that supports both methods.
Yes - sync should be a relatively trivial wrapper around async ... or at least I thought it was... 🤔
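A hedged sketch of the dispatch choice being discussed (illustrative only, not the PR's implementation; all names are invented): the service thread either invokes the callback synchronously on its own stack, or defers it to a work queue.

```c
#include <stdbool.h>
#include <zephyr/kernel.h>
#include <zephyr/net/socket.h>

/* Hypothetical service entry; fields and functions are made up for this sketch. */
struct svc_entry {
	struct k_work work;                   /* used only for async dispatch */
	void (*cb)(struct zsock_pollfd *pfd); /* user callback */
	struct zsock_pollfd *pfd;
	bool async;
};

static void dispatch(struct svc_entry *svc)
{
	if (svc->async) {
		/* Async: the handler runs later on a workqueue thread, so the
		 * workqueue priority decides when it gets CPU time. */
		k_work_submit(&svc->work);
	} else {
		/* Sync: the handler runs right here on the service thread's
		 * stack, so one stack can be sized for the worst case across
		 * all registered services. */
		svc->cb(svc->pfd);
	}
}
```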
Looks reasonable. It would be great to get a compile-time const number for network interfaces. Devicetree can fix that 😅
Indeed, that info would help a lot in various parts of the network stack, although this PR should not depend on the number of network interfaces.
Force-pushed from daf4a3d to 358fc23.
Updates:
Force-pushed from 358fc23 to e967a88.
Ugh.. that's annoying. I kept getting reassurances that the POSIX arch and the POSIX API were not mutually exclusive, but I often see that this is still the case.
Force-pushed from e967a88 to 75147f3.
Indeed, that is bad. If I try to run the PR sample or the tests on the native_sim/posix boards, they just "hang" and do not proceed. Pressing ctrl-c does not terminate zephyr.exe even though control is returned to the user and the process looks like it terminated.
Force-pushed from 75147f3 to 236e10d.
Force-pushed from de3065a to 37ab5ef.
Looks good from the docs standpoint, thanks!
Force-pushed from 95884e4 to 543bf54.
One issue spotted when testing.
The socket service provides similar functionality to what inetd provides in Linux. It listens on user-registered sockets for any activity and then launches a k_work for it. This way each application does not need to create a thread to listen on a blocking socket. Signed-off-by: Jukka Rissanen <[email protected]>
Simple tests that verify that the socket service API works as expected. Signed-off-by: Jukka Rissanen <[email protected]>
The echo-service sample demonstrates how to use the socket service API. Signed-off-by: Jukka Rissanen <[email protected]>
Add the socket service users to the "net sockets" command output. Signed-off-by: Jukka Rissanen <[email protected]>
If CONFIG_POSIX_API is enabled, then socket.h is found under zephyr/posix/sys/socket.h etc. This allows one to compile the socket test applications without error prints. Signed-off-by: Jukka Rissanen <[email protected]>
gcc prints this warning message: 'strncat' specified bound 1 equals source length [-Wstringop-overflow=] 58 | strncat(fd, "C", 1); There was no error in the code, but avoid the warning by not using strncat(). Signed-off-by: Jukka Rissanen <[email protected]>
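For context, a small self-contained example of the warning and one way to avoid it (the actual replacement used in the commit is not shown in this thread, so this fix is only illustrative):

```c
#include <string.h>

static void append_marker(char *fd, size_t fd_size)
{
	/* gcc warns on the commented-out line because the bound (1) equals
	 * the length of the source string "C"; it suspects the bound was
	 * meant to be the remaining space in the destination rather than the
	 * source length.
	 *
	 * strncat(fd, "C", 1);
	 */

	/* One warning-free alternative: append the character directly. */
	size_t len = strlen(fd);

	if (len + 1 < fd_size) {
		fd[len] = 'C';
		fd[len + 1] = '\0';
	}
}
```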
Force-pushed from 543bf54 to c6684c5.
New version fixes the issue that Robert was seeing.
Thanks
This PR introduces a socket service API. It can be used to create one socket listener that calls a callback if a set of sockets has any activity. This saves memory because instead of creating multiple threads for a set of sockets, one main thread can listen on all the needed sockets. This is a similar concept to how inetd works in Linux, although it is implemented differently.
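To illustrate the intended usage pattern described above, here is a rough sketch against the draft API; the header, macro, and function names follow the direction of this PR, but their exact names and argument lists may not match the final version, so treat everything below as an assumption rather than the definitive API.

```c
#include <zephyr/kernel.h>
#include <zephyr/net/socket.h>
#include <zephyr/net/socket_service.h> /* header name as proposed by this PR */

/* Invoked (as a k_work handler) when any registered socket has activity. */
static void udp_handler(struct k_work *work)
{
	ARG_UNUSED(work);
	/* recv() from the ready socket(s) here */
}

/* Reserve a statically allocated service slot for one pollfd. The macro name
 * and argument list are taken from the draft API and may have changed. */
NET_SOCKET_SERVICE_SYNC_DEFINE_STATIC(udp_service, udp_handler, 1);

static int start_listening(int sock)
{
	struct zsock_pollfd fds[] = {
		{ .fd = sock, .events = ZSOCK_POLLIN },
	};

	/* Hand the socket over to the shared service thread; no dedicated
	 * per-socket listener thread is needed after this. */
	return net_socket_service_register(&udp_service, fds,
					   ARRAY_SIZE(fds), NULL);
}
```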