Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: http: client: Return error if waiting timeout #68930

Merged

Conversation

jukkar
Copy link
Member

@jukkar jukkar commented Feb 13, 2024

Return error to the caller if no data was received or there was some other error. Earlier we did not check the error condition properly.

rlubos
rlubos previously approved these changes Feb 13, 2024
Copy link
Contributor

@rlubos rlubos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, although I've left one suggestion to consider. No strong push though.

subsys/net/lib/http/http_client.c Outdated Show resolved Hide resolved
Return error to the caller if no data was received or there
was some other error. Earlier we did not check the error
condition properly.

Signed-off-by: Jukka Rissanen <[email protected]>
@jukkar jukkar force-pushed the fix/http-client-no-data-received branch from a2ce6f9 to ddab14c Compare February 14, 2024 13:24
@jukkar jukkar requested a review from pdgendt February 28, 2024 11:55
@fabiobaltieri fabiobaltieri merged commit 28a46c0 into zephyrproject-rtos:main Feb 28, 2024
17 checks passed
@jukkar jukkar deleted the fix/http-client-no-data-received branch February 29, 2024 08:08
@glarsennordic
Copy link
Contributor

glarsennordic commented Mar 26, 2024

Hey, sorry I am commenting after this was merged, but wasn't the lack of error output in this scenario intentional? (or, more specifically, "officially adopted" and documented?)

NULL HTTP output was instead indicated with the special status 0:

/** Numeric HTTP status code which corresponds to the
 * textual description. Set to zero if null response is
 * given. Otherwise, will be a 3-digit integer code if
 * valid HTTP response is given.
 */
uint16_t http_status_code;

See
d31d3bf for all the other documented behaviors of http-client in the event that the server gives a NULL response (but gracefully closes the TLS socket)

@rlubos @jukkar

Are we intentionally reversing that decision? In which case documentation from my PR needs to be updated as well, and we will need to refactor libraries relying on the old behavior. That's fine if that's how it needs to be, but it does seem worth it to bring it up and discuss first

See the issue I raised about this topic for discussion of whether NULL HTTP output should be considered a connection error:

#42988

NULL response is absolutely an edge case, but there are servers out there that respond to legitimate API calls with a graceful TLS termination and NULL response body. So we'll have to refactor those in downstream projects if this change is permanent.

@rlubos
Copy link
Contributor

rlubos commented Mar 27, 2024

@jukkar Were there any other edge cases, other than connection close, covered by this change? If not, perhaps we should indeed revert this change, and just document it better, that in case of no response, it'd be notified in the registered callback with the special status code (0)?

@jukkar
Copy link
Member Author

jukkar commented Mar 27, 2024

I noticed the issue while working on #63531, I can check what it means to revert this.

@jukkar
Copy link
Member Author

jukkar commented Mar 27, 2024

This commit fixes this issue:

The original behavior without this commit gives following output:

[00:00:21.226,000] <dbg> online_checker: do_online_http_check: (conn_mgr_monitor): Sending HTTP GET request to 192.168.88.15:4242 (ifindex 1)
[00:00:21.241,000] <dbg> net_http: http_flush_data: Data to send
                                   47 45 54 20 68 74 74 70  3a 2f 2f 31 39 32 2e 31 |GET http ://192.1
                                   36 38 2e 38 38 2e 31 35  3a 34 32 34 32 20 48 54 |68.88.15 :4242 HT
                                   54 50 2f 31 2e 31 0d 0a  48 6f 73 74 3a 20 31 39 |TP/1.1.. Host: 19
                                   32 2e 31 36 38 2e 38 38  2e 31 35 3a 34 32 34 32 |2.168.88 .15:4242
                                   0d 0a 0d 0a                                      |....             
[00:00:21.292,000] <dbg> net_http: http_client_req: (conn_mgr_monitor): Sent 68 bytes
[00:00:23.301,000] <dbg> net_http: http_wait_data: Timeout
[00:00:23.309,000] <dbg> net_http: http_data_final_null_resp: (conn_mgr_monitor): Calling callback for Final Data(NULL HTTP response)
[00:00:23.323,000] <dbg> online_checker: response_cb: (conn_mgr_monitor): All the data received (0 bytes)
[00:00:23.335,000] <inf> online_checker: Response status 
[00:00:23.343,000] <dbg> net_http: http_client_req: (conn_mgr_monitor): Received 0 bytes
[00:00:23.354,000] <dbg> online_checker: exec_http_query: (conn_mgr_monitor): Received HTTP status 0, not considering online.
[00:00:23.368,000] <dbg> net_sock: z_impl_zsock_close: (conn_mgr_monitor): close: ctx=0x20022d98, fd=0

and then with the fix

[00:00:03.318,000] <dbg> online_checker: do_online_http_check: (conn_mgr_monitor): Sending HTTP GET request to 192.168.88.15:4242 (ifindex 1)
[00:00:03.334,000] <dbg> net_http: http_flush_data: Data to send
                                   47 45 54 20 68 74 74 70  3a 2f 2f 31 39 32 2e 31 |GET http ://192.1
                                   36 38 2e 38 38 2e 31 35  3a 34 32 34 32 20 48 54 |68.88.15 :4242 HT
                                   54 50 2f 31 2e 31 0d 0a  48 6f 73 74 3a 20 31 39 |TP/1.1.. Host: 19
                                   32 2e 31 36 38 2e 38 38  2e 31 35 3a 34 32 34 32 |2.168.88 .15:4242
                                   0d 0a 0d 0a                                      |....             
[00:00:03.388,000] <dbg> net_http: http_client_req: (conn_mgr_monitor): Sent 68 bytes
[00:00:05.398,000] <dbg> net_http: http_wait_data: Timeout
[00:00:05.406,000] <dbg> net_http: http_data_final_null_resp: (conn_mgr_monitor): Calling callback for Final Data(NULL HTTP response)
[00:00:05.421,000] <dbg> online_checker: response_cb: (conn_mgr_monitor): All the data received (0 bytes)
[00:00:05.433,000] <inf> online_checker: Response status 
[00:00:05.441,000] <dbg> net_http: http_client_req: (conn_mgr_monitor): Timeout while waiting data
[00:00:05.452,000] <dbg> online_checker: exec_http_query: (conn_mgr_monitor): HTTP request failed (-116)
[00:00:05.464,000] <dbg> net_sock: z_impl_zsock_close: (conn_mgr_monitor): close: ctx=0x20022d98, fd=0

So the difference is that with original code and when a HTTP reply is not received, we get HTTP status 0 which just looks plain wrong as there was no HTTP server at the other end.
With the fix, we get ETIMEDOUT error which IMHO makes sense in this case.

WDYT?

@glarsennordic
Copy link
Contributor

glarsennordic commented Mar 27, 2024

I think we are dealing with two separate behaviors, and we should introduce some more precise terminology to address that better:

NULL response is specifically the situation where the server sends a FIN ACK without transmitting any BODY data. So I'll refer to that from now on as NULL response (finack with no body)

Whereas what it seems you were trying to fix was the situation where no response (not even FIN ACK) is received from the server at all. Let's call this second situation unresponsive server.

So I agree, in the event of unresponsive server, it makes sense to raise an error.

However, In the event of a NULL response (finack with no body), I think we should keep the old behavior, and document it better as Robert says.

Your PR happens to affect the behavior of BOTH unresponsive server AND NULL response (finack with no body), and that is where the problem lies, IMO.

IMO the behavior you are seeing in your examples is odd. Wouldn't unresponsive server result in total_recv < 0? I would only expect total_recv==0 in the case specifically of NULL response (finack with no body). Admittedly it has been a long time since I visited this code, though, so please correct me if I'm just straight up wrong here! Are you seeing this behavior in your automated tests? What is the nature of the server you are contacting?

If I am remembering wrong, then IMO what is needed is a change in http_client so that it can successfully distinguish between unresponsive server and NULL response. At a minimum it is possible to distinguish between a data timeout and a finack at the TLS socket level, so it should be possible for http_client to also distinguish this.

@jukkar
Copy link
Member Author

jukkar commented Mar 28, 2024

The code checks if there was any data received from server total_recv == 0, which means there was a timeout regardless of what happened in the server side. So what you are describing are same thing from client point of view.
How do you propose to detect this NULL response in client side?

@jukkar
Copy link
Member Author

jukkar commented Mar 28, 2024

Wouldn't unresponsive server result in total_recv < 0?

No, an error like ECONNREFUSED would only be returned if there is no port listener. If there is but it refuses to talk, then everything is ok from client point of view. Only if there is no data returned, we get a timeout error.

@jukkar
Copy link
Member Author

jukkar commented Mar 28, 2024

@glarsennordic I think I found a way to detect what you asked. See #70816 for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants