Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🪵Log TLS shutdown error🛑, only if any #10259

Open
Al2Klimov opened this issue Nov 27, 2024 · 3 comments
Open

🪵Log TLS shutdown error🛑, only if any #10259

Al2Klimov opened this issue Nov 27, 2024 · 3 comments
Labels
area/distributed Distributed monitoring (master, satellites, clients) area/log Logging related core/quality Improve code, libraries, algorithms, inline docs

Comments

@Al2Klimov
Copy link
Member

Describe the bug

Currently, we call boost::asio::ssl::stream::async_shutdown() and ignore the error. We shall log it!

But only if there's an actual error, i.e not a false positive which unnecessarily scares the user. Currently we even have such:

To Reproduce

--- lib/remote/jsonrpcconnection.cpp
+++ lib/remote/jsonrpcconnection.cpp
@@ -253 +253,7 @@ void JsonRpcConnection::Disconnect()
-                       m_Stream->next_layer().async_shutdown(yc[ec]);
+                       {
+                               boost::system::error_code ec;
+                               m_Stream->next_layer().async_shutdown(yc[ec]);
+
+                               Log(LogCritical, "JsonRpcConnection")
+                                       << "API client disconnected for identity '" << m_Identity << "': " << ec.message();
+                       }

With this patch, running (echo '1000000:'; cat /dev/zero) |openssl s_client -connect 127.0.0.1:5665 against Icinga 2 results in:

[2024-11-26 18:04:16 +0100] critical/JsonRpcConnection: API client disconnected for identity '': application data after close notify (SSL routines)

This is due to how Boost ASIO shuts down SSL – it (often) calls SSL_shutdown() twice. Once for notifying the peer and the second time to mark the connection as shut down. The peer may even not have time to ack the termination. And when it does, we're passively aggressive via the above log message.

Expected behavior

  1. Call SSL_shutdown() once
  2. Read, via Boost ASIO, until EOF (replaces second SSL_shutdown() call)
  3. Log the damn error if still any :)

Screenshots

Your Environment

Include as many relevant details about the environment you experienced the problem in

  • Version used (icinga2 --version): master
  • Operating System and version: latest MacOS
  • Enabled features (icinga2 feature list): api
  • Icinga Web 2 version and modules (System - About): -
  • Config validation (icinga2 daemon -C):
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.

Additional context

Digged out by @julianbrost IIRC.

@Al2Klimov Al2Klimov added area/distributed Distributed monitoring (master, satellites, clients) area/log Logging related core/quality Improve code, libraries, algorithms, inline docs labels Nov 27, 2024
@Al2Klimov
Copy link
Member Author

Expected behavior

  1. Call SSL_shutdown() once
  2. Read, via Boost ASIO, until EOF (replaces second SSL_shutdown() call)

💡 We already override async_handshake(), so we can do with async_shutdown().

@yhabteab yhabteab changed the title 🪵Log TLS shutdown error🛑, only if any Log TLS shutdown error, only if any Nov 27, 2024
@julianbrost
Copy link
Contributor

Expected behavior

  1. Call SSL_shutdown() once
  2. Read, via Boost ASIO, until EOF (replaces second SSL_shutdown() call)

💡 We already override async_handshake(), so we can do with async_shutdown().

That sounds like you also want to read inside async_shutdown()? How would your async_shutdown() then be different from Asio's async_shutdown()?

Additional context

Digged out by @julianbrost IIRC.

Yes, but the additional context was that I started to implement a clean shutdown of our JSON-RPC connections. Clean shutdown meaning that one node initiates the shutdown, closing only it's write direction and then still processing incoming messages from the peer until it has noticed that the connection is being closed1 and closes it's write direction in response2.

So this would conflict doing any read in that TLS shutdown method as the reads still have to be handled by the normal JSON-RPC message processing until that reads EOF.

Footnotes

  1. Remember network latency? When one side closes the connection, messages sent by the peer may already be on the wire.

  2. Unless a timeout is reached, in that case, in that case the connection will be forcefully terminated, otherwise it could block for very long if the connection died at the wrong time.

@Al2Klimov Al2Klimov changed the title Log TLS shutdown error, only if any 🪵Log TLS shutdown error🛑, only if any Nov 27, 2024
@Al2Klimov
Copy link
Member Author

💡 We already override async_handshake(), so we can do with async_shutdown().

That sounds like you also want to read inside async_shutdown()? How would your async_shutdown() then be different from Asio's async_shutdown()?

Well, I'd call SSL_shutdown() exactly once. ASIO calls it twice, that triggers "application data after close notify".

Additional context

Digged out by @julianbrost IIRC.

Yes, but the additional context was that I started to implement a clean shutdown of our JSON-RPC connections. Clean shutdown meaning that one node initiates the shutdown, closing only it's write direction and then still processing incoming messages from the peer until it has noticed that the connection is being closed1 and closes it's write direction in response2.

So this would conflict doing any read in that TLS shutdown method as the reads still have to be handled by the normal JSON-RPC message processing until that reads EOF.

OK, this indeed adds a lot of context to what you told me before. I thought we can:

  1. Stop I/O loops
  2. Call SSL_shutdown() once
  3. Read and discard* until EOF/timeout

*) Apparently we don't care for any incoming messages on shutdown: #10213

Given your solution, I guess we'd have to:

  1. Await/stop write loop
  2. Call SSL_shutdown() once
  3. Read and process/discard until EOF

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/distributed Distributed monitoring (master, satellites, clients) area/log Logging related core/quality Improve code, libraries, algorithms, inline docs
Projects
None yet
Development

No branches or pull requests

2 participants