-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better handle new streams when server is quiescing #481
base: main
Are you sure you want to change the base?
Conversation
Motivation: When the server is in the locally quiescing state and a client attempts to open a new stream, the connection state machine treats receiving the headers as a connection error. This results in the connection being closed and in-flight requests being failed. This can happen because of the inherent race between the server sending a GOAWAY frame and the client receiving it, during which time the client can open a stream not knowing it's doomed to failure. Rather than the server treating this as a connection error, it should reject the stream with a RST_STREAM frame and emit a stream error. Modifications: - Emit a stream error when receiving HEADERS in the locally quiesced state Result: Better handling of new streams when the server is quiescing
Sources/NIOHTTP2/ConnectionStateMachine/ConnectionStreamsState.swift
Outdated
Show resolved
Hide resolved
Sources/NIOHTTP2/ConnectionStateMachine/ConnectionStreamsState.swift
Outdated
Show resolved
Hide resolved
underlyingError: NIOHTTP2Errors.noSuchStream(streamID: streamID), | ||
type: .protocolError | ||
) | ||
if isQuiescing { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this flag isn't quite specific enough. I'm not even entirely sure we want to sink this logic into this function: we really only want this flow to happen on receipt of stream creation frames, which might suggest a better name for the parameter at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I wasn't really happy with this whole approach. It's all a bit awkward.
The issue is that we receive HEADERS in a state where it's valid to receive them if the stream already exists but is a stream error if it doesn't. The problem with is that we return the stream error out of the state machine and back into the channel handler, which will then call back into the state machine to send the RST_STREAM frame as a result of the stream error. From the POV of the state machine, that stream doesn't exist, and that's the awkward bit.
It makes me wonder whether we should modify StateMachineResult
so that we can represent the notion of rejecting a stream (as opposed to a stream error, for which we assume the stream already exists). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's an interesting issue. Perhaps the actual fix is that we should tolerate sending RST_STREAM frames on streams that don't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially ruled that out because there are tests which validate that you can't reset a stream twice (or something similar). This would certainly be the simplest fix assuming we're okay with that small regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to that small regression, yeah.
Sources/NIOHTTP2/ConnectionStateMachine/ConnectionStreamsState.swift
Outdated
Show resolved
Hide resolved
underlyingError: NIOHTTP2Errors.noSuchStream(streamID: streamID), | ||
type: .protocolError | ||
) | ||
if isLocallyQuiescing { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so this is looking a lot better. I wonder now only if we need some amount of DoS protection. Presumably we shouldn't tolerate this happening too often. Can we record some counter or something that lets us produce a rate limit elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets ... interesting.
One approach is to re-use the rate limiter from DOSHeuristics
within the state machine and check the rate for this specific scenario. This would mean the state machine would need to understand time and would likely become generic over NIODeadlineClock
. I don't like that at all!
Another approach would be to widen the scope of the DOS scenario we're considering and apply rate limiting in the channel handler as we currently do. I think the appropriate thing to rate limit here is "inbound stream errors". WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this seems sensible. Let's have a rate limit on inbound stream errors. We can conceptually count this interval as a ratio on the number of frames received.
@@ -28,12 +28,15 @@ struct DOSHeuristics<DeadlineClock: NIODeadlineClock> { | |||
/// The maximum number of "empty" data frames we're willing to tolerate. | |||
private let maximumSequentialEmptyDataFrames: Int | |||
|
|||
private var resetFrameRateControlStateMachine: HTTP2ResetFrameRateControlStateMachine | |||
private var resetFrameRateControlStateMachine: RateLimitStateMachine | |||
private var streamErrorRateControlStateMachine: RateLimitStateMachine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the state machine reserves capacity on init
we end up regressing allocations per-connection here; given it's not per-stream I think that's okay.
Motivation:
When the server is in the locally quiescing state and a client attempts to open a new stream, the connection state machine treats receiving the headers as a connection error. This results in the connection being closed and in-flight requests being failed.
This can happen because of the inherent race between the server sending a GOAWAY frame and the client receiving it, during which time the client can open a stream not knowing it's doomed to failure.
Rather than the server treating this as a connection error, it should reject the stream with a RST_STREAM frame and emit a stream error.
Modifications:
Result:
Better handling of new streams when the server is quiescing