Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handle new streams when server is quiescing #481

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

glbrntt
Copy link
Contributor

@glbrntt glbrntt commented Nov 6, 2024

Motivation:

When the server is in the locally quiescing state and a client attempts to open a new stream, the connection state machine treats receiving the headers as a connection error. This results in the connection being closed and in-flight requests being failed.

This can happen because of the inherent race between the server sending a GOAWAY frame and the client receiving it, during which time the client can open a stream not knowing it's doomed to failure.

Rather than the server treating this as a connection error, it should reject the stream with a RST_STREAM frame and emit a stream error.

Modifications:

  • Emit a stream error when receiving HEADERS in the locally quiesced state

Result:

Better handling of new streams when the server is quiescing

Motivation:

When the server is in the locally quiescing state and a client attempts
to open a new stream, the connection state machine treats receiving the
headers as a connection error. This results in the connection being
closed and in-flight requests being failed.

This can happen because of the inherent race between the server sending
a GOAWAY frame and the client receiving it, during which time the client
can open a stream not knowing it's doomed to failure.

Rather than the server treating this as a connection error, it should
reject the stream with a RST_STREAM frame and emit a stream error.

Modifications:

- Emit a stream error when receiving HEADERS in the locally quiesced
  state

Result:

Better handling of new streams when the server is quiescing
@glbrntt glbrntt added the 🔨 semver/patch No public API change. label Nov 6, 2024
underlyingError: NIOHTTP2Errors.noSuchStream(streamID: streamID),
type: .protocolError
)
if isQuiescing {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this flag isn't quite specific enough. I'm not even entirely sure we want to sink this logic into this function: we really only want this flow to happen on receipt of stream creation frames, which might suggest a better name for the parameter at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I wasn't really happy with this whole approach. It's all a bit awkward.

The issue is that we receive HEADERS in a state where it's valid to receive them if the stream already exists but is a stream error if it doesn't. The problem with is that we return the stream error out of the state machine and back into the channel handler, which will then call back into the state machine to send the RST_STREAM frame as a result of the stream error. From the POV of the state machine, that stream doesn't exist, and that's the awkward bit.

It makes me wonder whether we should modify StateMachineResult so that we can represent the notion of rejecting a stream (as opposed to a stream error, for which we assume the stream already exists). WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's an interesting issue. Perhaps the actual fix is that we should tolerate sending RST_STREAM frames on streams that don't exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially ruled that out because there are tests which validate that you can't reset a stream twice (or something similar). This would certainly be the simplest fix assuming we're okay with that small regression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to that small regression, yeah.

@glbrntt glbrntt requested a review from Lukasa November 11, 2024 10:40
underlyingError: NIOHTTP2Errors.noSuchStream(streamID: streamID),
type: .protocolError
)
if isLocallyQuiescing {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so this is looking a lot better. I wonder now only if we need some amount of DoS protection. Presumably we shouldn't tolerate this happening too often. Can we record some counter or something that lets us produce a rate limit elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets ... interesting.

One approach is to re-use the rate limiter from DOSHeuristics within the state machine and check the rate for this specific scenario. This would mean the state machine would need to understand time and would likely become generic over NIODeadlineClock. I don't like that at all!

Another approach would be to widen the scope of the DOS scenario we're considering and apply rate limiting in the channel handler as we currently do. I think the appropriate thing to rate limit here is "inbound stream errors". WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this seems sensible. Let's have a rate limit on inbound stream errors. We can conceptually count this interval as a ratio on the number of frames received.

@glbrntt glbrntt added 🆕 semver/minor Adds new public API. and removed 🔨 semver/patch No public API change. labels Jan 6, 2025
@@ -28,12 +28,15 @@ struct DOSHeuristics<DeadlineClock: NIODeadlineClock> {
/// The maximum number of "empty" data frames we're willing to tolerate.
private let maximumSequentialEmptyDataFrames: Int

private var resetFrameRateControlStateMachine: HTTP2ResetFrameRateControlStateMachine
private var resetFrameRateControlStateMachine: RateLimitStateMachine
private var streamErrorRateControlStateMachine: RateLimitStateMachine
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the state machine reserves capacity on init we end up regressing allocations per-connection here; given it's not per-stream I think that's okay.

@glbrntt glbrntt requested a review from Lukasa January 6, 2025 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🆕 semver/minor Adds new public API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Treat receiving HEADERS for a new stream after sending a GOAWAY as a stream error
2 participants