Fix cancelation panic #207
Conversation
The new implementation is simpler to understand and consequently less racy.
I'm still having trouble understanding why we need concurrency management here. Each request is handled in its own goroutine with its own context, so what is there to coordinate? I understand why we need some middleware to translate the `Cancel` gRPC call into context cancellation.
It seems like this is primarily why we're trying to coordinate things across requests? If I understand the behavior we get here, I don't think the …
That sits at the root of the server stack, and is unrelated to this middleware block. #159 is still the right thing to do, but it doesn't factor in here.
Yes. That is the only reason we need to deal with concurrency here. The documentation for `Cancel`:

// Cancel signals the provider to gracefully shut down and abort any ongoing resource operations.
// Operations aborted in this way will return an error (e.g., `Update` and `Create` will either return a
// creation error or an initialization error). Since Cancel is advisory and non-blocking, it is up
// to the host to decide how long to wait after Cancel is called before (e.g.)
// hard-closing any gRPC connection.
rpc Cancel(google.protobuf.Empty) returns (google.protobuf.Empty) {}

To do this, we need a middle ground between …
The first ctrl-c sends the `Cancel` request.
I tried reading this, but I'm also not sure what behavior we want, and I would like to understand that before getting down to how it's implemented. If my review's needed here, I would appreciate a quick sync to figure this out.

There is a slightly related grey area for me where bridged providers try to handle graceful termination, and it would be super helpful to clarify what the desired behavior is for handling graceful provider termination, and what the idiomatic Go expression of that is. Naively, I'm expecting something along the lines of the gRPC server handling everything necessary for the contexts that handlers receive, but then us needing to tie a request for graceful termination from the Engine to some form of message to the gRPC server that would instruct it to cancel all existing context objects and wait for the requests to finish processing. I'd expect this all to be available at that layer, not at the logical layer as here.
// The `cancel` package provides a middleware that ties the Cancel gRPC call from Pulumi
// to Go's `context.Context` cancellation system.
//
// Wrapping a provider in `cancel.Wrap` ensures 2 things:
//
// 1. When a resource operation times out, the associated context is canceled.
//
// 2. When `Cancel` is called, all outstanding gRPC methods have their associated contexts
// canceled.
//
// A `cancel.Wrap`ed provider will still call the `Cancel` method on the underlying
// provider. If NotImplemented is returned, it will be swallowed.
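As an aside for readers following the thread, here is a minimal sketch of the mechanism that doc comment describes: each in-flight request registers its context's cancel function, and `Cancel` fires all of them. The names (`canceler`, `register`, `cancelAll`) are illustrative and are not the package's actual types.

```go
package cancelsketch

import (
	"context"
	"sync"
)

// canceler tracks the cancel functions of outstanding requests.
type canceler struct {
	mu       sync.Mutex
	canceled bool
	inFlight map[int]context.CancelFunc
	next     int
}

func newCanceler() *canceler {
	return &canceler{inFlight: map[int]context.CancelFunc{}}
}

// register wraps ctx so it can be canceled on completion or by cancelAll.
// The returned cleanup function must be deferred by the caller.
func (c *canceler) register(ctx context.Context) (context.Context, func()) {
	ctx, cancel := context.WithCancel(ctx)

	c.mu.Lock()
	defer c.mu.Unlock()
	if c.canceled {
		// Cancel already arrived; new requests start out canceled.
		cancel()
		return ctx, func() {}
	}
	id := c.next
	c.next++
	c.inFlight[id] = cancel

	return ctx, func() {
		c.mu.Lock()
		delete(c.inFlight, id)
		c.mu.Unlock()
		cancel()
	}
}

// cancelAll implements the Cancel RPC: cancel every outstanding context.
func (c *canceler) cancelAll() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.canceled = true
	for id, cancel := range c.inFlight {
		cancel()
		delete(c.inFlight, id)
	}
}
```

The interesting constraint is the one discussed above: a request that arrives after `Cancel` should start out already canceled, which is why `register` checks the `canceled` flag under the same lock that `cancelAll` takes.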
I added some docs clarifying what `cancel` currently does. This behavior is unchanged before and after this PR. The PR only prevents panics on concurrent creations and cancelations.

Since this PR fixes a P1 (panic) and doesn't change the design of `cancel.Wrap`, I suggest that we merge as is unless there are suggestions on how to better implement the existing semantics.

I can tell that there is disagreement on the desired behavior of `cancel.Wrap`. I'm happy to have that discussion, but I think it should be a separate discussion, outside the scope of this PR.
I opened #208 to track a design discussion on what `cancel` should do.
Sounds good to me!
Gotcha -- this is the crux of it. You want to cancel the requests but still let them log or return to the engine.
Interesting, I guess I interpreted this differently. The signal/advisory part makes it seem like more of a courtesy call, and the "Operations aborted in this way..." part explains how the engine will interpret operations aborted by the provider (if it decides to abort anything). In any case this is probably one of those under-specified corners of behavior. I'm curious what we do in the bridge -- if this is already the norm then I don't have any concerns.
Same. This feels like it could be handled by an interceptor to which the server can broadcast a shutting-down signal.
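For concreteness, here is one way that idea could look. This is a hypothetical sketch, not an API that exists in this repository: a server-wide root context is canceled when the engine asks for graceful termination, and a standard `grpc.UnaryServerInterceptor` ties every request context to it.

```go
package interceptorsketch

import (
	"context"

	"google.golang.org/grpc"
)

// newCancelInterceptor ties every request's context to a server-wide root
// context: when root is canceled (e.g. on a graceful-shutdown request), all
// in-flight handlers see their context canceled as well.
func newCancelInterceptor(root context.Context) grpc.UnaryServerInterceptor {
	return func(
		ctx context.Context,
		req interface{},
		info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler,
	) (interface{}, error) {
		ctx, cancel := context.WithCancel(ctx)
		defer cancel()

		// Watch the root context and propagate its cancellation to this
		// request. The goroutine exits once either context is done.
		go func() {
			select {
			case <-root.Done():
				cancel()
			case <-ctx.Done():
			}
		}()

		return handler(ctx, req)
	}
}
```

It would be registered with `grpc.NewServer(grpc.UnaryInterceptor(newCancelInterceptor(root)))`, and the server could then wait for in-flight requests to drain via `GracefulStop`.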
This is accomplished by replacing a linear search of `c.entries` for empty cells with a maintained list of empty cells.
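To illustrate the data-structure change (this is a simplified sketch, not the actual `evict.Pool` code): instead of scanning `entries` for an empty cell on every insert, the pool keeps a stack of free indices and pops one in O(1).

```go
package freelist

// pool stores values in a slice and reuses freed slots via a free list.
type pool[T any] struct {
	entries []*T  // nil means the cell is empty
	free    []int // indices of empty cells in entries
}

// insert stores v in an empty cell, reusing a freed index when one exists.
func (p *pool[T]) insert(v *T) int {
	if n := len(p.free); n > 0 {
		i := p.free[n-1]
		p.free = p.free[:n-1]
		p.entries[i] = v
		return i
	}
	p.entries = append(p.entries, v)
	return len(p.entries) - 1
}

// evict empties cell i and records it for reuse.
func (p *pool[T]) evict(i int) {
	p.entries[i] = nil
	p.free = append(p.free, i)
}
```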
Fixes #186

This PR is best reviewed by commit:

1. Document the `cancel.Wrap` method.
2. Replace `inOutCache` with `evict.Pool`: a data structure better suited for purpose. `cancel` uses `evict.Pool` to keep track of the set of current requests, ensuring that the request associated with each context is canceled if one of three events happen:
   - the request finishes (`Create`, `Update` and `Delete`),
   - the operation times out, or
   - the provider receives a `Cancel` request.
3. Improve testing of the `cancel` middleware, especially under heavy concurrency. I'm pleased to report that 6d6b8aa without 997d21b regularly panics, showing that the improved testing does cover Cancellation panic #186.
4. Enable the `-race` flag for tests. This would have also caught Cancellation panic #186 (see the test sketch below).
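For flavor, the kind of test that benefits from `-race` might look like the following. It reuses the hypothetical `canceler` sketch from earlier in the thread and is not the repository's actual test code; running it with `go test -race` lets the race detector flag unsynchronized access between concurrent registrations and a concurrent cancel.

```go
package cancelsketch

import (
	"context"
	"sync"
	"testing"
)

// TestConcurrentCancel hammers the canceler with concurrent registrations
// while cancelAll fires, mimicking many in-flight resource operations being
// interrupted by a Cancel request.
func TestConcurrentCancel(t *testing.T) {
	c := newCanceler()

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ctx, done := c.register(context.Background())
			defer done()
			// Block until the context is canceled, either by cancelAll or
			// because registration happened after cancellation.
			<-ctx.Done()
		}()
	}

	c.cancelAll()
	wg.Wait()
}
```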