Design: Reconsider the semantics of cancel.Wrap
#208
Comments
For context, the bridge completely ignores the SDK:

    // Cancel requests that the provider cancel all ongoing RPCs. For TF, this is a no-op.
    func (p *Provider) Cancel(ctx context.Context, req *pbempty.Empty) (*pbempty.Empty, error) {
        return &pbempty.Empty{}, nil
    }

PF:

    // SignalCancellation asks all resource providers to gracefully shut down and abort any ongoing operations. Operation
    // aborted in this way will return an error (e.g., `Update` and `Create` will either a creation error or an
    // initialization error. SignalCancellation is advisory and non-blocking; it is up to the host to decide how long to
    // wait after SignalCancellation is called before (e.g.) hard-closing any gRPC connection.
    func (p *provider) SignalCancellationWithContext(_ context.Context) error {
        // Some improvements are possible here to gracefully shut down.
        return nil
    }

|
The only point of implementing cancel would be to make some operation atomic with respect to cancellation, reducing the likelihood of exposing "partial states" when such an operation is interrupted in flight. Can we think of any? The stateful operation I'm thinking about is creating or updating a resource. It can be non-atomic if, for example, the provider creates a bucket first and then applies tags. Does this apply to the command provider? Does it benefit from graceful cancellation of a command? I'm a priori doubtful that any of the providers can meaningfully "gracefully cancel" these, and would appreciate concrete evidence that it's helpful before throwing effort at implementing Cancel better at the generic level. |
I don't agree here. Implementing
I can't think of how either of these would apply to
Yes. The best way to illustrate this is to compare the behavior of The command provider benefits from handing
The provider shuts down cleanly and quickly. Pulumi feels responsive. When running against a build of
Pulumi feels less responsive because the first C-c was ignored. Because the resource can't communicate back any error, the user can't see that In addition, the I think the command provider experience is a pretty strong argument that providers need to exit when If a provider wants to customize the way it handles |
The Pulumi experience is specifically designed so that the first Ctrl+C does not kill instantly, which gives providers a chance to terminate gracefully. The second Ctrl+C does kill immediately. I think I can find refs for this in the p/p interrupt handler somewhere. So for a provider, instead of ignoring Ctrl+C you can:
Either strategy feels like something with very few LOC and no concurrent code to worry about. To justify something more sophisticated, I'd love to see some worked-out examples. I don't know of any bridge capability to gracefully cancel partial creates. I'm not sure what you mean by partial state in the engine; do you mean the "create failed" state? Any references there? |
That is my understanding. That means that the provider gets exactly one
Killing the process yourself doesn't help much, and Cancel can't block. Instead providers need to shut down gracefully as fast as possible (before a user sends the second Ctrl+C). Coincidentally, a motivating example came up during hack week: #210. |
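The two-stage interrupt behavior described above can be illustrated with a small sketch. This is not the Pulumi engine's interrupt handler: `runWithInterrupts`, `cooperative`, and `stubborn` are hypothetical names, and a plain channel stands in for OS signals so the example stays self-contained. The first interrupt only cancels a context (advisory, non-blocking); the second abandons the work without waiting.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runWithInterrupts runs work in the background. The first interrupt cancels
// the context handed to work (a graceful request); the second interrupt stops
// waiting and reports a hard kill. Hypothetical helper, for illustration only.
func runWithInterrupts(interrupts <-chan struct{}, work func(context.Context) string) string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Buffered so the worker can always deliver its result, even if nobody
	// is listening anymore after a hard kill.
	done := make(chan string, 1)
	go func() { done <- work(ctx) }()

	seen := 0
	for {
		select {
		case result := <-done:
			return result
		case <-interrupts:
			seen++
			if seen == 1 {
				cancel() // advisory: ask the work to stop, then keep waiting
				continue
			}
			return "killed" // second interrupt: give up immediately
		}
	}
}

// cooperative observes cancellation and exits promptly.
func cooperative(ctx context.Context) string {
	select {
	case <-ctx.Done():
		return "shut down gracefully"
	case <-time.After(time.Minute):
		return "finished"
	}
}

// stubborn ignores the context, like a provider whose Cancel is a no-op.
func stubborn(_ context.Context) string {
	time.Sleep(time.Minute)
	return "finished"
}

func main() {
	one := make(chan struct{}, 1)
	one <- struct{}{}
	fmt.Println(runWithInterrupts(one, cooperative))

	two := make(chan struct{}, 2)
	two <- struct{}{}
	two <- struct{}{}
	fmt.Println(runWithInterrupts(two, stubborn))
}
```

In this sketch the cooperative worker gets to report its own outcome after a single interrupt, while the stubborn worker only ends when the second interrupt arrives and has no chance to report anything.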
Right, right, I'm with you that killing the provider is not always a great answer, but if you're saying Cancel cannot block, what are you saying will happen in the service provider in that scenario? Are you saying that the Cancel handler will issue cancellation over the context.Context object, which will make Team.Create abort work but respond to the engine with "InitFailed" so that it can be saved in state? I mean this:
I haven't seen the code referenced that actually responds to the cancelled context, but skimming the docs just now, it might be implicit in lower-level networking code... So any kind of request handler will stop waiting. So this makes sense if we have some form of error recovery, like saving errors into state with &rpc.ErrorResourceInitFailed (I would still love a quick page on what that does precisely). If you want to speed up this error recovery, this makes total sense. If it was just injecting unrecoverable errors, it didn't make sense to me, as it seemed to be swapping one way to crash for another way to crash. |
Yes! That's exactly what I'm saying.
Yes. Any high-quality library with ongoing operations should cancel immediately. This includes Go's standard library, for both network requests and subprocesses. Most providers (pulumi-command and pulumi-pulumiservice included) will get instant cancellation for free. |
This hasn't come up again, and I think the design is pretty solid. I definitely think that reducing the scope of the cancel wrapper would hurt the default experience. Until and unless the Pulumi provider protocol allows providers to block a cancel operation, we need to default to the safest |
Hello!
Issue details
Following up on the discussion from #207, it seems like the as-designed behavior of the `cancel` middleware is unintuitive/unappealing to some.
The current behavior as described by `cancel`'s doc comment (PR pending) is:
This issue is a great place to comment on why the existing design should change.
Affected area/feature