Implement Timeout Mechanism for Replica Termination #4135

Open
andylizf opened this issue Oct 22, 2024 · 0 comments

andylizf commented Oct 22, 2024

Currently, replicas can get stuck during shutdown, and recovering from this requires manually intervening in the code logic. This is very unfriendly to users.

We propose implementing a timeout mechanism for replica termination. Replicas that do not shut down within the specified timeout should transition to a `FAILED_SHUTDOWN` state. These failed replicas can then be terminated using the existing `terminate_replica` command (#4032).

> Instead of adding a `force` termination option, we decided to implement a timeout mechanism for terminating replicas, so that users can tear down replicas that exceed the timeout and end up in `FAILED_SHUTDOWN` status via the previously mentioned `terminate_replica` command introduced by #4032.

Originally posted by @andylizf in #4059 (comment)
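
A minimal sketch of the proposed behavior, for discussion only and not the project's actual implementation. Only `FAILED_SHUTDOWN` comes from this issue; the `ReplicaStatus` enum, the `TERMINATED` member, the `wait_for_shutdown` helper, and its parameters are hypothetical names chosen for illustration. It assumes the caller can supply a function that reports whether the replica has finished shutting down.

```python
import enum
import time
from typing import Callable


class ReplicaStatus(enum.Enum):
    # Hypothetical status set; only FAILED_SHUTDOWN is named in the issue.
    TERMINATED = 'TERMINATED'
    FAILED_SHUTDOWN = 'FAILED_SHUTDOWN'


def wait_for_shutdown(is_terminated: Callable[[], bool],
                      timeout_seconds: float,
                      poll_interval_seconds: float = 0.01) -> ReplicaStatus:
    """Poll until the replica reports termination or the timeout expires.

    `is_terminated` is an assumed callback that returns True once the
    replica has fully shut down.
    """
    # Use a monotonic clock so wall-clock adjustments cannot affect the deadline.
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if is_terminated():
            return ReplicaStatus.TERMINATED
        time.sleep(poll_interval_seconds)
    # Check once more in case shutdown completed during the last sleep.
    if is_terminated():
        return ReplicaStatus.TERMINATED
    # Timed out: mark the replica so the user can tear it down manually.
    return ReplicaStatus.FAILED_SHUTDOWN


if __name__ == '__main__':
    # A replica that never finishes shutting down ends up in FAILED_SHUTDOWN.
    print(wait_for_shutdown(lambda: False, timeout_seconds=0.05))
```

A replica that ends up in `FAILED_SHUTDOWN` this way is no longer waited on indefinitely, and the user can then clean it up with `terminate_replica` instead of needing a separate `force` option.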
