Expose instance auto-restart status in the console #2469

Open
hawkw opened this issue Sep 24, 2024 · 0 comments

hawkw commented Sep 24, 2024

PR oxidecomputer/omicron#6503 implemented automatic restarts of instances in the Failed state. This change introduced some additional instance state that should be exposed to users. In particular:

  • When a Failed instance is automatically restarted, a cooldown timer is started for that instance. If that instance fails again while the cooldown period is still active, it will not be automatically restarted again until the cooldown period has elapsed.
  • Some instances may be configured with auto-restart policies that do not permit them to be automatically restarted when they are Failed.

New fields were added to the external-API instance message to report state related to automatic restarts. Instances now have an auto_restart_enabled: boolean field, which indicates whether the instance's auto-restart policy permits it to be restarted, and an auto_restart_cooldown_expiration: string field, which gives the date and time at which the cooldown period will end (after which the instance may be restarted again). See: https://github.com/oxidecomputer/omicron/blob/45813be40b62167eff75333c410515e8bee24211/openapi/nexus.json#L15094-L15104
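As a rough illustration of how a console client might read these two fields, here is a TypeScript sketch. The interface name and the cooldownRemainingMs helper are hypothetical. Treating auto_restart_cooldown_expiration as possibly absent is an assumption drawn from this issue's "if there is a timestamp" wording, not from the OpenAPI spec.

```typescript
// Hypothetical TypeScript view of the two auto-restart fields described in
// this issue. Modeling the cooldown expiration as optional/nullable is an
// assumption, not taken from the OpenAPI document.
interface InstanceAutoRestartFields {
  auto_restart_enabled: boolean;
  // Date and time (as a string) at which the cooldown period ends.
  auto_restart_cooldown_expiration?: string | null;
}

// Hypothetical helper: milliseconds of cooldown remaining at `now`,
// 0 if the expiration has already passed, or null if none was reported.
function cooldownRemainingMs(
  fields: InstanceAutoRestartFields,
  now: Date,
): number | null {
  const expiration = fields.auto_restart_cooldown_expiration;
  if (expiration == null) return null;
  const expiresAt = Date.parse(expiration);
  if (Number.isNaN(expiresAt)) {
    // Fail loudly rather than guess what a malformed timestamp means.
    throw new Error(`unparseable cooldown expiration: ${expiration}`);
  }
  return Math.max(0, expiresAt - now.getTime());
}
```

Clamping at zero lets a caller treat "cooldown already over" and "no cooldown" differently if it wants to, while still never showing a negative countdown.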

This data should probably be exposed to users. If an instance is in the Failed state, the user will want to know why it has not yet been automatically restarted, whether it will ever be automatically restarted, and, if so, when. We probably only need to display this information for Failed instances:

  • If auto_restart_enabled is false, we should tell the user that auto-restart is disabled for that instance.
  • Otherwise, if there is an auto_restart_cooldown_expiration timestamp, we should tell the user that the instance will be restarted only after that time.
  • If auto_restart_enabled is true and there is no auto_restart_cooldown_expiration timestamp, the instance will be automatically restarted; we might want to indicate that as well.
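The decision logic described above might look roughly like the following TypeScript sketch. The InstanceSummary interface, the autoRestartMessage function, the message wording, and the use of a run_state field with the value "failed" are all hypothetical assumptions about naming, not the console's actual code. One further assumption goes beyond the issue: a cooldown expiration that is already in the past is treated as "no cooldown".

```typescript
// Hypothetical subset of the instance message. `run_state` and its
// "failed" value are assumed names; the auto_restart_* fields are the
// ones described in this issue.
interface InstanceSummary {
  run_state: string;
  auto_restart_enabled: boolean;
  auto_restart_cooldown_expiration?: string | null;
}

// Returns the message to display, or null when nothing needs to be shown
// (the issue suggests only Failed instances need this information).
function autoRestartMessage(instance: InstanceSummary, now: Date): string | null {
  if (instance.run_state !== "failed") return null;

  // Case 1: the auto-restart policy does not permit restarting.
  if (!instance.auto_restart_enabled) {
    return "Auto-restart is disabled for this instance.";
  }

  // Case 2: a cooldown is still active. Treating an expiration that has
  // already passed as "no cooldown" is an assumption, not from the issue.
  const expiration = instance.auto_restart_cooldown_expiration;
  if (expiration != null) {
    const expiresAt = new Date(expiration);
    if (expiresAt.getTime() > now.getTime()) {
      return `This instance will be restarted automatically only after ${expiresAt.toISOString()}.`;
    }
  }

  // Case 3: auto-restart is enabled and no cooldown is active.
  return "This instance will be restarted automatically.";
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the logic easy to test and lets the UI re-render the message on a timer once the cooldown elapses.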
