Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should be in error state if snap service down and uninstallation failed #108

Closed
Pjack opened this issue Sep 20, 2024 · 2 comments · Fixed by #110
Closed

Should be in error state if snap service down and uninstallation failed #108

Pjack opened this issue Sep 20, 2024 · 2 comments · Fixed by #110

Comments

@Pjack
Copy link

Pjack commented Sep 20, 2024

Address the issue mentioned in

@Pjack Pjack changed the title Should be in error state instead of blocked if snap service is down Should be in error state if snap service is down Sep 20, 2024
@Pjack Pjack changed the title Should be in error state if snap service is down Should be in error state if snap service down and uninstallation failed Sep 20, 2024
@Pjack
Copy link
Author

Pjack commented Sep 23, 2024

After discussing with @jdkandersson, I believe we can adopt this practice in the GitHub runner.
We don't set the charm to ErrorStatus; instead, we allow the exceptions to bubble up and be handled by the Ops framework.
https://github.com/canonical/github-runner-operator/blob/main/src/charm.py#L454

That's to say. If an unexpected error occurs in the workload (e.g., installation failure, service downtime), raise an exception in the charm if no specific action can be provided to resolve the issue. After the operator reviews the workload logs, they can manually resolve Juju.

What do you think ?

@jneo8
Copy link
Contributor

jneo8 commented Sep 23, 2024

+1 for error status. It's a diverge situation so we can't provide a document or instruction to the user.

samuelallan72 added a commit that referenced this issue Sep 24, 2024
For:
- snap install or refresh failure
- snap not installed
- snap service not active

These cases are unexpected, the charm cannot recover automatically,
and the user must manually debug the issue.
Therefore, error status is more appropriate.

Partially fixes: #108
samuelallan72 added a commit that referenced this issue Sep 30, 2024
Raise errors for snap install or refresh failure.  Raise an error because it's a fatal failure, and juju will auto-retry the hook, which may succeed on retry if the error was transient.

Improve blocked status message for if it detects snap not installed or service not active in the status hook.
I think Blocked is better than Error status here:
- auto retrying the status hook isn't going to fix anything
- blocked status lets the charm provide a more useful status message to the user
- blocked status will automatically update to active on the next update-status hook once the issue is resolved


Partially fixes: #108
@samuelallan72 samuelallan72 reopened this Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants