-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a diagnostic refresh_age
metric and corresponding container healthcheck
#50
Conversation
Hi @michaelosthege, thanks for the contribution. This looks good besides the one comment I had. Ultimately, instead of needing such a healthcheck long-term, I'd prefer finding and preventing an issue which would block the loop infinitely. But at least this helps finding such occasions. |
Totally agree. I tried to debug, but found it quite hard.
As a start I would recommend to apply |
Feel free of you find the time for it. The exporter has grown over time and could need some rework. |
@michaelosthege Can you rebase on the latest state of master? I recently refactored the workflow definitions and the image build job fails because of that mismatch. |
Done! |
Workflow security setting prevent running from a remote repo, so I cloned it into a local branch on PR #58. |
healthcheck.sh
Outdated
maxage=${1:-60} | ||
|
||
# Get the age of the last successful metrics refresh in seconds | ||
age=$(curl -s http://localhost:9040/metrics | grep 'homematic_refresh_age{' | cut -d ' ' -f2 | cut -d '.' -f1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no curl in the docker image yet - and the scripts lacks an execute bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, I should have checked the healthcheck logs!
I didn't need to change the file permissions though. It's now doing 0-exit on my deployment..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your healthcheck command included the bash-invocation and only gave the script as argument, that is why you didn't nee the bit. But it would make sense to do so to be able to call it natively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please feel free to suggest a change or make a commit. I don't know what exactly you'd like to see here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 5f580ed, it is just about the permissions bits on the healthcheck.sh. I can add that later myself as well, I cannot push to your branch.
The most recent docker build contains that already and looks good so far in my cluster. I did not set a liveness check using the healthcheck yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Linux file permissions at the git level? I didn't know these were copied into the docker image.
Hi @sfudeus, thanks for this great exporter!
Every now and then my container appears to hang, producing such gaps until I restart it:
I assume that some internal loop crashes, but the logs don't indicate anything.
To notice when things to out of order, I introduced a new metric:
In combination with a healthcheck script and docker-autoheal this makes it easy to automatically restart the container.