When the secret is configured in e.g. PodMonitoring but not found by Prometheus, we get a nice Target Page error:
Hopefully this works with the Target Status feature too. I believe it does not fail the Prometheus config apply, but I didn't check.
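For context, a hypothetical sketch of what such a secret reference could look like in a PodMonitoring resource (the exact schema of the unreleased feature may differ; all names here are made up except the secret name from the log below):

```yaml
# Hypothetical sketch -- schema of the unreleased secret-reference feature may differ.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: go-synthetic
  namespace: default
spec:
  selector:
    matchLabels:
      app: go-synthetic
  endpoints:
  - port: metrics
    interval: 30s
    basicAuth:
      username: admin
      password:
        secret:
          name: go-synthetic-basic-auth  # the referenced secret
          key: password
```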
However, when the user forgets to add permissions for an existing, correctly referenced secret, Prometheus scrape config parsing (and reloading) fails: we get a cryptic "unknown" error and the status page shows 401 Unauthorized.
Full log:

```
{"caller":"main.go:1326","err":"unable to watch secret default/go-synthetic-basic-auth: unknown (get secrets)","level":"error","msg":"Failed to apply configuration","ts":"2024-03-26T21:24:20.265Z"}
{"caller":"main.go:1043","err":"one or more errors occurred while applying the new configuration (--config.file=\"/prometheus/config_out/config.yaml\")","level":"error","msg":"Error reloading config","ts":"2024-03-26T21:24:20.266Z"}
```
The consequences of a failing config reload are not as bad as I initially thought (only the per-reloader, per-job functionality gets stuck in some state), but perhaps there is a way to surface a consistent status page error instead of failing the apply.
I have a GKE cluster ready with your changes applied (I will keep it running for some time) if you want to check it out, e.g. @TheSpiritXIII.
Acceptance criteria (AC):
- Ideally, a permission error does not fail the configuration apply, but behaves similarly to a not-found secret, not-found port, etc.
- Ideally, a permission error results in a more descriptive error log/status than "unknown".
- Double-check the Target Status feature for not-found / no-permission errors related to secrets.
Nice to have:
Ideally, the operator logs (or provides in the status or via webhook) the exact RBAC Role + RoleBinding to apply when permissions are missing. This is a bit hard to do in a webhook, but easy to log on the collector; the latter is, however, somewhat deep for customers to find. Putting two small-ish YAMLs through target status might be odd too (maybe fine?). For this case we might want to put it in the "analysis/troubleshooting" CLI/functionality we discussed one day.
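For illustration, a minimal sketch of the kind of Role + RoleBinding the operator could suggest, based on the secret from the log above (the Role/RoleBinding names and the collector's ServiceAccount name/namespace are assumptions, not the actual values used by the operator):

```yaml
# Hypothetical names throughout; grants read access to the one referenced secret.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gmp-secret-reader   # hypothetical name
  namespace: default        # namespace of the referenced secret
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["go-synthetic-basic-auth"]
  verbs: ["get", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gmp-secret-reader   # hypothetical name
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: gmp-secret-reader
subjects:
- kind: ServiceAccount
  name: collector           # assumption: substitute the collector's actual ServiceAccount
  namespace: gmp-system     # assumption: substitute the collector's actual namespace
```

Scoping the Role to the single secret via `resourceNames` keeps the grant minimal; a broader grant over all secrets in the namespace would also work but is harder to recommend automatically.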
(This relates to the unreleased feature from PR #776.)