Update steps for setting up metrics on OpenShift, focusing on single cluster #953
Signed-off-by: David Martin <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #953      +/-   ##
==========================================
- Coverage   81.49%   78.75%   -2.75%
==========================================
  Files         102      113      +11
  Lines        7177     9558    +2381
==========================================
+ Hits         5849     7527    +1678
- Misses        898     1620     +722
+ Partials      430      411      -19
```
Flags with carried forward coverage won't be shown.
doc/install/install-openshift.md (outdated)
```bash
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/main/config/observability/openshift/kube-state-metrics.yaml
kubectl apply -k https://github.com/Kuadrant/gateway-api-state-metrics?ref=main
```

To enable request metrics in Istio, you must create a `telemetry` resource as follows:
To enable request metrics in Istio and scrape them, create the following resources:

```bash
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/main/config/observability/openshift/telemetry.yaml
```
This is not strictly necessary for request metrics in Istio; those are enabled by default. This Telemetry configuration adds the request path as a label to the request metrics, which is not always desirable: it is a high-cardinality label that can flood your Prometheus instance if you have a big API. For example, each resource in an API would generate a different Prometheus time series. We should at least warn about this.
Good point.
I'll split this out and explain better with a warning.
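One way to gauge the effect of the extra path label is to count the series of Istio's standard `istio_requests_total` metric before and after applying the Telemetry resource. The sketch below is illustrative only: it assumes the `oc` CLI is logged in with permission to read cluster metrics, and that thanos-querier is exposed as the `thanos-querier` route in `openshift-monitoring`, as in the OpenShift monitoring documentation. The helper name `count_istio_series` is hypothetical.

```bash
#!/usr/bin/env bash
# PromQL: number of istio_requests_total series with a recent sample.
QUERY='count(istio_requests_total)'

# Hypothetical helper: query thanos-querier from outside the cluster.
# Assumes `oc` is logged in and the user is allowed to read cluster metrics.
count_istio_series() {
  local token host
  token="$(oc whoami -t)"
  host="$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')"
  # -k skips TLS certificate verification; drop it if the route certificate is trusted.
  curl -sk -H "Authorization: Bearer ${token}" \
    "https://${host}/api/v1/query" --data-urlencode "query=${QUERY}"
}

# Usage: run once before and once after applying telemetry.yaml, then compare.
#   count_istio_series
```

A large jump in the count after the path label is added is the flooding effect described in the review comment.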
```bash
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-gateway
  namespace: ${gatewayNS}
spec:
  selector:
    matchLabels:
      istio.io/gateway-name: ${gatewayName}
  endpoints:
  - port: metrics
    path: /stats/prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-metrics-proxy
  namespace: ${gatewayNS}
  labels:
    istio.io/gateway-name: ${gatewayName}
spec:
  selector:
    istio.io/gateway-name: ${gatewayName}
  ports:
  - name: metrics
    protocol: TCP
    port: 15020
    targetPort: 15020
EOF
```
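The heredoc in that snippet is unquoted (`<<EOF`), so the shell expands `${gatewayNS}` and `${gatewayName}` before `kubectl` sees the manifest; if they are unset, both expand to empty strings and the resources will not target your Gateway. A minimal sketch of setting them and then checking that the `istio.io/gateway-name` selector matched the gateway pods follows. The values `api-gateway` and `external` and the helper `check_gateway_metrics` are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical values: replace with your Gateway's namespace and name.
# Values already set in the environment take precedence over these defaults.
gatewayNS="${gatewayNS:-api-gateway}"
gatewayName="${gatewayName:-external}"
export gatewayNS gatewayName

# Label selector shared by the Service and the ServiceMonitor.
selector="istio.io/gateway-name=${gatewayName}"

# Hypothetical helper: after applying the manifests, list the pods the selector
# matches and the endpoints of ingress-metrics-proxy. No pods, or no endpoint
# addresses, means the selector did not match your gateway.
check_gateway_metrics() {
  kubectl get pods -n "${gatewayNS}" -l "${selector}"
  kubectl get endpoints ingress-metrics-proxy -n "${gatewayNS}"
}

# Usage (requires cluster access):
#   check_gateway_metrics
```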
This can be done with a single PodMonitor that targets all your Gateways, because Istio annotates the gateway pods with the port where the metrics are served. It's slightly convoluted, but it has the advantage of targeting all gateways in the namespace with a single PodMonitor:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-proxies-monitor
spec:
  selector:
    matchExpressions:
    - key: istio-prometheus-ignore
      operator: DoesNotExist
  podMetricsEndpoints:
  - path: /stats/prometheus
    interval: 30s
    relabelings:
    - action: keep
      sourceLabels: ["__meta_kubernetes_pod_container_name"]
      regex: "istio-proxy"
    - action: keep
      sourceLabels:
        ["__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape"]
    - action: replace
      regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
      replacement: "[$2]:$1"
      sourceLabels:
        [
          "__meta_kubernetes_pod_annotation_prometheus_io_port",
          "__meta_kubernetes_pod_ip",
        ]
      targetLabel: "__address__"
    - action: replace
      regex: (\d+);((([0-9]+?)(\.|$)){4})
      replacement: "$2:$1"
      sourceLabels:
        [
          "__meta_kubernetes_pod_annotation_prometheus_io_port",
          "__meta_kubernetes_pod_ip",
        ]
      targetLabel: "__address__"
    - action: labeldrop
      regex: "__meta_kubernetes_pod_label_(.+)"
    - sourceLabels: ["__meta_kubernetes_namespace"]
      action: replace
      targetLabel: namespace
    - sourceLabels: ["__meta_kubernetes_pod_name"]
      action: replace
      targetLabel: pod_name
```
I recall getting this from the Istio documentation, but I can't find it now ...
Ah, here is the source https://github.com/istio-ecosystem/sail-operator/blob/main/docs/README.md#observability-integrations
Sail documentation, not Istio's.
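The two `replace` relabelings are the convoluted part of that PodMonitor. Prometheus joins the `prometheus.io/port` annotation and the pod IP with `;` (the default separator) and anchors each regex at both ends, so one rule rewrites IPv6 pod IPs to `[ip]:port` and the other rewrites IPv4 pod IPs to `ip:port`. The sketch below emulates that in bash purely to make the behaviour visible; it is not what Prometheus runs. Bash `=~` uses POSIX extended regexes, which lack `\d` and lazy `+?`, so the patterns are adapted (`[0-9]+` for `\d+`, and a plain dotted-quad pattern for the IPv4 rule). The function name `relabel_address` is hypothetical.

```bash
#!/usr/bin/env bash
# Emulate the two __address__ relabelings from the PodMonitor.
# Args: $1 = prometheus.io/port annotation value, $2 = pod IP.
relabel_address() {
  # Prometheus joins sourceLabels with ';' before matching.
  local joined="$1;$2"
  # Adapted from (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
  local ipv6_re='^([0-9]+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})$'
  # Simplified from (\d+);((([0-9]+?)(\.|$)){4}) for dotted-quad IPs.
  local ipv4_re='^([0-9]+);(([0-9]+\.){3}[0-9]+)$'
  if [[ $joined =~ $ipv6_re ]]; then
    # replacement: "[$2]:$1"
    echo "[${BASH_REMATCH[2]}]:${BASH_REMATCH[1]}"
  elif [[ $joined =~ $ipv4_re ]]; then
    # replacement: "$2:$1"
    echo "${BASH_REMATCH[2]}:${BASH_REMATCH[1]}"
  else
    # Neither regex matches: Prometheus leaves __address__ unchanged.
    echo "unchanged"
  fi
}

relabel_address 15020 10.128.2.15      # prints 10.128.2.15:15020
relabel_address 15020 fd01:0:0:1::2a   # prints [fd01:0:0:1::2a]:15020
```

Because an IPv4 address never contains a colon and an IPv6 address never matches the dotted-quad pattern, at most one of the two rules fires for a given pod, which is why an `if`/`elif` is a faithful model of applying them in sequence.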
I have a similar-looking `additionalScrapeConfig` here, but it doesn't work with user workload monitoring (UWM) on OpenShift due to restrictions on what can be configured.
If this single PodMonitor approach works with UWM, I think that would be more robust than the Service/ServiceMonitor approach.
Yes, I have tested this in OCP with UWM and it works as expected.
I should note that I tested it using sail-operator to install Istio, but it should work the same for other Istio install methods.
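Both the Service/ServiceMonitor and the PodMonitor approaches rely on user workload monitoring being enabled, which OpenShift documents as setting `enableUserWorkload: true` under `config.yaml` in the `cluster-monitoring-config` ConfigMap in `openshift-monitoring`. The sketch below only prints that ConfigMap for review instead of applying it; the helper name `uwm_config` is hypothetical. Applying it replaces any existing `config.yaml` in that ConfigMap, so merge the setting by hand if your cluster already has one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ConfigMap that enables user workload monitoring.
# Caution: applying it overwrites any existing config.yaml key in this ConfigMap.
uwm_config() {
  cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF
}

# Review the output first, then apply it with:
#   uwm_config | kubectl apply -f -
uwm_config
```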
Just a couple of comments; looks good overall.
doc/install/install-openshift.md (outdated)
There is one more metrics configuration to apply so that all relevant metrics are scraped.
That configuration depends on where you deploy your Gateway.
The steps to configure it are detailed in the follow-on 'Secure, protect, and connect' guide.
Maybe we should provide a link to the 'Secure, protect, and connect' guide here.
I thought about it and held off, as the link is already at the end of the guide as a follow-on.
However, there's no harm in linking here too for easier navigation.
doc/install/install-openshift.md (outdated)
For Grafana installation details, see [installing Grafana on OpenShift](https://cloud.redhat.com/experts/o11y/ocp-grafana/). Once Grafana is installed, you must [set up a data source to the thanos-querier route in the OpenShift cluster](https://docs.openshift.com/container-platform/4.15/observability/monitoring/accessing-third-party-monitoring-apis.html#accessing-metrics-from-outside-cluster_accessing-monitoring-apis-by-using-the-cli).
The Thanos data source setup is also described in the 'installing Grafana on OpenShift' guide.
Good catch.
I think I'll call that out here, but keep the second link as well, as it gives more details and is the official way of accessing thanos-querier.