Hi,
To monitor a Kubernetes cluster, we run multiple instances of the OpenTelemetry collector, each with a specific role:
- A daemonset to collect local telemetry on each node (kubeletstats metrics, hostmetrics, logs).
- A deployment with 1 replica to collect cluster-wide telemetry (k8s_cluster receiver).
- A statefulset that acts as the gateway and relays all telemetry signals to our backends (we use a statefulset so that a PVC can persist the export queue).
In the gateway, we add common resource attributes (like cluster name, tenants, ...) and also use the `k8sattributes` processor to add other resource attributes related to the pods. For this, we followed the recommendations from k8sattributesprocessor#as-a-gateway and set the `passthrough` mode accordingly.
There's a catch with this architecture, though: the first two sets of collectors also collect metrics that are not related to pods but to other objects such as nodes, daemonsets, and statefulsets (for example `k8s.daemonset.desired_scheduled_nodes`).
When those metrics reach the gateway collector, the `k8sattributes` processor tries to find the corresponding pod using `pod_association` rules based on the pod IP and pod UID attributes. As those attributes are not present on the metric, it falls back to the connection details. In that case, the connection details point to the OpenTelemetry collector that scraped the metric, so the processor incorrectly adds the details of the OpenTelemetry collector pod as resource attributes on those metrics. Note that other pods can send their own telemetry directly to the gateway collector, so we would like to keep the connection pod association rule.
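For reference, the association rules described above look roughly like the fragment below. This is a simplified sketch rather than our exact configuration: the attribute names `k8s.pod.ip` and `k8s.pod.uid` are assumed to be the ones used for the first two rules, and the rest of the processor settings are left out.

```yaml
processors:
  k8sattributes:
    pod_association:
      # Rule 1: match the pod by its IP, if the resource carries it.
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      # Rule 2: match the pod by its UID, if the resource carries it.
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      # Rule 3: otherwise fall back to the peer address of the connection.
      # For metrics relayed by the daemonset or deployment collectors, that
      # peer is the collector pod itself, which is where the wrong
      # attributes come from.
      - sources:
          - from: connection
```

A node-level or daemonset-level metric has neither of the first two attributes, so only the third rule can match.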
To me, this is not really a bug but rather a side effect of our deployment architecture. Still, I would like to know whether anyone else has faced this issue and what they put in place to alleviate it. We see a few options (for example, a separate pipeline in the gateway for the other OpenTelemetry collectors, or not using the connection as a pod association rule). Any thoughts on this?
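To make the separate-pipeline option more concrete, here is an untested sketch of what it could look like in the gateway. Everything named here is an assumption for illustration: the second receiver `otlp/collectors`, its port 4319, the processor instance `k8sattributes/collectors`, and the backend endpoint are all placeholders. The idea is that the other collectors send to the dedicated receiver, whose pipeline uses a `k8sattributes` instance without the connection rule, while pods that send telemetry directly keep using the default receiver and the connection fallback.

```yaml
receivers:
  # Default endpoint, used by application pods sending telemetry directly.
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  # Hypothetical dedicated endpoint for the daemonset and deployment collectors.
  otlp/collectors:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4319

processors:
  # Instance for direct senders: keeps the connection fallback.
  k8sattributes:
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: connection
  # Instance for relayed telemetry: no connection rule, so resources
  # without a pod IP or pod UID should be left without pod attributes.
  k8sattributes/collectors:
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid

exporters:
  otlp:
    endpoint: backend.example.com:4317  # placeholder backend

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [k8sattributes]
      exporters: [otlp]
    metrics/collectors:
      receivers: [otlp/collectors]
      processors: [k8sattributes/collectors]
      exporters: [otlp]
```

With this layout, pod-level metrics relayed by the other collectors (for example from kubeletstats) could still be enriched through their pod UID, while node, daemonset, and statefulset metrics would no longer pick up the collector pod's attributes. The cost is a second endpoint to expose and a duplicated pipeline per signal.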