High memory usage (>50Gi) when scraping Prometheus metrics #1358

Open
nar-git opened this issue Sep 30, 2024 · 0 comments
nar-git commented Sep 30, 2024

Describe the bug
High memory usage (>50Gi) when scraping Prometheus metrics in an EKS-on-EC2 cluster using the CloudWatch agent. Our cluster has the resources listed below; the agent's memory limit is set to 50Gi and the container gets OOMKilled every 5 minutes (a sketch of the relevant resource limits follows the table).

| Resource | Count |
| --- | --- |
| pods | 429 |
| namespaces (99% empty) | 57776 |
| endpoints | 255 |
| services | 254 |
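
The memory limit is set on the agent container in the Deployment; a minimal sketch of that resources stanza follows, where only the 50Gi limit is taken from our setup and the request values are illustrative:

```yaml
# Sketch of the cloudwatch-agent container's resources stanza (not the full Deployment).
# Only the 50Gi memory limit reflects our setup; the requests are placeholder values.
resources:
  requests:
    cpu: "1"        # illustrative
    memory: 4Gi     # illustrative
  limits:
    memory: 50Gi    # the agent repeatedly hits this limit and is OOMKilled
```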

Steps to reproduce
Deploy the CloudWatch agent as a Kubernetes Deployment in our cluster with the configurations below (a sketch of the Deployment wiring follows the JSON config).

```yaml
  prometheus.yaml: |
    global:
      evaluation_interval: 1m
      scrape_interval: 30s
      scrape_timeout: 10s
    scrape_configs:
    - honor_labels: true
      job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      metric_relabel_configs:
      - action: drop
        source_labels:
        - instance
```
"logs": {
    "metrics_collected": {
      "prometheus": {
        "cluster_name": "<name>",
        "prometheus_config_path": "/etc/prometheusconfig/prometheus.yaml",
        "log_group_name": "/aws/containerinsights/"<name>",/cwagent-prometheus/performance",
        "emf_processor": {
          "metric_declaration": [
            {
              "source_labels": [
                "namespace"
              ],
              "label_matcher": "<removed>",
              "dimensions": [
                [
                  "namespace",
                  "ClusterName",
                  "pod",
                  "container"
                ],
                [
                  "namespace",
                  "ClusterName",
                  "pod"
                ],
                [
                  "namespace",
                  "ClusterName"
                ],
                [
                  "namespace",
                  "ClusterName",
                  "pod",
                  "container",
                  "reason"
                ],
                [
                  "namespace",
                  "ClusterName",
                  "pod",
                  "reason"
                ],
                [
                  "namespace",
                  "ClusterName",
                  "reason"
                ]
              ],
              "metric_selectors": [
                ".*"
              ]
            }
          ]
        }
      }
    }
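
The agent itself runs as a Kubernetes Deployment that mounts the two configurations above from ConfigMaps. A minimal sketch of that wiring, with illustrative ConfigMap/volume names rather than our exact manifest (only the image tag and the /etc/prometheusconfig mount path come from the details above):

```yaml
# Minimal sketch of the Deployment wiring, assuming ConfigMaps named
# "prometheus-config" (prometheus.yaml) and "cwagentconfig" (the agent JSON).
# Names are illustrative; resource limits are omitted here (see the sketch above, memory limit 50Gi).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cwagent-prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cwagent-prometheus
  template:
    metadata:
      labels:
        app: cwagent-prometheus
    spec:
      containers:
      - name: cloudwatch-agent
        image: public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.300046.0b833
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheusconfig   # matches prometheus_config_path above
        - name: cwagentconfig
          mountPath: /etc/cwagentconfig      # assumed mount path for the agent JSON config
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
      - name: cwagentconfig
        configMap:
          name: cwagentconfig
```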

What did you expect to see?
We expected the agent to use much less memory (<10Gi).

What did you see instead?
Very high memory usage (~60Gi).

What version did you use?
cloudwatch-agent:1.300046.0b833

Environment
OS: Amazon Linux 2 - 5.10.224-212.876.amzn2.x86_64

@nar-git nar-git changed the title High memory usage (>60Gi) when scraping Prometheus metrics High memory usage (>50Gi) when scraping Prometheus metrics Sep 30, 2024