Headlamp cluster metrics are not showing the proper values #2043

Open
mariogkds opened this issue Jun 16, 2024 · 7 comments · May be fixed by #2338
Labels
bug Something isn't working charts prometheus Relating to prometheus and the prometheus plugin

Comments

@mariogkds

Hello, I am a new user and I really like the project.

I am having some problems with the cluster-wide metrics that are shown on the dashboard:

image

I am using kube-prometheus-stack to handle Prometheus and Grafana, and I am using prometheus-adapter for the metrics API.

To get Headlamp to show anything at all, I had to add a few settings to the charts' values:

kube-prometheus-stack

    kubelet:
      serviceMonitor:
        metricRelabelings:
          - action: replace
            sourceLabels:
              - node
            targetLabel: instance
    prometheus-node-exporter:
      prometheus:
        monitor:
          attachMetadata:
            node: true
          relabelings:
            - sourceLabels:
                - __meta_kubernetes_endpoint_node_name
              targetLabel: node
              action: replace
              regex: (.+)
              replacement: ${1}
          metricRelabelings:
            - action: replace
              regex: (.*)
              replacement: $1
              sourceLabels:
                - __meta_kubernetes_pod_node_name
              targetLabel: kubernetes_node

prometheus-adapter (the usual configuration needed to serve the metrics APIs)

      resource:
        cpu:
          containerQuery: |
            sum by (<<.GroupBy>>) (
              rate(container_cpu_usage_seconds_total{container!="",<<.LabelMatchers>>}[3m])
            )
          nodeQuery: |
            sum  by (<<.GroupBy>>) (
              rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal",<<.LabelMatchers>>}[3m])
            )
          resources:
            overrides:
              node:
                resource: node
              namespace:
                resource: namespace
              pod:
                resource: pod
          containerLabel: container
        memory:
          containerQuery: |
            sum by (<<.GroupBy>>) (
              avg_over_time(container_memory_working_set_bytes{container!="",<<.LabelMatchers>>}[3m])
            )
          nodeQuery: |
            sum by (<<.GroupBy>>) (
              avg_over_time(node_memory_MemTotal_bytes{<<.LabelMatchers>>}[3m])
              -
              avg_over_time(node_memory_MemAvailable_bytes{<<.LabelMatchers>>}[3m])
            )
          resources:
            overrides:
              node:
                resource: node
              namespace:
                resource: namespace
              pod:
                resource: pod
          containerLabel: container
        window: 3m
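
As a reading aid for the two configurations above: prometheus-adapter fills the `<<.GroupBy>>` and `<<.LabelMatchers>>` placeholders before it sends each query to Prometheus, so the node queries only return data if `node_cpu_seconds_total` and the `node_memory_*` series carry a `node` label, which is what the node-exporter relabeling adds. The Python sketch below imitates that substitution with plain string replacement. The `expand` helper and the node name `worker-1` are hypothetical, and the exact matcher text the adapter generates may differ.

```python
# Sketch only: imitates prometheus-adapter's placeholder substitution
# with plain string replacement. Not the adapter's real implementation.

NODE_CPU_QUERY = (
    "sum by (<<.GroupBy>>) ("
    'rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal",'
    "<<.LabelMatchers>>}[3m])"
    ")"
)


def expand(template: str, group_by: str, label_matchers: str) -> str:
    """Fill the two placeholders used by the resource rules above."""
    return (
        template.replace("<<.GroupBy>>", group_by)
        .replace("<<.LabelMatchers>>", label_matchers)
    )


# "worker-1" is a made-up node name used only for illustration.
query = expand(NODE_CPU_QUERY, "node", 'node="worker-1"')
print(query)
```

Running the expanded query directly in the Prometheus UI is one way to check whether the series really have the `node` label the adapter expects.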

Individual nodes' CPU values are correct. The memory value is correct as well, but the unit is different:
image

image

Is this a Headlamp problem or a Prometheus (that is, my configuration) problem?

Thanks for the help and for the project. Have a nice day.

@joaquimrocha
Collaborator

Hi @mariogkds. Thanks for the report. This looks like a unit conversion issue.
We will take a look.
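
A minimal sketch of what a unit mismatch of this kind can look like: the same byte count rendered with binary (1024-based) and decimal (1000-based) units. `format_bytes` is a hypothetical helper written for illustration, not Headlamp's code, and the sample value is arbitrary.

```python
# Sketch only: hypothetical formatter, not taken from Headlamp.

def format_bytes(num_bytes: float, binary: bool = True) -> str:
    """Render a byte count with binary (KiB, MiB) or decimal (kB, MB) units."""
    base = 1024 if binary else 1000
    units = (
        ["B", "KiB", "MiB", "GiB", "TiB"]
        if binary
        else ["B", "kB", "MB", "GB", "TB"]
    )
    value = float(num_bytes)
    index = 0
    # Step up one unit at a time until the value fits or units run out.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


working_set = 20 * 1024 * 1024  # arbitrary example: exactly 20 MiB
print(format_bytes(working_set, binary=True))   # 20.00 MiB
print(format_bytes(working_set, binary=False))  # 20.97 MB
```

Mixing the two conventions, or labelling a binary value with a decimal suffix, changes the displayed number by a few percent per unit step even when the underlying byte count is identical.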

@joaquimrocha joaquimrocha added the bug Something isn't working label Jun 18, 2024
@illume illume added prometheus Relating to prometheus and the prometheus plugin charts labels Jul 8, 2024
@sarg3nt

sarg3nt commented Sep 5, 2024

@joaquimrocha I'm seeing this in the RAM metrics for deployments and pods too. Probably other places as well?
Grafana and crictl report the values correctly, but Headlamp shows much more.
For example, the Headlamp UI shows the headlamp pod using 40 MB of RAM, but it is actually 20.76 MB according to Grafana and crictl, so it looks like about double.
CPU and network are correct.
Is this going to get fixed soon? It's confusing our users.
Headlamp 0.25.1
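
A rough check of the numbers above, offered as a sketch rather than a diagnosis. The reported ratio is larger than a decimal versus binary unit mix-up alone would produce. One pattern that is known to give roughly doubled memory with cAdvisor metrics is summing the pod-level series (empty `container` label) together with the per-container series, which is why the adapter queries earlier in this thread filter on `container!=""`. Whether Headlamp does this is an assumption that has not been confirmed here. The sample values and the `total_bytes` helper are made up for illustration.

```python
# Sketch only: made-up samples shaped like cAdvisor series for one pod.
samples = [
    {"pod": "example", "container": "", "bytes": 20_000_000},     # pod-level total
    {"pod": "example", "container": "app", "bytes": 20_000_000},  # the one container
]


def total_bytes(series, skip_pod_level: bool) -> int:
    """Sum memory, optionally skipping the pod-level (container == "") series."""
    return sum(
        s["bytes"]
        for s in series
        if not (skip_pod_level and s["container"] == "")
    )


naive = total_bytes(samples, skip_pod_level=False)    # counts the pod twice
filtered = total_bytes(samples, skip_pod_level=True)  # like container!=""
print(naive, filtered)

# Ratio reported in the comment above vs. the MiB/MB conversion factor.
reported_ratio = 40 / 20.76
unit_ratio = 1024**2 / 1000**2
print(f"{reported_ratio:.2f} {unit_ratio:.2f}")  # 1.93 1.05
```

Comparing the raw series behind the Headlamp chart with and without the `container!=""` matcher would show whether this explanation applies.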

@joaquimrocha
Collaborator

@sarg3nt Yes, we do want to fix this but haven't had the bandwidth yet. Let me try to get it in our pipeline for the next release.

@joaquimrocha joaquimrocha added this to the v0.26.0 milestone Sep 16, 2024
@skoeva skoeva linked a pull request Sep 17, 2024 that will close this issue
@skoeva
Contributor

skoeva commented Oct 8, 2024

Hi @mariogkds @sarg3nt, thanks for raising these issues! Would you be able to provide the YAML (with any sensitive data redacted) for the problematic resources? It would be super helpful for testing ^^

@joaquimrocha
Collaborator

Hi @mariogkds and @sarg3nt, we really want to address this issue but we haven't been able to reproduce it. If you don't mind, please send us some sample YAML based on yours so @skoeva can take a look.

@sarg3nt

sarg3nt commented Oct 27, 2024

@joaquimrocha sorry for the late reply. Work has been super busy. I'll get you something on Monday.

@joaquimrocha joaquimrocha modified the milestones: v0.26.0, v0.27.0 Nov 5, 2024
@skoeva
Contributor

skoeva commented Nov 7, 2024

We've just released our latest version :D

Just a reminder: if you guys are still running into this issue and would like us to get a fix in, your sample YAML would be super helpful to see.

Projects
Status: Blocked

5 participants