
Provide example kubernetes manifest #661

Merged
merged 1 commit into intel:master from k8s-deployment on Feb 14, 2024

Conversation

@jcpunk (Contributor) commented Jan 22, 2024

This provides an example of how you might deploy this in Kubernetes.

It includes node selectors using labels defined by the Node Feature Discovery SIG and a PodMonitor as defined by the Prometheus Operator project.
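For context, here is a minimal sketch of how those two pieces fit together. This is not the manifest added by this PR; the image and port values are placeholders/assumptions, while the intel-pcm namespace and the NFD vendor_id label match what the functional test later in this thread uses.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: pcm
  namespace: intel-pcm
spec:
  selector:
    matchLabels:
      app: pcm
  template:
    metadata:
      labels:
        app: pcm
    spec:
      # Run only on nodes that Node Feature Discovery has labelled as having an Intel CPU;
      # verify the exact label value with the kubectl get node command from the functional test below.
      nodeSelector:
        feature.node.kubernetes.io/cpu-model.vendor_id: Intel
      containers:
        - name: pcm
          image: pcm                  # placeholder: substitute the PCM container image you use
          ports:
            - name: metrics
              containerPort: 9738     # assumed default pcm-sensor-server port
---
# PodMonitor (Prometheus Operator CRD) so Prometheus discovers the DaemonSet pods
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: pcm
  namespace: intel-pcm
spec:
  selector:
    matchLabels:
      app: pcm
  podMetricsEndpoints:
    - port: metrics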

@rdementi (Contributor) commented:

Thanks a lot for the patch. Please let me find a reviewer.

@ppalucki (Contributor) commented Jan 26, 2024

I have two questions:

  1. Why do you use privileged: true? (Is it only because it is suggested in the Docker how-to?) I tested it without it (but I had to add PCM_NO_MSR to the environment), like this:

        - name: PCM_NO_MSR
          value: "1"

and it worked in my environment (PCM used the Linux perf interface). Was there any other reason to use privileged? Can you check whether it works for you without privileged? (See the sketch after this list.)

Without privileged we could put less strict requirements on the namespace (with labels). I just want to follow the least-privilege principle if it doesn't break any functionality.

  2. Why hostNetwork: true? Is it only to simplify the configuration of Prometheus discovery with the PodMonitor, or is there another reason?
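A minimal sketch of what the non-privileged variant from point 1 could look like (the container name and image are placeholders; depending on kernel settings such as perf_event_paranoid, additional capabilities or host mounts may still be required):

      containers:
        - name: pcm
          image: pcm                  # placeholder image
          env:
            - name: PCM_NO_MSR        # tell PCM not to use the MSR driver
              value: "1"
          securityContext:
            privileged: false         # rely on the Linux perf interface instead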

FYI: I'm going on vacation for a week; I'll come back to review in the second week of February, so no rush.

@jcpunk (Contributor, Author) commented Jan 26, 2024

I did use the privileged flag because of the Docker documentation. I'd be happy to drop it, but I don't really understand the risks. I do seem to get data back with it disabled, so if you think it would be safe, I'd be happy to drop it.

I set hostNetwork: true for folks who want to scrape this from an external Prometheus. I was trying to think of a way to make it easy to either use prometheus-operator or to do your own thing. I'd be fine dropping it.
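One possible middle ground instead of hostNetwork is a plain ClusterIP Service over the DaemonSet pods, which an external Prometheus (or the PodMonitor) can target. The names and port below are assumptions carried over from the earlier sketch:

apiVersion: v1
kind: Service
metadata:
  name: pcm
  namespace: intel-pcm
spec:
  selector:
    app: pcm
  ports:
    - name: metrics
      port: 9738            # assumed pcm-sensor-server port
      targetPort: metrics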

@ppalucki (Contributor) left a review:

I would be ready to accept this as is once we drop privileged and hostNetwork; we just need to be sure it works without functional issues in a bare kind-based testing environment.

pcm-kubernetes.yaml: inline review comments (resolved)
@jcpunk (Contributor, Author) commented Feb 12, 2024

In theory I've made the changes you requested. Does this look better?

@jcpunk requested a review from @ppalucki on February 12, 2024 17:27
@ppalucki (Contributor) left a review:

It looks definitely better and it works flawlessly! :) So LGTM.

Here is the functional test that can be further used for validation:

# Create cluster
kind create cluster
kind export kubeconfig

# Deploy NodeFeatureDiscovery
kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.15.1
kubectl get node -o jsonpath='{.items[0].metadata.labels.feature\.node\.kubernetes\.io\/cpu\-model\.vendor_id}{"\n"}'

# Deploy prometheus for PodMonitor
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false
kubectl get sts prometheus-prometheus-kube-prometheus-prometheus

# Deploy PCM
kubectl apply -f pcm-kubernetes.yaml

# Verify PCM works as expected
kubectl -n intel-pcm get daemonset
kubectl -n intel-pcm get pods
podname=$(kubectl -n intel-pcm get pods -o jsonpath='{.items[0].metadata.name}')
kubectl proxy &
curl -Ls http://127.0.0.1:8001/api/v1/namespaces/intel-pcm/pods/$podname/proxy/metrics | grep DRAM_Writes
promtool query instant http://127.0.0.1:8001/api/v1/namespaces/default/services/prometheus-kube-prometheus-prometheus:http-web/proxy 'avg by(__name__) ({job="pcm"})'

and we get:

CStateResidency => 0.09090909090909094 @[1707901856.957]
Clock_Unhalted_Ref => 1010026077.3913049 @[1707901856.957]
Clock_Unhalted_Thread => 1295730425.8695648 @[1707901856.957]
DRAM_Joules_Consumed => 0 @[1707901856.957]
DRAM_Reads => 3600814506.6666665 @[1707901856.957]
DRAM_Writes => 1974366592 @[1707901856.957]
Embedded_DRAM_Reads => 0 @[1707901856.957]
Embedded_DRAM_Writes => 0 @[1707901856.957]
Incoming_Data_Traffic_On_Link_0 => 689786624 @[1707901856.957]
Incoming_Data_Traffic_On_Link_1 => 689454432 @[1707901856.957]
Incoming_Data_Traffic_On_Link_2 => 0 @[1707901856.957]
Instructions_Retired_Any => 749013885.5739133 @[1707901856.957]
Invariant_TSC => 432975372048881700 @[1707901856.957]
L2_Cache_Hits => 3531524.973913045 @[1707901856.957]
L2_Cache_Misses => 2334387.130434784 @[1707901856.957]
L3_Cache_Hits => 1325323.1739130428 @[1707901856.957]
L3_Cache_Misses => 627863.4000000003 @[1707901856.957]
L3_Cache_Occupancy => 0 @[1707901856.957]
Local_Memory_Bandwidth => 0 @[1707901856.957]
Measurement_Interval_in_us => 14507400443881 @[1707901856.957]
Memory_Controller_IO_Requests => 0 @[1707901856.957]
Number_of_sockets => 2 @[1707901856.957]
OS_ID => 55.499999999999986 @[1707901856.957]
Outgoing_Data_And_Non_Data_Traffic_On_Link_0 => 1843333122.5 @[1707901856.957]
Outgoing_Data_And_Non_Data_Traffic_On_Link_1 => 1849219231.5 @[1707901856.957]
Outgoing_Data_And_Non_Data_Traffic_On_Link_2 => 0 @[1707901856.957]
Package_Joules_Consumed => 0 @[1707901856.957]
Persistent_Memory_Reads => 0 @[1707901856.957]
Persistent_Memory_Writes => 0 @[1707901856.957]
RawCStateResidency => 89486131.66409859 @[1707901856.957]
Remote_Memory_Bandwidth => 0 @[1707901856.957]
SMI_Count => 0 @[1707901856.957]
Thermal_Headroom => -2147483648 @[1707901856.957]
Utilization_Incoming_Data_Traffic_On_Link_0 => 0 @[1707901856.957]
Utilization_Incoming_Data_Traffic_On_Link_1 => 0 @[1707901856.957]
Utilization_Incoming_Data_Traffic_On_Link_2 => 0 @[1707901856.957]
Utilization_Outgoing_Data_And_Non_Data_Traffic_On_Link_0 => 0 @[1707901856.957]
Utilization_Outgoing_Data_And_Non_Data_Traffic_On_Link_1 => 0 @[1707901856.957]
Utilization_Outgoing_Data_And_Non_Data_Traffic_On_Link_2 => 0 @[1707901856.957]

PS: the above test was run on an Intel(R) Xeon(R) Platinum 8180 CPU. For VM-based hosts we will have issues depending on the VM type (e.g. we may need to comment out the MCFG/sys-acpi volume as described in FAQ Q11).
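Sketched as it might appear in the manifest (the exact volume name and structure here are assumptions on my part; FAQ Q11 is the authoritative reference):

      volumes:
        # Comment this out on VMs that do not expose an MCFG ACPI table (see FAQ Q11):
        # - name: sys-acpi
        #   hostPath:
        #     path: /sys/firmware/acpi/tables/MCFG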

@rdementi (Contributor) left a review:

Thanks a lot!

@rdementi merged commit 1932047 into intel:master on Feb 14, 2024
30 checks passed
@jcpunk deleted the k8s-deployment branch on February 14, 2024 19:31