TMA 4.8 perf converted JSON files #182

Merged: 1 commit into intel:main on May 28, 2024

Conversation

weilinwa (Contributor)

Changes included in this PR:

  1. New changes based on the up-to-date event and metric files released in PR: TMA 4.8 Release #181
  2. Two new fields in event JSON files, Counter and Experimental, from PR: Update field names in counter.json file #165 and PR: Add experimental field to perf json #170
  3. New counter.json files, from PR: Update field names in counter.json file #165

@weilinwa (Contributor Author)

@edwarddavidbaker, could you please also request Ian as a reviewer for this PR? Thanks!

@edwarddavidbaker (Contributor)

> @edwarddavidbaker, could you please also request Ian as a reviewer for this PR? Thanks!

Sure, added Ian to the reviewers.

"MetricName": "llc_data_read_demand_plus_prefetch_miss_latency",
"ScaleUnit": "1ns"
},
{
"BriefDescription": "Average latency of a last level cache (LLC) demand and prefetch data read miss (read memory access) addressed to local memory in nano seconds",
"MetricExpr": "1e9 * (cha@UNC_CHA_TOR_OCCUPANCY.IA_MISS\\,config1\\=0x40432@ / cha@UNC_CHA_TOR_INSERTS.IA_MISS\\,config1\\=0x40432@) / (UNC_CHA_CLOCKTICKS / (#num_cores / #num_packages * #num_packages)) * duration_time",
"MetricExpr": "llc_data_read_demand_plus_prefetch_miss_latency_for_remote_requests",
Contributor

This looks funny: local miss request latency is computed using the remote request miss latency.

Contributor Author

@captain5050, yes, the event data used in these metrics' expressions was incomplete, which caused these metrics to have the same metric expression, so one got rewritten in terms of the other metric's name. We have just updated these metrics. Thanks!
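The failure mode described here (duplicate expressions collapsing into aliases) can be caught mechanically. A rough, hypothetical Python check over a generated metrics JSON file, assuming the list-of-objects layout with MetricName/MetricExpr fields used in these files:

```python
import json
import sys
from collections import defaultdict

def check_metric_aliases(path):
    """Flag metrics whose expression is just another metric's name,
    and groups of metrics that share an identical expression."""
    with open(path) as f:
        metrics = json.load(f)

    names = {m["MetricName"] for m in metrics if "MetricName" in m}
    by_expr = defaultdict(list)

    for m in metrics:
        name = m.get("MetricName")
        expr = m.get("MetricExpr", "").strip()
        if not name or not expr:
            continue
        by_expr[expr].append(name)
        # An expression that is exactly some other metric's name is an alias.
        if expr in names and expr != name:
            print(f"{name}: aliased to {expr}")

    for expr, group in by_expr.items():
        if len(group) > 1:
            print(f"identical expression shared by: {', '.join(group)}")

if __name__ == "__main__":
    for p in sys.argv[1:]:
        check_metric_aliases(p)
```

Running it over each converted metrics file would flag pairs like the local/remote latency metrics discussed above.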

@@ -241,13 +241,13 @@
},
{
"BriefDescription": "Memory read that miss the last level cache (LLC) addressed to local DRAM as a percentage of total memory read accesses, does not include LLC prefetches.",
"MetricExpr": "cha@UNC_CHA_TOR_INSERTS.IA_MISS\\,config1\\=0x40432@ / (cha@UNC_CHA_TOR_INSERTS.IA_MISS\\,config1\\=0x40432@ + cha@UNC_CHA_TOR_INSERTS.IA_MISS\\,config1\\=0x40431@)",
"MetricExpr": "numa_reads_addressed_to_remote_dram",
"MetricName": "numa_reads_addressed_to_local_dram",
Contributor

The same local aliased to remote issue appears here.

@@ -156,31 +156,31 @@
},
{
"BriefDescription": "Ratio of number of code read requests missing last level core cache (includes demand w/ prefetches) to the total number of completed instructions",
"MetricExpr": "(cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x181@ + cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x191@) / INST_RETIRED.ANY",
"MetricExpr": "llc_data_read_mpi_demand_plus_prefetch",
Contributor

This metric is written in terms of itself, i.e. it is recursive and would lock up if perf tried to evaluate it. There is a test in perf that detects this:

https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/metricgroup.c#n918

It may be worth copying the json here into a perf tree and running the tests to detect issues like this.
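For readers who don't want to dig into the C, the linked test essentially resolves metric references and rejects any metric whose expression loops back to itself. A simplified, illustrative Python version of that check (the real implementation lives in metricgroup.c; this tokenizer over-approximates what counts as a metric reference):

```python
import json
import re

# Metric names in these files are simple identifiers; this pattern is a
# rough approximation of how references inside MetricExpr are picked out.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")

def find_recursive_metrics(path):
    """Report metrics whose expression refers, directly or transitively,
    back to the metric itself."""
    with open(path) as f:
        metrics = json.load(f)
    exprs = {m["MetricName"]: m.get("MetricExpr", "")
             for m in metrics if "MetricName" in m}

    def refers_to(name, target, seen):
        if name in seen:
            return False
        seen.add(name)
        for tok in IDENT.findall(exprs.get(name, "")):
            if tok == target:
                return True
            if tok in exprs and refers_to(tok, target, seen):
                return True
        return False

    return [n for n in exprs if refers_to(n, n, set())]
```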

Contributor Author

@captain5050, this is the metric llc_code_read_mpi_demand_plus_prefetch written in terms of llc_data_read_mpi_demand_plus_prefetch. They are two different metrics with very similar names.

But I think this also has the same issue as the SKX metrics. We will work on this one.

Contributor

Thanks, could we squash some (maybe all) of the changes? It is hard to review patches when there are fixes to those patches later in the series.

Contributor Author

@edwarddavidbaker, I guess I could make this PR compare against the TMA-4.8-Release branch instead of the main branch to make it easier to read? What's your suggestion? Thanks!

@edwarddavidbaker (Contributor) May 16, 2024

  • Do we need two pull requests?
  • Ian can comment with his preferences. I typically use interactive rebases (and force-push the branch) to incorporate feedback/tweaks. This in essence creates a new patch set and is the typical workflow on Gerrit. As a concrete example here, all of Caleb's commits would be squashed into a single 4.8 change. Your changes then update the output directory scripts/perf, and can be squashed into a single commit.

Contributor

With Gerrit you can keep the magic Change-Id tag, and Gerrit time-orders the changes uploaded with it. I'm not sure how this works on GitHub. My pain point is trying to review many changes that are too large to open on GitHub, with fixes then layered on top in different patches. This makes it hard to say whether an issue is or isn't still present without mentally squashing the changes.

Contributor Author

One or two PRs, either is fine. But I don't think I have write access to Caleb's branch, so I'm not sure how to push changes to that PR.

Contributor Author

@captain5050, now only the changes to the scripts/perf/ directory remain in this PR. Does this work better?

"MetricName": "llc_data_read_demand_plus_prefetch_miss_latency",
"ScaleUnit": "1ns"
},
{
"BriefDescription": "Average latency of a last level cache (LLC) demand and prefetch data read miss (read memory access) addressed to local memory in nano seconds",
"MetricExpr": "1e9 * (cha@UNC_CHA_TOR_OCCUPANCY.IA_MISS\\,config1\\=0x40432@ / cha@UNC_CHA_TOR_INSERTS.IA_MISS\\,config1\\=0x40432@) / (UNC_CHA_CLOCKTICKS / (source_count(UNC_CHA_CLOCKTICKS) * #num_packages)) * duration_time",
"MetricExpr": "llc_data_read_demand_plus_prefetch_miss_latency_for_remote_requests",
"MetricName": "llc_data_read_demand_plus_prefetch_miss_latency_for_local_requests",
Contributor

This looks like the same issue as on skylakex, which is fixed in a later commit: the local requests metric is aliased to remote requests. Could we make the fix on all architectures and squash the fix commit into this one, so that we don't need to spot bugs and then check for later fixes?

Contributor Author

@captain5050, this error turns out to be caused by the mixed commits in this PR. @edwarddavidbaker just helped me update this PR to a single squashed commit. Hopefully this makes it cleaner and easier to review.

@weilinwa force-pushed the perf_converted_json_tma4.8 branch 2 times, most recently from 5803bf7 to 3990a25 on May 21, 2024 21:48
@weilinwa (Contributor Author)

@captain5050, I put the files into the perf code and ran the perf tests. They passed most of the tests, including the PMU JSON event tests and the Sysfs PMU tests. But the "all metricgroups" test and the "all metrics" test failed at power metrics and at metrics like upi_data_transmit_bw because of missing PMUs or events. I'm not sure whether other metrics would also fail, because both tests end early at the failure. For reference, I saw similar test failures with the original perf pmu-events JSON files, so I suspect this is caused by my system settings.

@captain5050 (Contributor)

> @captain5050, I put the files into the perf code and ran the perf tests. They passed most of the tests, including the PMU JSON event tests and the Sysfs PMU tests. But the "all metricgroups" test and the "all metrics" test failed at power metrics and at metrics like upi_data_transmit_bw because of missing PMUs or events. I'm not sure whether other metrics would also fail, because both tests end early at the failure. For reference, I saw similar test failures with the original perf pmu-events JSON files, so I suspect this is caused by my system settings.

We should be good. IIRC those metrics aren't using JSON events, so this points to a sysfs issue on your test machine.

@@ -117,7 +117,7 @@
"MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS@ / tma_info_core_slots",
"MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
"MetricName": "tma_alloc_restriction",
"MetricThreshold": "tma_alloc_restriction > 0.1",
"MetricThreshold": "tma_alloc_restriction > 0.1 & (tma_resource_bound > 0.2 & tma_backend_bound_aux > 0.2)",
Contributor

Just to note that there are a lot of improved thresholds, but we may get multiplexing issues because of the greater number of metrics/events.

@@ -684,28 +712,28 @@
"PublicDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear). Related metrics: tma_branch_mispredicts, tma_info_bottleneck_mispredictions, tma_mispredicts_resteers"
},
{
"BriefDescription": "Instructions per retired mispredicts for conditional non-taken branches (lower number means higher occurrence rate).",
"BriefDescription": "Instructions per retired Mispredicts for conditional non-taken branches (lower number means higher occurrence rate).",
Contributor

Nit: the case change seems unnecessary/worse, leading to a large number of changes.

@captain5050 (Contributor) left a comment

There are some seemingly unnecessary case changes, and changes for metric groups where names like BvML aren't explained. That comes from the source spreadsheet. The conversion looks good.

@weilinwa force-pushed the perf_converted_json_tma4.8 branch from 3990a25 to 4d86844 on May 23, 2024 22:39
There is a set of new metric groups added in this release. They map to the
related bottleneck metrics as shown below:

BvMP: tma_info_bottleneck_mispredictions
BvBC: tma_info_bottleneck_big_code
BvFB: tma_info_bottleneck_instruction_fetch_bandwidth
BvMB: tma_info_bottleneck_cache_memory_bandwidth
BvML: tma_info_bottleneck_cache_memory_latency
BvMT: tma_info_bottleneck_memory_data_tlbs
BvMS: tma_info_bottleneck_memory_synchronization
BvCB: tma_info_bottleneck_compute_bound_est
BvIO: tma_info_bottleneck_irregular_overhead
BvOB: tma_info_bottleneck_other_bottlenecks
BvBO: tma_info_bottleneck_branching_overhead
BvUW: tma_info_bottleneck_useful_work
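For convenience, the same mapping as a Python dictionary (taken verbatim from the list above), e.g. for tooling that needs to expand the abbreviations:

```python
# New bottleneck-view (Bv*) metric groups and the bottleneck metric each maps to.
BOTTLENECK_GROUPS = {
    "BvMP": "tma_info_bottleneck_mispredictions",
    "BvBC": "tma_info_bottleneck_big_code",
    "BvFB": "tma_info_bottleneck_instruction_fetch_bandwidth",
    "BvMB": "tma_info_bottleneck_cache_memory_bandwidth",
    "BvML": "tma_info_bottleneck_cache_memory_latency",
    "BvMT": "tma_info_bottleneck_memory_data_tlbs",
    "BvMS": "tma_info_bottleneck_memory_synchronization",
    "BvCB": "tma_info_bottleneck_compute_bound_est",
    "BvIO": "tma_info_bottleneck_irregular_overhead",
    "BvOB": "tma_info_bottleneck_other_bottlenecks",
    "BvBO": "tma_info_bottleneck_branching_overhead",
    "BvUW": "tma_info_bottleneck_useful_work",
}
```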
@weilinwa force-pushed the perf_converted_json_tma4.8 branch from 4d86844 to 59194d4 on May 23, 2024 23:30
@weilinwa (Contributor Author)

> There are some seemingly unnecessary case changes, and changes for metric groups where names like BvML aren't explained. That comes from the source spreadsheet. The conversion looks good.

@captain5050, we've updated the commit message to add explanations of the new metric groups and reverted the case changes. Please take a look at the updates. Thanks!

@edwarddavidbaker (Contributor) left a comment

Thanks Ian and Weilin!

@edwarddavidbaker merged commit f74babd into intel:main on May 28, 2024
3 checks passed
@weilinwa deleted the perf_converted_json_tma4.8 branch on May 29, 2024 20:18
4 participants