TMA 4.8 perf converted JSON files #182
Conversation
Force-pushed bb033e8 to b3750ad
@edwarddavidbaker, could you please also request Ian as a reviewer for this PR? Thanks!
Sure, added Ian to the reviewers.
"MetricName": "llc_data_read_demand_plus_prefetch_miss_latency", | ||
"ScaleUnit": "1ns" | ||
}, | ||
{ | ||
"BriefDescription": "Average latency of a last level cache (LLC) demand and prefetch data read miss (read memory access) addressed to local memory in nano seconds", | ||
"MetricExpr": "1e9 * (cha@UNC_CHA_TOR_OCCUPANCY.IA_MISS\\,config1\\=0x40432@ / cha@UNC_CHA_TOR_INSERTS.IA_MISS\\,config1\\=0x40432@) / (UNC_CHA_CLOCKTICKS / (#num_cores / #num_packages * #num_packages)) * duration_time", | ||
"MetricExpr": "llc_data_read_demand_plus_prefetch_miss_latency_for_remote_requests", |
This looks funny, local miss request latency is computed using remote request miss latency.
@captain5050, yes, the event data used in these metrics' expressions was incomplete, which caused these metrics to end up with identical expressions; one was then rewritten in terms of the other metric's name. We have updated these metrics. Thanks!
@@ -241,13 +241,13 @@
},
{
"BriefDescription": "Memory read that miss the last level cache (LLC) addressed to local DRAM as a percentage of total memory read accesses, does not include LLC prefetches.",
"MetricExpr": "cha@UNC_CHA_TOR_INSERTS.IA_MISS\\,config1\\=0x40432@ / (cha@UNC_CHA_TOR_INSERTS.IA_MISS\\,config1\\=0x40432@ + cha@UNC_CHA_TOR_INSERTS.IA_MISS\\,config1\\=0x40431@)",
"MetricExpr": "numa_reads_addressed_to_remote_dram",
"MetricName": "numa_reads_addressed_to_local_dram",
The same local aliased to remote issue appears here.
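Not part of the PR itself, but this alias pattern lends itself to a mechanical check: scan the metric JSON for entries whose MetricExpr is nothing but the name of another metric in the same file. The function and the trimmed-down entries below are a hypothetical sketch based on the snippets in this thread, not perf's actual validation code.

```python
import json

def find_aliased_metrics(metrics):
    """Return (metric, target) pairs where a metric's entire expression
    is just the name of some other metric in the same list -- the
    pattern behind the local-aliased-to-remote bugs flagged in this PR."""
    names = {m["MetricName"] for m in metrics}
    return [(m["MetricName"], m["MetricExpr"].strip())
            for m in metrics
            if m["MetricExpr"].strip() in names]

# Hypothetical, shortened entries mirroring the shape of the diff above.
metrics = json.loads("""[
  {"MetricName": "miss_latency_for_remote_requests",
   "MetricExpr": "1e9 * occupancy / inserts"},
  {"MetricName": "miss_latency_for_local_requests",
   "MetricExpr": "miss_latency_for_remote_requests"}
]""")
print(find_aliased_metrics(metrics))
# → [('miss_latency_for_local_requests', 'miss_latency_for_remote_requests')]
```

A hit is not automatically a bug (an alias can be intentional), but each one is worth a human look.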
@@ -156,31 +156,31 @@
},
{
"BriefDescription": "Ratio of number of code read requests missing last level core cache (includes demand w/ prefetches) to the total number of completed instructions",
"MetricExpr": "(cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x181@ + cbox@UNC_C_TOR_INSERTS.MISS_OPCODE\\,filter_opc\\=0x191@) / INST_RETIRED.ANY",
"MetricExpr": "llc_data_read_mpi_demand_plus_prefetch",
This metric is written in terms of itself, i.e. it is recursive and would lock up if perf tried to evaluate it. There is a test in perf that detects this:
It may be worth copying the json here into a perf tree and running the tests to detect issues like this.
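As a rough illustration (not perf's actual test, which lives in the perf tree), self-reference can be caught by walking the metric-name references inside each expression and looking for a path back to the starting metric:

```python
import re

def find_metric_cycles(metrics):
    """Report metrics whose expression refers back to the metric itself,
    directly or through other metrics.  perf resolves metric-name
    references inside expressions, so a cycle would never terminate."""
    names = {m["MetricName"]: m["MetricExpr"] for m in metrics}
    ident = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")

    def refs(expr):
        # Tokens in the expression that are themselves metric names.
        return [t for t in ident.findall(expr) if t in names]

    cycles = []
    for start in names:
        seen, stack = set(), [start]
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            for r in refs(names[cur]):
                if r == start:  # found a path back to where we began
                    cycles.append(start)
                    stack.clear()
                    break
                stack.append(r)
    return cycles

# The shape of the bug discussed above: a metric defined as itself.
print(find_metric_cycles([
    {"MetricName": "llc_code_read_mpi_demand_plus_prefetch",
     "MetricExpr": "llc_code_read_mpi_demand_plus_prefetch"},
]))  # → ['llc_code_read_mpi_demand_plus_prefetch']
```

This also catches indirect cycles (a references b, b references a), reporting every metric that participates in one.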
@captain5050, this is the metric llc_code_read_mpi_demand_plus_prefetch written in terms of llc_data_read_mpi_demand_plus_prefetch. They are two different metrics with very similar names.
But I think this one does have the same issue as the SKX metrics. We will work on it.
Thanks, could we squash some (maybe all) of the changes? It is hard to review patches when fixes to them appear later in the series.
@edwarddavidbaker, I guess I could make this PR compare against the TMA-4.8-Release branch instead of the main branch to make it easier to read? What's your suggestion? Thanks!
- Do we need two pull requests?
- Ian can comment with his preferences. I typically use interactive rebases (and force-push the branch) to incorporate feedback / tweaks. This in essence creates a new patch set and is the typical workflow on Gerrit. As a concrete example here, all of Caleb's commits would be squashed into a single 4.8 change. Your changes then update the output directory scripts/perf, and can be squashed into a single commit.
With Gerrit you can keep the magic Change-Id tag, and Gerrit time-orders changes uploaded with it. I'm not sure how this works on GitHub. My pain point is trying to review many changes that are too large to open on GitHub, with fixes then layered on top in separate patches. That makes it hard to say whether an issue is or isn't still present without mentally squashing the changes.
One or two PRs, both are fine. But I think I don't have write access to Caleb's branch. I'm not sure how to push changes to that PR.
@captain5050, now only the changes to the scripts/perf/ directory remain in this PR. Does this work better?
"MetricName": "llc_data_read_demand_plus_prefetch_miss_latency", | ||
"ScaleUnit": "1ns" | ||
}, | ||
{ | ||
"BriefDescription": "Average latency of a last level cache (LLC) demand and prefetch data read miss (read memory access) addressed to local memory in nano seconds", | ||
"MetricExpr": "1e9 * (cha@UNC_CHA_TOR_OCCUPANCY.IA_MISS\\,config1\\=0x40432@ / cha@UNC_CHA_TOR_INSERTS.IA_MISS\\,config1\\=0x40432@) / (UNC_CHA_CLOCKTICKS / (source_count(UNC_CHA_CLOCKTICKS) * #num_packages)) * duration_time", | ||
"MetricExpr": "llc_data_read_demand_plus_prefetch_miss_latency_for_remote_requests", | ||
"MetricName": "llc_data_read_demand_plus_prefetch_miss_latency_for_local_requests", |
This looks like the same issue as on skylakex which is fixed in a later commit - local requests metric is aliased to remote requests. Could we make the fix on all architectures and squash the fix commit into this one so that we don't need to spot bugs then check for later fixes?
@captain5050, this error turns out to be an issue caused by the mixed commits in this PR. @edwarddavidbaker just helped me update this PR into one squashed commit. Hopefully this makes it cleaner and easier to review.
Force-pushed 5803bf7 to 3990a25
@captain5050, I put the files into the perf code and ran the perf tests. It passed most of the tests.
We should be good. IIRC those metrics aren't using json events and so this points to a sysfs issue on your test machine.
@@ -117,7 +117,7 @@
"MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS@ / tma_info_core_slots",
"MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
"MetricName": "tma_alloc_restriction",
"MetricThreshold": "tma_alloc_restriction > 0.1",
"MetricThreshold": "tma_alloc_restriction > 0.1 & (tma_resource_bound > 0.2 & tma_backend_bound_aux > 0.2)",
Just to note that there are a lot of improved thresholds but we may get multiplexing issues because of the greater number of metrics/events.
@@ -684,28 +712,28 @@
"PublicDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear). Related metrics: tma_branch_mispredicts, tma_info_bottleneck_mispredictions, tma_mispredicts_resteers"
},
{
"BriefDescription": "Instructions per retired mispredicts for conditional non-taken branches (lower number means higher occurrence rate).",
"BriefDescription": "Instructions per retired Mispredicts for conditional non-taken branches (lower number means higher occurrence rate).",
Nit: the case change seems unnecessary (if not worse), and it leads to a large number of changes.
There are some seemingly unnecessary case changes, and changes for metric groups where names like BvML aren't explained. That's caused by the source spreadsheet. The conversion looks good.
Force-pushed 3990a25 to 4d86844
There is a set of new metric groups added in this release. They map to the related bottleneck metrics as below:
- BvMP: tma_info_bottleneck_mispredictions
- BvBC: tma_info_bottleneck_big_code
- BvFB: tma_info_bottleneck_instruction_fetch_bandwidth
- BvMB: tma_info_bottleneck_cache_memory_bandwidth
- BvML: tma_info_bottleneck_cache_memory_latency
- BvMT: tma_info_bottleneck_memory_data_tlbs
- BvMS: tma_info_bottleneck_memory_synchronization
- BvCB: tma_info_bottleneck_compute_bound_est
- BvIO: tma_info_bottleneck_irregular_overhead
- BvOB: tma_info_bottleneck_other_bottlenecks
- BvBO: tma_info_bottleneck_branching_overhead
- BvUW: tma_info_bottleneck_useful_work
Force-pushed 4d86844 to 59194d4
@captain5050, we've updated the commit message to add explanations of the new metric groups, and reverted the case changes. Please take a look at the updates. Thanks!
Thanks Ian and Weilin!
Changes included in this PR:
- New fields Counter and Experimental, from PR: Update field names in counter.json file #165 and PR: Add experimental field to perf json #170
- counter.json files, from PR: Update field names in counter.json file #165