partitioner: avoid inserting duplicates into heap #145082
base: gh/bdhirsh/637/base
Conversation
[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145082
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Cancelled Job, 3 Unrelated Failures as of commit f38f555 with merge base 727ae13.
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: d8f90b1650ad8c3abd3ddc6874bf160023aa5f71
Pull Request resolved: #145082
```python
for benchmark in all:
    benchmark.enable_compile_time_instruction_count().collect_all().append_results(
```
I tested locally and this PR cuts the instruction count of this microbenchmark from 61B -> 4B instructions (it grows much larger if you increase len(tmps) from 16 to 32).
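To see where that blowup comes from, here is a hypothetical, self-contained sketch (not the actual partitioner or benchmark code; all names are made up) that counts heap insertions when parallel fusible chains all feed into a shared chain, with and without the duplicate guard this PR adds:

```python
import heapq

def count_heap_pushes(roots, users, rank, dedup):
    """Walk nodes in increasing `rank` via a min-heap, pushing each
    popped node's users, and return how many pushes happened.
    With dedup=False, a shared user is re-pushed once per predecessor
    pop -- the duplicate behavior this PR removes."""
    heap = [(rank[n], n) for n in roots]
    pushes = len(heap)
    heapq.heapify(heap)
    seen = set(roots)
    while heap:
        _, node = heapq.heappop(heap)
        for u in users.get(node, ()):
            if dedup and u in seen:
                continue  # the fix: skip nodes already inserted
            seen.add(u)
            heapq.heappush(heap, (rank[u], u))
            pushes += 1
    return pushes

def parallel_chains(k, length, shared):
    """k parallel chains of `length` nodes, all feeding a shared chain."""
    users, rank, roots = {}, {}, []
    for i in range(k):
        for j in range(length):
            n = f"c{i}_{j}"
            rank[n] = j * k + i
            if j == 0:
                roots.append(n)
            users[n] = [f"c{i}_{j+1}"] if j + 1 < length else ["s0"]
    for j in range(shared):
        rank[f"s{j}"] = k * length + j
        if j + 1 < shared:
            users[f"s{j}"] = [f"s{j+1}"]
    return roots, users, rank

roots, users, rank = parallel_chains(k=4, length=2, shared=3)
print(count_heap_pushes(roots, users, rank, dedup=False))  # 20: shared nodes re-pushed
print(count_heap_pushes(roots, users, rank, dedup=True))   # 11: one push per node
```

Without the guard, each shared node is pushed once per parallel chain, so the push count scales with (number of chains) x (length of the shared suffix) rather than with the node count, which is consistent with the cost growing sharply as len(tmps) increases.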
```python
f(self.x)

def main():
```
hey @laithsakka - is there anything else I need to do to ensure this new compile time benchmark runs in CI? (do I need to update one of the expected-instruction-count files locally?)
Replied privately, but posting here for others:
(1) When you land this diff the benchmark will run, but it won't fail on regressions.
(2) You have two benchmarks with the same name; you probably want to rename one of them.
(3) To enable failing on regressions, you need to add a line to /data/users/lsakka/fbsource/fbcode/caffe2/benchmarks/dynamo/pr_time_benchmarks/expected_results.csv. You can get the results from the logs: https://github.com/pytorch/pytorch/actions/runs/12832763265/job/35794624737
(4) I usually do (3) as a separate step: first land the diff that enables running the benchmark, then monitor it on https://fburl.com/unidash/vblyya4c for a day or two to make sure it's stable and not noisy, then update the expected results file above.
Fixes #145081
This looks like it was a source of quadratic compile times in the torchtitan CP graphs. There is some code in the partitioner that iteratively adds the users of a node to a heap and pops the earliest user. If you have long parallel chains of fusible ops that all eventually feed into some shared ops, this can result in:
(1) a node getting added to the heap many times
(2) each time we pop that node, adding (duplicates of) each of that node's users to the heap
(3) repeating with each of those users
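The idea behind the fix can be sketched in a few lines (hypothetical names, not the partitioner's actual code): track which nodes have already been inserted, so each node enters the heap and is expanded at most once.

```python
import heapq

def pop_users_in_rank_order(roots, users, rank):
    """Repeatedly pop the lowest-ranked node and push its users,
    guarding pushes with a `seen` set so that a node reachable
    through many fusible chains enters the heap only once."""
    heap = [(rank[n], n) for n in roots]
    heapq.heapify(heap)
    seen = set(roots)
    order = []
    while heap:
        _, node = heapq.heappop(heap)
        order.append(node)
        for u in users.get(node, ()):
            if u not in seen:  # avoid inserting duplicates into the heap
                seen.add(u)
                heapq.heappush(heap, (rank[u], u))
    return order

# Diamond graph: a feeds b and c, which both feed d.
# Without the `seen` guard, d would be pushed (and expanded) twice.
users = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
rank = {"a": 0, "b": 1, "c": 2, "d": 3}
print(pop_users_in_rank_order(["a"], users, rank))  # ['a', 'b', 'c', 'd']
```

With the guard, the traversal does O(E) pushes total instead of letting duplicates multiply down each chain of shared users.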
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames