[fix] SDP syncing buffers during gradient accumulation #1049
Conversation
cc @min-xu-ai
lgtm!
I haven't followed the CI changes on fairscale @min-xu-ai; it looks like the breakages are unrelated to this PR (though they do touch OSS), but are PyTorch version dependent?
I triggered a rerun. The CPU test failure seems unrelated, but the GPU test failure seems to be in OSS?
It is interesting that all the failures seem to be the same. What's the best way for me to run your branch? Do I git remote add your repo & branch?
These OSS tests are failing in the main branch (and in other unrelated PRs too). I agree that it's probably related to the recent PyTorch upgrade, as I don't remember them failing last week. I'm seeing all of these failures in an FSDP PR (#1052) as well, including the CPU and the "test_parity3d_checkpoint_syncbn" ones.
Somehow this PR is hitting unrelated test errors. I am replacing it with #1075. Thanks Ben!
What does this PR do?
Fixes #1041. I just had a minute or two, hoping that it's enough :)
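For context, here is a minimal sketch (not the patch itself) of the gradient-accumulation pattern this fix concerns: a model wrapped in fairscale's ShardedDataParallel (SDP) with an OSS optimizer, where `no_sync()` is used for all but the last micro-batch. The helper name `accumulate_and_step`, the batch layout, and the loss choice are illustrative assumptions; how buffers are synchronized inside the accumulation window is the behavior the fix addresses.

```python
import torch.nn.functional as F
from fairscale.optim import OSS
from fairscale.nn.data_parallel import ShardedDataParallel as SDP


def accumulate_and_step(model: SDP, optimizer: OSS, batches, accumulate_steps: int = 4):
    """Gradient accumulation with a sharded model, stepping every `accumulate_steps` batches."""
    for i, (inputs, targets) in enumerate(batches):
        last_of_window = (i + 1) % accumulate_steps == 0
        if not last_of_window:
            # Inside the accumulation window: skip gradient reduction across ranks.
            # Buffer syncing during these steps is what issue #1041 is about.
            with model.no_sync():
                F.cross_entropy(model(inputs), targets).backward()
        else:
            # Last micro-batch of the window: reduce gradients, step, and clear.
            F.cross_entropy(model(inputs), targets).backward()
            optimizer.step()
            optimizer.zero_grad()
```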
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.