
[QUESTION] Why conflict with sequence parallel? #55

Open
jiurizz opened this issue Jan 19, 2025 · 0 comments

jiurizz commented Jan 19, 2025

Hi, I have two questions about the conflict with sequence parallel, and I'm looking forward to a reply. Thank you!

  1. Commit 221a (Support sequence-parallel for zero bubble schedule, #14) fixed the conflict, but I still want to confirm: does the code conflict with sequence parallel because, under sequence parallel, the backward of RowParallelLinear all-gathers into the global buffer, and that buffer is likely to be refreshed with other data before it is consumed? (See the sketch below this list.)
  2. If so, why does commit 221a (#14) also all_gather the grad_output independently? grad_output is gathered into a tensor created by torch.empty(), not taken from the global buffer.
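
For reference, here is a minimal single-process sketch of the hazard behind question 1. The `GlobalMemoryBuffer` class below only mimics the reuse semantics of Megatron-LM's global memory buffer (requests with the same shape/dtype/name key return views over one shared allocation); the shapes, the `"mpu"` key, and the fill values are illustrative, and no actual `all_gather` is performed.

```python
import torch

# Stand-in mimicking Megatron-LM's GlobalMemoryBuffer semantics:
# same (name, dtype) key with a fitting size -> same underlying storage.
class GlobalMemoryBuffer:
    def __init__(self):
        self.buffer = {}

    def get_tensor(self, shape, dtype, name):
        numel = 1
        for d in shape:
            numel *= d
        key = (name, dtype)
        if key not in self.buffer or self.buffer[key].numel() < numel:
            self.buffer[key] = torch.empty(numel, dtype=dtype)
        return self.buffer[key][:numel].view(*shape)

buf = GlobalMemoryBuffer()

# B pass: the all-gathered activations land in the shared buffer.
gathered_for_B = buf.get_tensor((4, 8), torch.float32, "mpu")
gathered_for_B.fill_(1.0)

# Before the decoupled W pass runs, another pass requests the same
# buffer and refreshes it with its own data.
gathered_other = buf.get_tensor((4, 8), torch.float32, "mpu")
gathered_other.fill_(2.0)

# The W pass now reads stale contents: the view it kept was overwritten.
print(gathered_for_B[0, 0].item())  # 2.0, not the 1.0 it expected

# By contrast, a gather target from torch.empty() is a private
# allocation, so nothing else can refresh it between B and W.
private = torch.empty(4, 8, dtype=torch.float32)
private.fill_(1.0)
another = torch.empty(4, 8, dtype=torch.float32)
another.fill_(2.0)
print(private[0, 0].item())  # still 1.0
```

The second half of the sketch is exactly why question 2 arises: a torch.empty() target does not share storage with anything, so it is unclear why the grad_output gather would also need to be made independent.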