PP InterleavedZeroBubble schedule shows low TPS and high memory usage #773

Open
tianyu-l opened this issue Jan 3, 2025 · 2 comments
Labels: bug (Something isn't working), release_blocking (Issues that are blocking the milestone / release completion)

Comments

tianyu-l (Contributor) commented Jan 3, 2025

Here are benchmark results from 512-GPU experiments:
"1F1B": memory: 82.46GiB(86.80%) tps: 100 mfu: 26.52%
"Interleaved1F1B": memory: 72.69GiB(76.52%) tps: 128 mfu: 33.88%
"Interleaved Zero Bubble": memory: 89.73GiB(94.45%) tps: 13 mfu: 3.41%

Note that the Zero Bubble run does not use torch.compile, async TP, or Float8, since they are not compatible with it.
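To put the gap in perspective, here is a minimal Python sketch that normalizes the three results above against Interleaved1F1B. The numbers are copied from this comment; the dictionary layout and the helper name `relative_to` are made up for illustration and are not part of torchtitan. Keep in mind that the Zero Bubble run had torch.compile, async TP, and Float8 disabled, so this is not a like-for-like comparison.

```python
# Results copied from the 512-GPU benchmark in this comment.
# The structure and helper below are illustrative only, not torchtitan code.
results = {
    "1F1B": {"mem_gib": 82.46, "tps": 100, "mfu": 26.52},
    "Interleaved1F1B": {"mem_gib": 72.69, "tps": 128, "mfu": 33.88},
    "InterleavedZeroBubble": {"mem_gib": 89.73, "tps": 13, "mfu": 3.41},
}


def relative_to(baseline, table):
    """Return TPS ratio and peak-memory delta of every schedule vs. the baseline."""
    base = table[baseline]
    return {
        name: {
            "tps_ratio": row["tps"] / base["tps"],
            "mem_delta_gib": row["mem_gib"] - base["mem_gib"],
        }
        for name, row in table.items()
    }


rel = relative_to("Interleaved1F1B", results)
for name, row in rel.items():
    print(f"{name}: {row['tps_ratio']:.2f}x tps, {row['mem_delta_gib']:+.2f} GiB")
```

With these figures, the Zero Bubble run reaches roughly a tenth of the Interleaved1F1B throughput while using about 17 GiB more memory.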

tianyu-l added the bug and release_blocking labels Jan 3, 2025
tianyu-l added this to the torchtitan v1.0.0 release milestone Jan 3, 2025
hhaAndroid commented

me too

H-Huang self-assigned this Jan 7, 2025

H-Huang (Member) commented Jan 13, 2025

Update: it looks like TPS drops only when the schedule is used with activation checkpointing (AC); without AC the TPS is fine.

Llama3 8B
batch_size = 16
seq_len = 512
pipeline_parallel_degree = 8

Without AC
PP8, interleaved 1f1b
[rank0]:2025-01-10 16:07:06,580 - root - INFO - step: 4 loss: -1.0000 memory: 22.74GiB(28.73%) tps: 304 mfu: 4.16%
[rank7]:2025-01-10 16:07:06,844 - root - INFO - step: 4 loss: 12.1754 memory: 17.97GiB(22.71%) tps: 304 mfu: 4.16%

PP8, interleaved zero bubble
[rank0]:2025-01-10 15:52:35,281 - root - INFO - step: 4 loss: -1.0000 memory: 20.57GiB(25.99%) tps: 339 mfu: 4.65%
[rank7]:2025-01-10 15:52:35,548 - root - INFO - step: 4 loss: 12.1761 memory: 22.23GiB(28.09%) tps: 339 mfu: 4.65%

With full AC
PP8, interleaved 1f1b
[rank0]:2025-01-10 16:08:50,079 - root - INFO - step: 4 loss: -1.0000 memory: 15.86GiB(20.04%) tps: 243 mfu: 3.33%
[rank7]:2025-01-10 16:08:50,352 - root - INFO - step: 4 loss: 12.1765 memory: 15.72GiB(19.87%) tps: 243 mfu: 3.33%

PP8, interleaved zero bubble
[rank0]:2025-01-10 16:10:23,054 - root - INFO - step: 4 loss: -1.0000 memory: 16.82GiB(21.25%) tps: 100 mfu: 1.37%
[rank7]:2025-01-10 16:10:24,381 - root - INFO - step: 4 loss: 12.1760 memory: 18.64GiB(23.55%) tps: 100 mfu: 1.37%

Still investigating the cause.
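For anyone reproducing this, a small script can pull the metrics out of the log lines and compare the schedules directly. The sketch below assumes the exact line layout pasted above (`[rankN]:... step: ... loss: ... memory: ...GiB(...%) tps: ... mfu: ...%`); the regex and the `parse_line` helper are hypothetical and may need adjusting if your logs contain color codes or extra fields.

```python
import re

# Matches the metric lines as they appear in the logs pasted above.
# This pattern is an assumption based on those lines, not a torchtitan API.
LOG_RE = re.compile(
    r"\[rank(?P<rank>\d+)\]:.*?step:\s*(?P<step>\d+)\s+"
    r"loss:\s*(?P<loss>-?\d+\.\d+)\s+"
    r"memory:\s*(?P<mem>\d+\.\d+)GiB\((?P<mem_pct>\d+\.\d+)%\)\s+"
    r"tps:\s*(?P<tps>[\d,]+)\s+"
    r"mfu:\s*(?P<mfu>\d+\.\d+)%"
)


def parse_line(line):
    """Return the metrics of one log line as a dict, or None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {
        "rank": int(m["rank"]),
        "step": int(m["step"]),
        "loss": float(m["loss"]),
        "mem_gib": float(m["mem"]),
        "tps": int(m["tps"].replace(",", "")),  # tolerate thousands separators
        "mfu": float(m["mfu"]),
    }


# rank0 lines from the four runs above (PP8, Llama3 8B).
runs = {
    ("1f1b", "no_ac"): "[rank0]:2025-01-10 16:07:06,580 - root - INFO - step: 4 loss: -1.0000 memory: 22.74GiB(28.73%) tps: 304 mfu: 4.16%",
    ("zb", "no_ac"): "[rank0]:2025-01-10 15:52:35,281 - root - INFO - step: 4 loss: -1.0000 memory: 20.57GiB(25.99%) tps: 339 mfu: 4.65%",
    ("1f1b", "full_ac"): "[rank0]:2025-01-10 16:08:50,079 - root - INFO - step: 4 loss: -1.0000 memory: 15.86GiB(20.04%) tps: 243 mfu: 3.33%",
    ("zb", "full_ac"): "[rank0]:2025-01-10 16:10:23,054 - root - INFO - step: 4 loss: -1.0000 memory: 16.82GiB(21.25%) tps: 100 mfu: 1.37%",
}
parsed = {key: parse_line(line) for key, line in runs.items()}

# Zero bubble TPS relative to interleaved 1F1B, with and without AC.
ratio_no_ac = parsed[("zb", "no_ac")]["tps"] / parsed[("1f1b", "no_ac")]["tps"]
ratio_full_ac = parsed[("zb", "full_ac")]["tps"] / parsed[("1f1b", "full_ac")]["tps"]
print(f"zero bubble vs 1f1b, no AC:   {ratio_no_ac:.2f}x")
print(f"zero bubble vs 1f1b, full AC: {ratio_full_ac:.2f}x")
```

On the numbers above, zero bubble is slightly faster than interleaved 1F1B without AC (339 vs 304 tps) but falls to well under half its throughput with full AC (100 vs 243 tps).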

3 participants