PP InterleavedZeroBubble schedule shows low TPS and high memory usage #773
Labels: bug, release_blocking
Here are benchmark results from 512-GPU experiments:
"1F1B": memory: 82.46GiB(86.80%) tps: 100 mfu: 26.52%
"Interleaved1F1B": memory: 72.69GiB(76.52%) tps: 128 mfu: 33.88%
"Interleaved Zero Bubble": memory: 89.73GiB(94.45%) tps: 13 mfu: 3.41%
Note that the Zero Bubble run does not use torch.compile, async TP, or Float8, since they are not compatible with this schedule.
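To make the size of the gap easier to see, here is a small sketch that compares each schedule against a chosen baseline. The numbers are copied from the results above; the helper name `compare_to` and the dictionary layout are just illustrative and not part of any library.

```python
# Figures copied from the 512-GPU benchmark results in this issue.
results = {
    "1F1B": {"memory_gib": 82.46, "tps": 100, "mfu_pct": 26.52},
    "Interleaved1F1B": {"memory_gib": 72.69, "tps": 128, "mfu_pct": 33.88},
    "Interleaved Zero Bubble": {"memory_gib": 89.73, "tps": 13, "mfu_pct": 3.41},
}


def compare_to(baseline, data):
    """Return the TPS ratio and memory delta of every schedule vs. the baseline.

    `compare_to` is a hypothetical helper written for this comment only.
    """
    base = data[baseline]
    return {
        name: {
            "tps_ratio": row["tps"] / base["tps"],
            "memory_delta_gib": row["memory_gib"] - base["memory_gib"],
        }
        for name, row in data.items()
    }


if __name__ == "__main__":
    for name, row in compare_to("Interleaved1F1B", results).items():
        print(
            f"{name}: {row['tps_ratio']:.2f}x TPS, "
            f"{row['memory_delta_gib']:+.2f} GiB vs Interleaved1F1B"
        )
```

Against Interleaved1F1B, the Zero Bubble run reaches roughly 10% of the throughput (13 / 128) while using about 17 GiB more memory. Since the Zero Bubble run had torch.compile, async TP, and Float8 disabled, the comparison may not be fully like for like.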