This repository has been archived by the owner on Aug 7, 2024. It is now read-only.

Changes to enable fp8 on multi devices #149

Closed
wants to merge 5 commits into from

Conversation

@y-sq (Contributor) commented Nov 20, 2023

  • If the model is cast to bf16 (model = model.to(get_torch_dtype(dtype))), dtype = bf16 is also passed to the scale_a parameter of the float8 tensor, which causes:
        output, output_amax = torch._scaled_mm(
        RuntimeError: scale_a must be float scalar
    (A sketch of this failure mode follows after this list.)
  • The Float8Linear classes used in the multi-GPU case are Float8ColumnParallelLinear and Float8RowParallelLinear, so sync_float8_amax_and_scale_history needs to be able to identify these class types.
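A minimal, self-contained sketch of the first issue (ToyFloat8Linear and fp8_scale_a are made-up names, not this repo's Float8Linear): nn.Module.to(dtype=...) casts floating-point buffers along with the parameters, so a module-wide bf16 cast also turns an fp32 scale buffer into bf16, which torch._scaled_mm then rejects.

import torch

# Hypothetical stand-in for a float8 linear with an fp32 scale buffer.
class ToyFloat8Linear(torch.nn.Linear):
    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features)
        self.register_buffer("fp8_scale_a", torch.tensor(1.0, dtype=torch.float32))

m = ToyFloat8Linear(16, 16)
m = m.to(dtype=torch.bfloat16)                 # module-wide cast hits the buffer too
assert m.fp8_scale_a.dtype == torch.bfloat16   # the state this PR guards against

# One possible remedy: keep (or re-cast) the scale buffer as float32.
m.fp8_scale_a = m.fp8_scale_a.float()
assert m.fp8_scale_a.dtype == torch.float32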

@facebook-github-bot added the CLA Signed label Nov 20, 2023
@drisspg changed the title from "Changes to enable fp8 in xlformers" to "Changes to enable fp8" Nov 20, 2023
@drisspg changed the title from "Changes to enable fp8" to "Changes to enable fp8 on multi devices" Nov 20, 2023
@y-sq marked this pull request as ready for review November 20, 2023 21:35
@facebook-github-bot (Contributor): @y-sq has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


  for name, child in model.named_modules():
-     if not isinstance(child, (Float8Linear)):
+     if not any(isinstance(child, a) for a in fp8_classes):
Contributor:

Since we have removed the NoTs class, I think this is likely a rebase artifact.

Contributor (Author):

In the multi-GPU case, we have Float8ColumnParallelLinear and Float8RowParallelLinear (which depend on external distributed-training code) as the fp8 classes, so I modified this to pass the class types to sync_float8_amax_and_scale_history.

Contributor:

Ahh, I see, I was misreading this. But if we make fp8_classes a Tuple of types, couldn't we still keep the check as is? This is a nit anyway; both accomplish the same thing.
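A small sketch of the equivalence being discussed (iter_fp8_children is a hypothetical helper, not code from this repo; fp8_classes is assumed to be a tuple of module types):

from typing import Iterable, Tuple, Type

import torch.nn as nn

def iter_fp8_children(model: nn.Module,
                      fp8_classes: Tuple[Type[nn.Module], ...]) -> Iterable[nn.Module]:
    """Yield only the submodules whose type is one of fp8_classes."""
    for _name, child in model.named_modules():
        # Form used in the PR diff:
        if not any(isinstance(child, cls) for cls in fp8_classes):
            continue
        # Equivalent form suggested in review, since isinstance accepts a tuple of types:
        # if not isinstance(child, fp8_classes):
        #     continue
        yield child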

@drisspg (Contributor) commented Nov 21, 2023

Looks great, can we add a test?

@y-sq (Contributor, Author) commented Nov 21, 2023

It's strange that the format check passed on my side but failed in the GitHub checks:

ufmt --version
ufmt, version 2.3.0
ufmt check .
✨ 22 files already formatted ✨

@drisspg (Contributor) commented Nov 21, 2023

Don't worry about the format. There is a deviation between ufmt and the internal formatter that I can't pin down, so those failures were likely not caused by anything you did.

# Cast the module to dtype
m = m.to(dtype=linear_dtype)

# autocast off
Contributor:

Nit: this test does cover it, but could we also assert that the buffer types are still fp32?
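A sketch of what such an assertion could look like (the "scale"/"amax" name filter is a guess at the buffer naming, not the repo's actual convention); it would run right after the m = m.to(dtype=linear_dtype) cast above:

import torch

# The fp8 bookkeeping buffers should remain fp32 even after the module-wide cast.
for name, buf in m.named_buffers():
    if "scale" in name or "amax" in name:
        assert buf.dtype == torch.float32, f"{name} was cast to {buf.dtype}"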

@facebook-github-bot (Contributor): @y-sq has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@drisspg (Contributor) left a comment

Thanks 😄!

@facebook-github-bot (Contributor): @y-sq merged this pull request in 77386ba.
