Expose mixed_precision dtype arguments #348
Conversation
Thanks for adding this.
torchtitan/config_manager.py
Outdated
TORCH_DTYPE_ARGS = [
    "checkpoint.export_dtype",
    "training.mixed_precision_param",
    "training.mixed_precision_reduce",
]
nit: should "reduce" be "grad"?
no, I'm following the existing naming in the mixed_precision config struct
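For context, the naming mirrors FSDP2's mixed-precision policy, whose two dtype knobs are `param_dtype` and `reduce_dtype`. A minimal sketch, assuming PyTorch's composable FSDP API of that era (import path and signatures may differ across versions):

import torch
from torch.distributed._composable.fsdp import MixedPrecisionPolicy

# "param" controls the dtype parameters are cast to for compute;
# "reduce" controls the dtype used for gradient reductions
# (reduce-scatter / all-reduce).
mp_policy = MixedPrecisionPolicy(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.float32,
)
# fully_shard(model, mp_policy=mp_policy)  # applied per-module in torchtitan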
torch dtype to use for reductions when applying mixed precision via FSDP.
This feature only takes effect when data_parallel_degree > 1
nit: "reductions" -> "gradients"
torchtitan/config_manager.py
Outdated
for k_, v_ in v.items():
    if ".".join([k, k_]) in TORCH_DTYPE_ARGS:
        v[k_] = torch_dtype(v_)
nit: comment please?
lgtm
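For reference, the string-to-dtype util the commit message mentions could look like the sketch below. The helper name `torch_dtype` comes from the snippet above, but its actual body in torchtitan may differ:

import torch

def torch_dtype(name: str) -> torch.dtype:
    # Map a config string like "bfloat16" to the matching torch.dtype.
    dtype = getattr(torch, name, None)
    if not isinstance(dtype, torch.dtype):
        raise ValueError(f"invalid torch dtype name: {name!r}")
    return dtype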
add training.mixed_precision_param and .mixed_precision_reduce options

refactor a util to map strings to torch dtypes

ghstack-source-id: 387e1ca13ad23e859d21d7760f858ee6e269a796
Pull Request resolved: #348
Stack from ghstack (oldest at bottom):
add training.mixed_precision_param and .mixed_precision_reduce options
refactor a util to map strings to torch dtypes
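Putting both pieces together, the options can be written as plain strings in the config and converted after parsing. A hypothetical end-to-end illustration (the dict layout mimics parsed TOML sections; torchtitan's real plumbing may differ):

import torch

# Parsed config sections, as they might look after TOML/CLI parsing.
cfg = {
    "training": {
        "mixed_precision_param": "bfloat16",
        "mixed_precision_reduce": "float32",
    },
    "checkpoint": {"export_dtype": "float32"},
}

TORCH_DTYPE_ARGS = [
    "checkpoint.export_dtype",
    "training.mixed_precision_param",
    "training.mixed_precision_reduce",
]

# Convert string-valued dtype fields in place (getattr stands in for the
# torch_dtype helper sketched earlier).
for k, v in cfg.items():
    for k_, v_ in v.items():
        if ".".join([k, k_]) in TORCH_DTYPE_ARGS:
            v[k_] = getattr(torch, v_)

assert cfg["training"]["mixed_precision_param"] is torch.bfloat16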