add max and abs to selective AC config (#701)
Summary:

We usually want to save the result of max(abs(tensor)), since the memory needed to store it is negligible. For float8 training with selective op-based AC, this gives a small speedup of roughly 1% WPS on Llama 3 8B pretraining on 8 H100 GPUs with default settings. Keeping these ops in the save list does no harm for non-float8 training, so this enables it for everyone without gating to keep things simple.

WPS: 6800 -> 6860

Test Plan:

```
with-proxy CONFIG_FILE="./train_configs/llama3_8b.toml" ./run_llama_train.sh --float8.enable_float8_linear --float8.scaling_type_input dynamic --float8.scaling_type_weight dynamic --float8.scaling_type_grad_output dynamic --training.compile
```