This monorepo collects libraries of packages that accelerate fine-tuning / training of large models, intended to be part of the fms-hf-tuning suite.
This package is in BETA and under extensive development. Expect breaking changes!
Plugin | Description | Depends | License | Status
---|---|---|---|---
framework | Acceleration framework for integration with huggingface trainers | | | Beta
accelerated-peft | For PEFT-training, e.g., 4bit QLoRA. | Huggingface AutoGPTQ | Apache 2.0, MIT | Beta
TBA | Unsloth-inspired. Fused LoRA and triton kernels (e.g., fast cross-entropy, rms, rope) | Xformers | Apache 2.0 with exclusions. | Under Development
TBA | MegaBlocks-inspired triton kernels and accelerations for Mixture-of-Expert models | | Apache 2.0 | Under Development
This is intended to be a collection of many acceleration routines (including accelerated PEFT and other techniques). Below is a concrete example showing how to accelerate your tuning experience with `tuning/sft_trainer.py` from fms-hf-tuning.
Below are instructions for accelerated PEFT fine-tuning, in particular GPTQ-LoRA tuning with the AutoGPTQ `triton_v2` kernel; this state-of-the-art kernel was contributed by jeromeku in March 2024:
1. Checkout fms-hf-tuning and install the framework library:

    ```
    $ pip install -e .[fms-accel]
    ```

    or alternatively install the framework directly:

    ```
    $ pip install git+https://github.com/foundation-model-stack/fms-acceleration.git#subdirectory=plugins/framework
    ```

    The above installs the command line utility `fms_acceleration.cli`, which can then be used to install plugins and view sample configurations.
2. Prepare a YAML configuration for the acceleration framework plugins (a minimal sketch of such a configuration is shown after these steps). To help with this, `fms_acceleration.cli` provides a `configs` utility to search for sample configs by entering the following:

    ```
    $ python -m fms_acceleration.cli configs

    1. accelerated-peft-autogptq (accelerated-peft-autogptq-sample-configuration.yaml) - plugins: ['accelerated-peft']
    2. accelerated-peft-bnb (accelerated-peft-bnb-nf4-sample-configuration.yaml) - plugins: ['accelerated-peft']
    ```

    or alternatively search the configurations manually:
    - The full sample configuration list shows the `plugins` required for the configs.
    - E.g., the Accelerated GPTQ-LoRA configuration here.
3. Install the required `plugins`. Use the `plugins` command to view available plugins; this list updates as more plugins get developed. Recall that `configs` lists the required `plugins` for the sample configurations; make sure all of them are installed.

    ```
    $ python -m fms_acceleration.cli plugins

    Choose from the list of plugin shortnames, and do:
    * 'python -m fms_acceleration.cli install <pip-install-flags> PLUGIN_NAME'.

    List of PLUGIN_NAME [PLUGIN_SHORTNAME]:

    1. fms_acceleration_peft [peft]
    ```

    and then `install` the plugin. We install the `fms-acceleration-peft` plugin for GPTQ-LoRA tuning with triton v2 as:

    ```
    python -m fms_acceleration.cli install fms_acceleration_peft
    ```

    The above is the equivalent of:

    ```
    pip install git+https://github.com/foundation-model-stack/fms-acceleration.git#subdirectory=plugins/accelerated-peft
    ```
4. Run `sft_trainer.py` while providing the correct arguments:
    - `--acceleration_framework_config_file` pointing to the framework configuration YAML. The framework activates the relevant plugins given the framework configuration; for more details see framework/README.md.
    - arguments required for correct operation (e.g., if using accelerated peft, then `peft_method` is required).
        - Use `arguments` along with the sample configuration `shortname` to display the relevant critical arguments; these arguments can also be manually referred from scenarios.yaml:

            ```
            $ python -m fms_acceleration.cli arguments accelerated-peft-autogptq

            Searching for configuration shortnames: ['accelerated-peft-autogptq']
            1. scenario: accelerated-peft-gptq
               configs: accelerated-peft-autogptq
               arguments:
                  --learning_rate 2e-4 \
                  --fp16 True \
                  --torch_dtype float16 \
                  --peft_method lora \
                  --r 16 \
                  --lora_alpha 16 \
                  --lora_dropout 0.0 \
                  --target_modules ['q_proj', 'k_proj', 'v_proj', 'o_proj']
            ```
    - More info on `defaults.yaml` and `scenarios.yaml` found here.
        - Arguments not critical to the plugins are found in defaults.yaml. These can be taken with liberty.
        - Arguments critical to plugins are found in scenarios.yaml. The relevant section of scenarios.yaml is the one whose `framework_config` entries match the `shortname` of the sample configuration of interest.
5. Run `sft_trainer.py` providing the acceleration configuration and arguments (see also the fuller sketch below these steps):

    ```
    # when using sample-configurations, arguments can be referred from
    # defaults.yaml and scenarios.yaml
    python sft_trainer.py \
        --acceleration_framework_config_file framework.yaml \
        ...  # arguments
    ```

    Activate `TRANSFORMERS_VERBOSITY=info` to see the huggingface trainer printouts and verify that `AccelerationFramework` is activated!

    ```
    # this printout will be seen in huggingface trainer logs if acceleration is activated
    ***** FMS AccelerationFramework *****
    Active Plugin: AutoGPTQAccelerationPlugin. Python package: fms_acceleration_peft. Version: 0.0.1.
    ***** Running training *****
    Num examples = 1,549
    Num Epochs = 1
    Instantaneous batch size per device = 4
    Total train batch size (w. parallel, distributed & accumulation) = 4
    Gradient Accumulation steps = 1
    Total optimization steps = 200
    Number of trainable parameters = 13,631,488
    ```
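As referenced in step 2, the following is a minimal sketch of what a framework configuration file for GPTQ-LoRA might contain, written here as a shell heredoc. The field names are assumptions modelled on the accelerated-peft-autogptq sample configuration; prefer copying the actual sample YAML from the repository.

```
# minimal sketch of a framework configuration for GPTQ-LoRA.
# NOTE: field names are assumptions based on the accelerated-peft-autogptq
# sample configuration; copy the actual sample YAML when in doubt.
cat > framework.yaml <<'EOF'
plugins:
  peft:
    quantization:
      auto_gptq:
        # triton_v2 is the kernel recommended for PEFT training
        kernel: triton_v2
        # expect an already-quantized GPTQ checkpoint to be passed in
        from_quantized: true
EOF
```

The resulting `framework.yaml` is then passed to `--acceleration_framework_config_file` as in step 4.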
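And as referenced in step 5, a fuller invocation might look like the sketch below. The critical arguments are taken from the `arguments` output in step 4; the model, data, and output flags are placeholders whose exact names should be checked against fms-hf-tuning's `sft_trainer.py`.

```
# sketch only: critical arguments come from `fms_acceleration.cli arguments`;
# model/data/output flags are placeholders to be checked against fms-hf-tuning
TRANSFORMERS_VERBOSITY=info python sft_trainer.py \
    --model_name_or_path <path-to-gptq-quantized-model> \
    --training_data_path <path-to-training-data> \
    --output_dir <output-dir> \
    --acceleration_framework_config_file framework.yaml \
    --learning_rate 2e-4 \
    --fp16 True \
    --torch_dtype float16 \
    --peft_method lora \
    --r 16 \
    --lora_alpha 16 \
    --lora_dropout 0.0 \
    --target_modules q_proj k_proj v_proj o_proj
```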
Over time, more plugins will be added and updated, so please check here for the latest accelerations!
This repo requires CUDA to compute the kernels, and it is convenient to use NVIDIA PyTorch containers that already come with CUDA installed. We have tested with the following versions:

- `pytorch:24.03-py3`
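For example, the tested container above can be started with GPU access roughly as follows; the image tag matches the version listed, while the volume mount and working directory are illustrative.

```
# start the tested NGC PyTorch container with GPU access;
# the volume mount and working directory are illustrative only
docker run --gpus all -it --rm \
    -v "$(pwd)":/workspace/work -w /workspace/work \
    nvcr.io/nvidia/pytorch:24.03-py3
```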
The benchmarks can be reproduced with the provided scripts; they include:
- baseline benches (e.g., standard fine-tuning, standard peft).
- benches for the various acceleration sample configs.

See the benchmark CSV files in this repo for the results.
For a deeper dive into the details, see framework/README.md.
IBM Research, Singapore
- Fabian Lim [email protected]
- Aaron Chew [email protected]
- Laura Wynter [email protected]