FMS Acceleration

This monorepo collects libraries of packages that accelerate fine-tuning / training of large models, intended to be part of the fms-hf-tuning suite.

This package is in BETA under extensive development. Expect breaking changes!

Plugins

| Plugin | Description | Depends | License | Status |
|---|---|---|---|---|
| framework | This acceleration framework for integration with Hugging Face trainers | | | Beta |
| accelerated-peft | For PEFT-training, e.g., 4bit QLoRA | Huggingface, AutoGPTQ | Apache 2.0, MIT | Beta |
| TBA | Unsloth-inspired fused LoRA and triton kernels (e.g., fast cross-entropy, rms, rope) | Xformers | Apache 2.0 with exclusions | Under Development |
| TBA | MegaBlocks-inspired triton kernels and accelerations for Mixture-of-Experts models | | Apache 2.0 | Under Development |

Usage with FMS HF Tuning

This is intended to be a collection of many acceleration routines (including accelerated PEFT and other techniques). The concrete example below demonstrates how to accelerate your tuning experience with tuning/sft_trainer.py from fms-hf-tuning.

Example: Accelerated GPTQ-LoRA Training

Below are instructions for accelerated PEFT fine-tuning, in particular GPTQ-LoRA tuning with the AutoGPTQ triton_v2 kernel; this state-of-the-art kernel was provided by jeromeku in March 2024:

  1. Check out fms-hf-tuning and install the framework library:

    $ pip install -e .[fms-accel]
    

    or alternatively install the framework directly:

    $ pip install git+https://github.com/foundation-model-stack/fms-acceleration.git#subdirectory=plugins/framework
    

    The above installs the command line utility fms_acceleration.cli, which can then be used to install plugins and view sample configurations.

  2. Prepare a YAML configuration for the acceleration framework plugins. To help with this, fms_acceleration.cli provides a configs utility to search for sample configs by entering the following:

    $ python -m fms_acceleration.cli configs
    
    1. accelerated-peft-autogptq (accelerated-peft-autogptq-sample-configuration.yaml) - plugins: ['accelerated-peft']
    2. accelerated-peft-bnb (accelerated-peft-bnb-nf4-sample-configuration.yaml) - plugins: ['accelerated-peft']
    

    or alternatively browse the sample configuration files in the repository manually.
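
    For reference, the AutoGPTQ sample configuration is a small YAML file consumed by the framework. A rough, illustrative sketch of its shape is shown below; the exact keys, values, and comments are defined by the sample file itself, so treat this only as an approximation:

    # illustrative sketch only; consult
    # accelerated-peft-autogptq-sample-configuration.yaml for the exact schema
    plugins:
      peft:
        quantization:
          auto_gptq:
            # kernel used for the quantized linear layers
            kernel: triton_v2
            # whether an already-quantized checkpoint is expected
            from_quantized: true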

  3. Install the required plugins. Use the plugins command to view the available plugins; this list updates as more plugins are developed. Recall that configs lists the required plugins for each sample configuration; make sure all of them are installed.

    $ python -m fms_acceleration.cli plugins
    
    Choose from the list of plugin shortnames, and do:
    * 'python -m fms_acceleration.cli install <pip-install-flags> PLUGIN_NAME'.
    
    List of PLUGIN_NAME [PLUGIN_SHORTNAME]:
    
    1. fms_acceleration_peft [peft]
    

    Then install the plugin. Here we install the fms-acceleration-peft plugin for GPTQ-LoRA tuning with triton v2:

    python -m fms_acceleration.cli install fms_acceleration_peft
    

    The above is the equivalent of:

    pip install git+https://github.com/foundation-model-stack/fms-acceleration.git#subdirectory=plugins/accelerated-peft
    
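    As an optional sanity check (not part of the documented workflow), you can verify that the plugin package imports cleanly:

    python -c "import fms_acceleration_peft"
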
  4. Prepare the correct arguments for sft_trainer.py:

    • --acceleration_framework_config_file pointing to framework configuration YAML. The framework activates relevant plugins given the framework configuration; for more details see framework/README.md.

    • arguments required for correct operation (e.g., if using accelerated peft, then peft_method is required). Use the arguments utility to look these up:

      $ python -m fms_acceleration.cli arguments accelerated-peft-autogptq
      
      Searching for configuration shortnames: ['accelerated-peft-autogptq']
      1. scenario: accelerated-peft-gptq
      configs: accelerated-peft-autogptq
      arguments:
          --learning_rate 2e-4 \
          --fp16 True \
          --torch_dtype float16 \
          --peft_method lora \
          --r 16 \
          --lora_alpha 16 \
          --lora_dropout 0.0 \
          --target_modules ['q_proj', 'k_proj', 'v_proj', 'o_proj']
      
    • More info on defaults.yaml and scenarios.yaml can be found in the repository.

      • Arguments not critical to the plugins are found in defaults.yaml; these can be varied freely.
      • Arguments critical to the plugins are found in scenarios.yaml. The relevant section of scenarios.yaml is the one whose framework_config entries match the shortname of the sample configuration of interest.
  5. Run sft_trainer.py providing the acceleration configuration and arguments:

    # when using sample-configurations, arguments can be referred from
    # defaults.yaml and scenarios.yaml
    python sft_trainer.py \
        --acceleration_framework_config_file framework.yaml \
        ...  # arguments 
    
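    Putting it together, a full invocation might look like the sketch below. Only the framework config flag and the flags reported by the arguments utility are spelled out; the elided arguments are whatever model, dataset, and output arguments you normally pass to sft_trainer.py:

    # sketch of a full run; the trailing ... stands for your usual tuning arguments
    python sft_trainer.py \
        --acceleration_framework_config_file framework.yaml \
        --peft_method lora \
        --learning_rate 2e-4 \
        --fp16 True \
        --torch_dtype float16 \
        --r 16 \
        --lora_alpha 16 \
        --lora_dropout 0.0 \
        --target_modules ['q_proj', 'k_proj', 'v_proj', 'o_proj'] \
        ...  # model, dataset and output arguments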

    Set TRANSFORMERS_VERBOSITY=info to see the Hugging Face trainer printouts and verify that AccelerationFramework is activated!
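
    For example, in a bash-like shell the variable can be set inline for the run:

    TRANSFORMERS_VERBOSITY=info python sft_trainer.py \
        --acceleration_framework_config_file framework.yaml \
        ...  # arguments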

    # this printout will be seen in huggingface trainer logs if acceleration is activated
    ***** FMS AccelerationFramework *****
    Active Plugin: AutoGPTQAccelerationPlugin. Python package: fms_acceleration_peft. Version: 0.0.1.
    ***** Running training *****
    Num examples = 1,549
    Num Epochs = 1
    Instantaneous batch size per device = 4
    Total train batch size (w. parallel, distributed & accumulation) = 4
    Gradient Accumulation steps = 1
    Total optimization steps = 200
    Number of trainable parameters = 13,631,488
    

Over time, more plugins will be added and updated, so please check back here for the latest accelerations!

CUDA Dependencies

This repo requires CUDA to compute the kernels, and it is convenient to use the NVIDIA PyTorch Containers, which already come with CUDA installed. We have tested with the following versions:

  • pytorch:24.03-py3
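
For example, assuming Docker with the NVIDIA Container Toolkit is available, one way to start a matching container is:

  docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:24.03-py3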

Benchmarks

The benchmarks can be reproduced with the provided scripts.

See the benchmark CSV files in the repository for various results.

Code Architecture

For a deeper dive into the details, see framework/README.md.

Maintainers

IBM Research, Singapore
