# FMS Acceleration for Accelerated PeFT Techniques

Currently only LoRA-related techniques are supported, but more are in the pipeline:

## Plugins

| Plugin | Description | Depends | Loading | Augmentation | Callbacks |
|---|---|---|---|---|---|
| autogptq | Loads 4bit GPTQ-LoRA with quantized GPTQ as base | AutoGPTQ | | | |
| bnb | Loads 4bit QLoRA with quantized bitsandbytes Linear4 | Huggingface<br>bitsandbytes | | | |
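
For orientation, the sketch below shows the kind of 4bit QLoRA setup that the `bnb` plugin automates, using the standard Huggingface `BitsAndBytesConfig` and `peft` APIs directly. The model id and LoRA hyperparameters are placeholder assumptions, and the plugin itself wires this up through the acceleration framework rather than by hand.

```python
# Minimal sketch of a generic 4bit QLoRA setup (what the bnb plugin automates).
# The model id and LoRA hyperparameters are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used during matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",                    # placeholder model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # placeholder target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the LoRA adapters are trainable
```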

## Key Points

- Fixes the upcasting issue (which caused slowdowns) in the bnb plugin, originally discovered by the inventors of Unsloth.
- bnb is properly configured to work with FSDP, following this guide (a generic sketch follows this list).
- triton_v2 kernels are not yet properly integrated into huggingface optimum.
- triton_v2 kernels are the only 4bit kernels that work for training.
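
As a generic illustration of the FSDP point above, combining bnb 4bit quantization with FSDP typically requires the quantized weights to be *stored* in the same dtype as the rest of the parameters so FSDP can flat-shard them. This is a general FSDP-QLoRA pattern, not necessarily the exact mechanism the plugin uses; the model id is a placeholder.

```python
# Sketch of the usual bitsandbytes setting needed for FSDP compatibility:
# store the 4bit quantized weights in the same dtype as the other parameters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # must match torch_dtype below for FSDP
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",                    # placeholder model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,             # keep non-quantized params in bf16 too
)
# The model would then be wrapped by FSDP (e.g. via accelerate) as usual.
```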

## Known Issues

- Models with sliding windows (e.g., Mistral, Mixtral) will have memory and throughput issues.
- GPTQ-LoRA is sometimes observed to have NaN grad norms at the beginning of training, but training otherwise proceeds well.
- `low_cpu_mem_usage` is temporarily disabled for AutoGPTQ until the bug with `make_sure_no_tensor_in_meta_device` is resolved.
- Requires a nightly AutoGPTQ until a package version > 0.7.1 becomes available: `pip install git+https://github.com/AutoGPTQ/AutoGPTQ.git`
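
To confirm that the nightly build is what actually got installed, the reported version can be checked from Python; this assumes the distribution registers itself under the name `auto_gptq`, as the released builds do.

```python
# Check the installed AutoGPTQ version; a nightly build should report > 0.7.1.
from importlib.metadata import version

print(version("auto_gptq"))
```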