Currently only LoRA-related techniques are supported, with more in the pipeline:
Plugin | Description | Depends | Loading | Augmentation | Callbacks |
---|---|---|---|---|---|
autogptq | Loads 4bit GPTQ-LoRA with quantized GPTQ as base | AutoGPTQ | ✅ | ✅ | |
bnb | Loads 4bit QLoRA with quantized bitsandbytes Linear4 | Huggingface bitsandbytes | ✅ | ✅ | |
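
For orientation, below is a minimal, hypothetical sketch of the kind of 4bit QLoRA setup the `bnb` plugin automates, assuming the standard Huggingface `transformers` and `peft` APIs. The checkpoint path and LoRA hyperparameters are illustrative, not the plugin's defaults:

```python
# Minimal 4bit QLoRA sketch using standard transformers + peft APIs.
# The checkpoint path and LoRA hyperparameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for the 4bit layers
)

model = AutoModelForCausalLM.from_pretrained(
    "path/to/base-model",                   # illustrative checkpoint
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],    # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA adapters are trainable
```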
- Fixes an upcasting issue (which resulted in slowdowns) for the `bnb` plugin, originally discovered by the inventors of Unsloth.
- `bnb` is properly configured to work with FSDP following this guide; see the sketch after this list.
- `triton_v2` kernels are not yet properly integrated into huggingface optimum.
- `triton_v2` kernels are the only 4bit kernels that work for training.
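
As a hedged illustration of the FSDP point above (this follows the public Huggingface FSDP+QLoRA guidance, not necessarily the plugin's exact internals), the key detail is keeping the 4bit quantized-weight storage dtype aligned with the dtype FSDP shards in:

```python
# Sketch: bitsandbytes 4bit config for FSDP training, per HF FSDP+QLoRA guidance.
# The plugin's actual internal configuration may differ.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # must match the dtype FSDP uses to shard
)
```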
- Models with sliding windows (e.g., Mistral, Mixtral) will have memory and throughput issues.
- GPTQ-LoRA is sometimes observed to have `nan` grad norms at the beginning of training, but training proceeds well otherwise.
- `low_cpu_mem_usage` is temporarily disabled for AutoGPTQ until a bug with `make_sure_no_tensor_in_meta_device` is resolved; a hedged workaround sketch follows this list.
- Requires a nightly build of AutoGPTQ until a package version `> 0.7.1` becomes available:
  ```
  pip install git+https://github.com/AutoGPTQ/AutoGPTQ.git
  ```
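
For the `low_cpu_mem_usage` issue above, a hypothetical workaround sketch, assuming the quantized base model is loaded through `transformers` (the checkpoint name is illustrative, and the relevant default may vary across versions):

```python
# Hypothetical workaround: pass low_cpu_mem_usage=False explicitly so the
# AutoGPTQ make_sure_no_tensor_in_meta_device path is avoided during loading.
# Checkpoint name is illustrative; requires optimum and auto-gptq installed.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # any GPTQ-quantized checkpoint
    low_cpu_mem_usage=False,
)
```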