SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models

This repository contains the code for the experiments in the paper SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models.

The rapid advancement of large language models (LLMs) comes with a significant increase in their parameter count, which makes adaptation and fine-tuning challenging. Parameter-efficient fine-tuning (PEFT) methods are widely used to adapt LLMs to downstream tasks efficiently. In this paper, we propose Singular Values and Orthonormal Regularized Singular Vectors Adaptation, or SORSA, a novel PEFT method. Each SORSA adapter consists of two main parts: trainable principal singular weights $W_p = U_p \text{diag}(S_p) V^\top_p$, and frozen residual weights $W_r = U_r \text{diag}(S_r) V^\top_r$. These parts are initialized by performing SVD on the pre-trained weights. Moreover, we implement and analyze an orthonormal regularizer. SORSA adapters can be merged into the base weights for inference, so they add no inference latency.
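To make the decomposition above concrete, here is a minimal NumPy sketch; it is not the package's implementation. The function names `sorsa_split` and `orthonormal_penalty`, the toy matrix size, and the exact form of the penalty (Frobenius-norm distance of $U_p^\top U_p$ and $V_p^\top V_p$ from the identity) are assumptions made for illustration; see the paper for the regularizer actually used.

```python
import numpy as np


def sorsa_split(weight, rank):
    """Split a weight matrix into principal SVD factors and a residual.

    Hypothetical helper: returns (U_p, S_p, V_p^T, W_r), where the first
    `rank` singular triplets form the trainable part and the remaining
    ones are folded into a single frozen residual matrix W_r.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    u_p, s_p, vt_p = u[:, :rank], s[:rank], vt[:rank, :]
    # Scale the residual left singular vectors by their singular values,
    # then project back: W_r = U_r diag(S_r) V_r^T.
    w_r = (u[:, rank:] * s[rank:]) @ vt[rank:, :]
    return u_p, s_p, vt_p, w_r


def orthonormal_penalty(u_p, vt_p):
    """Assumed penalty: ||U_p^T U_p - I||_F + ||V_p^T V_p - I||_F."""
    eye = np.eye(u_p.shape[1])
    return np.linalg.norm(u_p.T @ u_p - eye) + np.linalg.norm(vt_p @ vt_p.T - eye)


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 6))          # stand-in for a pre-trained weight
u_p, s_p, vt_p, w_r = sorsa_split(w, rank=2)
w_p = (u_p * s_p) @ vt_p                 # W_p = U_p diag(S_p) V_p^T
penalty_at_init = orthonormal_penalty(u_p, vt_p)
```

Right after initialization, `w_p + w_r` reproduces `w` up to floating-point error, and the penalty is numerically zero because singular vectors returned by SVD are orthonormal. Under this sketch, the penalty only becomes nonzero once training moves `u_p` or `vt_p` away from orthonormality.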

Empirical Experiments

Reproduce the Experiments

First, install the sorsa package with pip:

pip install sorsa

Then create a .env file in the project's root directory and add your Hugging Face access token:

hf=Your_Hugging_Face_Access_Token
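The README does not say how the token is read from .env. As a rough sketch of how such a key=value file can be parsed with the Python standard library, the hypothetical helper `read_env_file` below writes a throwaway .env to a temporary directory and reads the `hf` key back. The helper name and the parsing rules (skip blank lines and `#` comments) are assumptions, not the repository's actual loader.

```python
import os
import tempfile


def read_env_file(path):
    """Parse simple KEY=VALUE lines into a dict (hypothetical helper)."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Ignore blank lines, comments, and lines without a key=value pair.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


# Demonstrate on a temporary file so no real token is touched.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("hf=Your_Hugging_Face_Access_Token\n")
    env = read_env_file(env_path)
```

Keep the real .env out of version control, since it holds a credential.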

Llama 2 7B, Mistral v0.1 7B and Gemma 7B

First, install the dependencies by creating the conda environment:

conda env create -f environment.yml

Run the commands in ./scripts/train_sorsa.sh to train the model.

After training, run ./scripts/merge_sorsa.sh to merge the adapter into the base model.
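What merging means numerically can be sketched in a few lines of NumPy. This is an illustration of the idea, not what merge_sorsa.sh runs: the function name `merge_adapter`, the shapes, and the random stand-in values for the trained factors are all assumptions.

```python
import numpy as np


def merge_adapter(w_r, u_p, s_p, vt_p):
    """Fold the adapter into one dense matrix: W = W_r + U_p diag(S_p) V_p^T."""
    return w_r + (u_p * s_p) @ vt_p


rng = np.random.default_rng(0)
w_r = rng.standard_normal((8, 6))    # frozen residual weights
u_p = rng.standard_normal((8, 2))    # stand-in for trained U_p
s_p = rng.standard_normal(2)         # stand-in for trained S_p
vt_p = rng.standard_normal((2, 6))   # stand-in for trained V_p^T
x = rng.standard_normal(6)           # one input vector

# Unmerged forward pass: residual path plus the low-rank adapter path.
y_adapter = w_r @ x + u_p @ (s_p * (vt_p @ x))

# Merged forward pass: a single matrix-vector product.
w_merged = merge_adapter(w_r, u_p, s_p, vt_p)
y_merged = w_merged @ x
```

Both paths give the same output, which is why a merged model has the same shape and inference cost as the original base model.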

Run the following command to evaluate on GSM-8K:

python3 run.py --name llama2_sorsa_r128 \
  --test \
  --test-dataset gsm-8k \
  --test-precision bf16

Run the following command to evaluate on MATH:

python3 run.py --name llama2_sorsa_r128 \
  --test \
  --test-dataset math \
  --test-precision bf16

Run the following command to evaluate on HumanEval:

python3 run.py --name llama2_sorsa_r128 \
  --test \
  --test-dataset humaneval \
  --test-precision bf16

RWKV6

If you are training, merging, or testing an RWKV6 model, add the --rwkv flag to run.py.

Cite the work

You can cite this work with the following BibTeX entry:

@article{cao2024sorsa,
  title={SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models},
  author={Cao, Yang},
  journal={arXiv preprint arXiv:2409.00055},
  year={2024}
}
