bash install.sh
Only Intel CPUs are currently supported. We also provide a Hugging Face-style implementation for accuracy evaluation, which does not require Intel CPUs.
cd RULER/RULER/scripts
export K=10 # LSH hyper-parameter K for MagicPIG; page size for Quest
export L=150 # LSH hyper-parameter L for MagicPIG; number of selected pages for Quest
export sink=4 # number of sink tokens
export local=64 # number of local tokens
export model=0 # 0: MagicPIG; 1: Quest; 2: TopK; 3: Oracle Sampling
export expid=0 # experiment ID
bash run.sh llama3-8b-chat-128k synthetic $K $L $sink $local $model $expid
This script uses a Hugging Face implementation to reproduce the accuracy results on the RULER benchmark. The reference files for the model and KV-cache implementations can be found in refs/.
Three models are currently supported: llama3-8b-chat-128k (Llama-3.1-8B-Instruct), llama3-70b-chat-128k (Llama-3.1-70B-Instruct), and mistral-7b-chat-512k (MegaBeam-Mistral-7B-512k).
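For intuition about the LSH hyper-parameters K and L used above: in SimHash-style LSH, each of L hash tables assigns a vector a K-bit code from K random hyperplanes, and keys whose codes collide with the query's code become candidates. The snippet below is a minimal, self-contained sketch of that retrieval step; the names, shapes, and the "collide in at least one table" rule are our simplifying assumptions for illustration, not MagicPIG's exact code, which additionally uses collision statistics for importance sampling (see the paper).

import numpy as np

def build_tables(keys, K, L, rng):
    # Hash each key vector into L tables, each defined by K random hyperplanes (SimHash).
    d = keys.shape[1]
    planes = rng.normal(size=(L, K, d))                     # L tables x K hyperplanes
    codes = np.einsum("lkd,nd->lnk", planes, keys) > 0      # boolean codes, shape (L, N, K)
    return planes, codes

def candidate_tokens(query, planes, codes):
    # Return indices of keys whose code matches the query's code in at least one table.
    q_codes = np.einsum("lkd,d->lk", planes, query) > 0     # (L, K)
    hits = (codes == q_codes[:, None, :]).all(axis=2)       # (L, N) per-table collisions
    return np.nonzero(hits.any(axis=0))[0]

rng = np.random.default_rng(0)
keys = rng.normal(size=(1024, 128))     # toy stand-in for KV-cache keys
query = rng.normal(size=128)
idx = candidate_tokens(query, *build_tables(keys, K=10, L=150, rng=rng))
print(f"{len(idx)} of {len(keys)} keys retrieved")

On random data, roughly L / 2^K of the keys collide with a given query, so larger K makes retrieval sparser while larger L increases coverage.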
In models/, we implement the MagicPIG CPU/GPU code for sanity checks and benchmarking. models/magicpig_llama.py and models/cache.py are expected to be equivalent to refs/hf_model_ref.py and refs/hf_cache_ref.py, respectively.
To benchmark the speed of MagicPIG:
cd models
OMP_NUM_THREADS=96 python benchmark.py --P 98000 --M 98304 --B 1 --model meta-llama/Meta-Llama-3.1-8B-Instruct
To achieve the best performance, you currently need to manually set the number of OpenMP threads in lsh/lsh.cc and attention/gather_gemv.cc (as well as in the command above) to match the number of physical cores of your CPU.
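If you are unsure how many physical cores your machine has, one quick way to check is sketched below (a convenience only; psutil is an assumed extra dependency, and inspecting lscpu output works equally well):

import psutil  # third-party; assumed installed (pip install psutil)

# Number of physical cores (hyper-threads excluded) -- use this value for OMP_NUM_THREADS
# and for the hard-coded thread counts in lsh/lsh.cc and attention/gather_gemv.cc.
print(psutil.cpu_count(logical=False))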
For generation, run:
cd models
python generation.py --path ../data/data32k.json
where --path specifies the file containing the input contexts.
models/magicpig_config.json sets hyper-parameters such as (K, L) for the LSH algorithm and which layers to keep on the GPU.
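To inspect the hyper-parameters currently in effect, you can simply load and print the file (the exact field names are defined by the repository, so none are assumed here):

import json

# Print the MagicPIG configuration, e.g. the LSH (K, L) settings and GPU-layer choices.
with open("models/magicpig_config.json") as f:
    config = json.load(f)
print(json.dumps(config, indent=2))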
@article{chen2024magicpig,
title={MagicPIG: LSH Sampling for Efficient LLM Generation},
author={Chen, Zhuoming and Sadhukhan, Ranajoy and Ye, Zihao and Zhou, Yang and Zhang, Jianyu and Nolte, Niklas and Tian, Yuandong and Douze, Matthijs and Bottou, Leon and Jia, Zhihao and others},
journal={arXiv preprint arXiv:2410.16179},
year={2024}
}