
Cannot get the llama-2-7b-4bit quant model to run normally #62

Open
Zijie-Tian opened this issue Oct 15, 2024 · 1 comment

@Zijie-Tian

By following your BitNet documentation, I was able to get correct results. However, when I tried using the llama-2-7b model, I encountered an issue.

Based on a previous issue (#31), I chose the model ChenMnZ/Llama-2-7b-EfficientQAT-w4g128-GPTQ.

First, I compiled the T-MAC kernels using the following command and then, with cmake, installed the generated kernel.cc file to the correct path.

python compile.py -o tuned -da -nt 12 -tb -gc -ags 64 -t -m llama-2-7b-4bit -md /data/hf_models/Llama-2-7b-EfficientQAT-w4g128-GPTQ
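
For the cmake step, it was roughly the following (a sketch only; the install prefix matches the ${TMAC_PROJECT_DIR}/install path used for --kcfg below, but the exact directories in my setup may differ):

mkdir -p build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=${TMAC_PROJECT_DIR}/install
cmake --build . --target install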

Then, I converted the model using the following command:

python convert_hf_to_gguf.py /data/hf_models/Llama-2-7b-EfficientQAT-w4g128-GPTQ --enable-t-mac --outtype int_n --outfile /data/gguf/T-MAC/llama-2-7B.i4.gguf --kcfg ${TMAC_PROJECT_DIR}/install/lib/kcfg.ini

However, when I ran the model with the following command, the output was abnormal and the results were very strange.

./bin/llama-cli -m /data/gguf/T-MAC/llama-2-7B.i4.gguf -p "Write a resign email." -n 1024 -t 12 -ngl 0

Result: (screenshot of the garbled model output)

I would like to know where I might have gone wrong.


Additionally, I am quite curious about one more thing. If I want to adapt a new model (such as MiniCPM or llama-3.2), what specific steps should I take?

@kaleid-liner
Collaborator

Could you please try run_pipeline.py? The script will print the commands it executes (see the example invocation after the list below):

  1. Compile kernels
  2. Compile and install T-MAC
  3. Convert hf to gguf
  4. Compile llama.cpp
  5. Inference
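
Something along these lines should work (I'm assuming tools/run_pipeline.py as the script path and -o for the local model directory; check python tools/run_pipeline.py --help for the exact flags):

python tools/run_pipeline.py -o /data/hf_models/Llama-2-7b-EfficientQAT-w4g128-GPTQ -m llama-2-7b-4bit

The script prints each command before running it, so you can compare them with the manual steps you ran above.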

If I want to adapt a new model (such as MiniCPM or llama-3.2), what specific steps should I take?

You need to check if the new model is supported by llama.cpp. If the answer is yes, you just need to find a GPTQ version of the model and specify -m gptq-auto for run_pipeline.py.
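
For example, for a GPTQ checkpoint of a new model it would look roughly like this (the local path is just a placeholder; same assumed -o and -m flags as above):

python tools/run_pipeline.py -o /path/to/new-model-GPTQ -m gptq-auto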
