By following your BitNet documentation, I was able to get correct results. However, when I tried using the llama-2-7b model, I encountered an issue.
I would like to know where I might have gone wrong.
Additionally, I am curious about one more thing: if I want to adapt a new model (such as MiniCPM or llama-3.2), what specific steps should I take?
Can you use run_pipeline.py, please? The script will print the commands it executes:
- Compile kernels
- Compile and install T-MAC
- Convert HF to GGUF
- Compile llama.cpp
- Inference
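For reference, a minimal invocation might look like the sketch below; the `-o` model-directory flag follows the project README, but treat the exact flags as assumptions and rely on the commands the script itself prints:

```bash
# Run the end-to-end T-MAC pipeline; the script echoes the exact
# command it runs for each of the steps listed above.
# ${model_dir} is a placeholder for your local model directory.
python tools/run_pipeline.py -o ${model_dir}
```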
> If I want to adapt a new model (such as MiniCPM or llama-3.2), what specific steps should I take?
You need to check whether the new model is supported by llama.cpp. If it is, you just need to find a GPTQ version of the model and specify -m gptq-auto for run_pipeline.py.
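As a hedged sketch, adapting a supported GPTQ model could look like this (the Hugging Face repo name is a placeholder; `-m gptq-auto` is taken from the answer above):

```bash
# Fetch a GPTQ-quantized checkpoint (repo name is illustrative).
huggingface-cli download <org>/<model>-GPTQ --local-dir ${model_dir}

# Run the pipeline with GPTQ auto-detection.
python tools/run_pipeline.py -o ${model_dir} -m gptq-auto
```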
By following your BitNet documentation, I was able to get correct results. However, when I tried using the llama-2-7b model, I encountered an issue.
Based on a previous issue #31, I chose the model ChenMnZ/Llama-2-7b-EfficientQAT-w4g128-GPTQ.
First, I compiled the T-MAC kernel using the following command and, with CMake, placed the kernel.cc file in the correct path.
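A kernel-compilation command of this kind might look roughly like the sketch below; every flag here is an assumption drawn from typical T-MAC usage, so the authoritative reference is whatever run_pipeline.py prints for its "Compile kernels" step:

```bash
# Hypothetical kernel-compilation step (all flags are assumptions;
# e.g. -gs 128 matching the w4g128 group size of the chosen model).
cd deploy
python compile.py -o tuned -t -m llama-2-7b-4bit -nt 4 -gs 128
```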
Then, I converted the model using the following command:
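As a sketch only: in T-MAC's fork of llama.cpp, the HF-to-GGUF conversion for its custom low-bit type might look roughly like the lines below. The script name, outtype value, and output filename are all assumptions and should be checked against the "Convert HF to GGUF" step printed by run_pipeline.py:

```bash
# Hypothetical conversion command (script name and flags assumed).
python convert-hf-to-gguf-t-mac.py ${model_dir} \
    --outtype in \
    --outfile ${model_dir}/ggml-model.in.gguf
```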
However, when I executed the model using the following command, the output was abnormal, and the results were very strange.
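For context, a typical CPU inference invocation with llama.cpp's main binary looks like the sketch below; the model filename and generation parameters are placeholders, not the exact command used here:

```bash
# Illustrative llama.cpp inference run: -m model file, -n tokens to
# generate, -t CPU threads, -ngl 0 keeps all layers on the CPU.
./bin/main -m ${model_dir}/ggml-model.in.gguf \
    -n 128 -t 4 -ngl 0 -p "Once upon a time"
```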
Result:
I would like to know where I might have gone wrong.
Additionally, I am curious about one more thing: if I want to adapt a new model (such as MiniCPM or llama-3.2), what specific steps should I take?