This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

lm-eval for llama.cpp enhancement. #1543

Open

lkk12014402 wants to merge 4 commits into main from enable_llamacpp_lm_eval

Collaborator

lkk12014402 commented May 12, 2024

Type of Change

Enable lm-eval for llama.cpp models.

The API is not changed.

Description

This change is based on the official lm-eval code and llama-cpp-python.

Improvements:

Load the llama.cpp model directly when running lm-eval (the official code requires launching a llama.cpp server).
For Qwen models, revise the detokenize function, because errors occur during evaluation, and force-add bos_id, because llama-cpp-python does not add it successfully. Even with these Qwen-specific changes, the tokenizer results still differ between llama.cpp and huggingface/transformers. I will verify this further.
As described in the comments at llama-cpp-python, I implemented this with a custom class, which can accelerate post-processing.
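
To make the bos_id and tokenizer-mismatch points concrete, here is a minimal sketch in plain Python. It is not the code from this PR: `ensure_bos` and `first_divergence` are hypothetical helper names, and the token ids are placeholders rather than real Qwen vocabulary entries. It only illustrates the idea of prepending a BOS id when it is missing and locating where two tokenizations start to differ.

```python
from typing import List, Optional


def ensure_bos(tokens: List[int], bos_id: int) -> List[int]:
    """Return a copy of `tokens` that starts with `bos_id` exactly once."""
    if tokens and tokens[0] == bos_id:
        return list(tokens)
    return [bos_id] + list(tokens)


def first_divergence(a: List[int], b: List[int]) -> Optional[int]:
    """Return the first index where two token sequences differ, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a strict prefix of the other.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Placeholder ids (assumption): stand-ins for llama.cpp and transformers output.
BOS_ID = 1
llama_cpp_ids = ensure_bos([42, 7, 9], BOS_ID)
hf_ids = [1, 42, 8, 9]

print(llama_cpp_ids)                            # [1, 42, 7, 9]
print(first_divergence(llama_cpp_ids, hf_ids))  # 2
```

A check like `first_divergence` could help narrow down where the llama.cpp and huggingface/transformers tokenizations disagree once both id lists for the same prompt are available.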


          lm-eval for llama.cpp enhancement.

184678f

lkk12014402 requested review from changwangss and hshen14

May 12, 2024 09:55

lkk12014402 requested a review from PenghuiCheng as a code owner

May 12, 2024 09:55

github-actions bot commented May 12, 2024

⛈️ Required checks status: Has failure 🔴

Warning
If you do not have access to re-run the CI-Summary bot, please contact VincyZhang for help. If you push a new commit, all workflows will be re-triggered.

Groups summary

🔴 Format Scan Tests workflow

Check ID	Status	Error details
format-scan (pylint)	failure	download	❌
format-scan (bandit)	success		✅
format-scan (cloc)	success		✅
format-scan (cpplint)	success		✅

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py, intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py.

🔴 Optimize Unit Test workflow

Check ID	Status	Error details
optimize-unit-test-baseline	success		✅
optimize-unit-test-PR-test	failure	download	❌
Genreate-OptimizeUT-Report	skipped		❓

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py, intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py.

🟢 NeuralChat Unit Test

Check ID	Status	Error details
neuralchat-unit-test-baseline	success		✅
neuralchat-unit-test-PR-test	success		✅
Generate-NeuralChat-Report	success		✅

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py, intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py.

🟢 Engine Unit Test workflow

Check ID	Status	Error details
engine-unit-test-baseline	success		✅
engine-unit-test-PR-test	success		✅
Genreate-Engine-Report	success		✅

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py, intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py.

🟢 Chat Bot Test workflow

Check ID	Status	Error details
call-inference-llama-2-7b-chat-hf / inference test	success		✅
call-inference-mpt-7b-chat / inference test	success		✅

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py, intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py.

Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.


          [pre-commit.ci] auto fixes from pre-commit.com hooks

fcce20c

for more information, see https://pre-commit.ci

Collaborator Author

lkk12014402 commented May 12, 2024

Usage:

CPU

from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser

# Hugging Face repo that hosts the GGUF files
model_name = "Qwen/Qwen1.5-0.5B-Chat-GGUF"

eval_args = LMEvalParser(
    model="gguf-custom",
    # repo id plus a file pattern that selects the q4_0 GGUF file
    model_args=f"pretrained={model_name},ftype=*q4_0.gguf",
    device="cpu",
    tasks="hellaswag",
    batch_size=2,
    limit=10,  # evaluate only 10 examples per task
)
results = evaluate(eval_args)

print(results["results"])

GPU

from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser

# Hugging Face repo that hosts the GGUF files
model_name = "Qwen/Qwen1.5-0.5B-Chat-GGUF"

eval_args = LMEvalParser(
    model="gguf-custom",
    # repo id plus a file pattern that selects the q4_0 GGUF file
    model_args=f"pretrained={model_name},ftype=*q4_0.gguf",
    device="cuda",  # the only difference from the CPU example
    tasks="hellaswag",
    batch_size=2,
    limit=10,  # evaluate only 10 examples per task
)
results = evaluate(eval_args)

print(results["results"])
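
The `model_args` string in both examples is a comma-separated list of `key=value` pairs. The sketch below shows how such a string could be split into a dict and how a pattern like `*q4_0.gguf` could select one file from a list. It is an illustration only: `parse_model_args` and `match_files` are hypothetical helpers, the file names are made up, and treating `ftype` as a glob pattern over file names is an assumption rather than something confirmed by this PR.

```python
from fnmatch import fnmatch
from typing import Dict, List


def parse_model_args(arg_string: str) -> Dict[str, str]:
    """Split 'k1=v1,k2=v2' into a dict; values are kept as strings."""
    pairs = (item.split("=", 1) for item in arg_string.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}


def match_files(pattern: str, filenames: List[str]) -> List[str]:
    """Return the file names that match a shell-style glob pattern."""
    return [name for name in filenames if fnmatch(name, pattern)]


args = parse_model_args("pretrained=Qwen/Qwen1.5-0.5B-Chat-GGUF,ftype=*q4_0.gguf")

# Illustrative file names (assumption), not a listing of the real repo.
repo_files = ["model-q4_0.gguf", "model-q8_0.gguf", "model-q4_k_m.gguf"]
selected = match_files(args["ftype"], repo_files)

print(args["pretrained"])  # Qwen/Qwen1.5-0.5B-Chat-GGUF
print(selected)            # ['model-q4_0.gguf']
```

Under that assumption, changing the pattern (for example to `*q8_0.gguf`) would switch the quantization being evaluated without touching the rest of the call.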

hshen14 approved these changes


VincyZhang and others added 2 commits

May 12, 2024 20:00


          Merge branch 'main' into enable_llamacpp_lm_eval

c706ae3


          Merge branch 'main' into enable_llamacpp_lm_eval

3048eae
