
LLaMa2 evaluation repo

This repo serves as an experimentation playground for usage and performance evaluation of the LLaMa2 model family.

Currently covers:

  • Loading the models in various precisions, including quantized formats (see the loading sketch after this list)
  • Running standard inference, plus properly formatted prompt-based inference with the chat models (see the chat sketch after this list)
  • Evaluating NF4 vs. int8 performance on a few benchmarks
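
The snippet below is a minimal sketch of the quantized loading path, assuming the Hugging Face `transformers` + `bitsandbytes` stack and the `meta-llama/Llama-2-7b-chat-hf` checkpoint (a placeholder choice, not necessarily what the scripts in this repo use):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical checkpoint choice; any LLaMa2 model id loads the same way.
model_id = "meta-llama/Llama-2-7b-chat-hf"

# 4-bit NF4 quantization config ...
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# ... or plain 8-bit int8 quantization.
int8_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=nf4_config,  # swap in int8_config for the int8 runs
    device_map="auto",
)
```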
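
For the chat models, Llama 2 expects the `[INST]` / `<<SYS>>` prompt template. Below is a sketch of a single-turn formatted prompt plus generation, reusing `model` and `tokenizer` from the loading sketch above; the system prompt and user message are placeholder text, not taken from this repo:

```python
# Single-turn Llama 2 chat template; the tokenizer prepends the <s> BOS token itself.
system_prompt = "You are a helpful assistant."           # placeholder system message
user_message = "Summarise what NF4 quantization does."   # placeholder user turn

prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens and decode only the newly generated answer.
answer = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```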

Future TODOs:

  • QLoRA training
  • Evaluation of 16-bit precisions + the larger LLaMa models
  • Implement batch inference for each of the evaluations
  • Evaluation against other non-LLaMa models + derivatives of the base models
