add lmms-eval (#588)
zhimin-z authored Sep 5, 2024
1 parent 93b6476 commit dea9c9f
Showing 1 changed file with 1 addition and 0 deletions.
README.md: 1 addition & 0 deletions
@@ -345,6 +345,7 @@ Please review our [CONTRIBUTING.md](https://github.com/EthicalML/awesome-product
* [FMBench](https://github.com/aws-samples/foundation-model-benchmarking-tool) ![](https://img.shields.io/github/stars/aws-samples/foundation-model-benchmarking-tool.svg?style=social) - FMBench is a tool for running performance benchmarks for any Foundation Model (FM) deployed on any AWS Generative AI service, be it Amazon SageMaker, Amazon Bedrock, Amazon EKS, or Amazon EC2.
* [HarmBench](https://github.com/centerforaisafety/HarmBench) ![](https://img.shields.io/github/stars/centerforaisafety/HarmBench.svg?style=social) - HarmBench is a fast and scalable framework for evaluating automated red teaming methods and LLM attacks/defenses.
* [HELM](https://github.com/stanford-crfm/helm) ![](https://img.shields.io/github/stars/stanford-crfm/helm.svg?style=social) - HELM (Holistic Evaluation of Language Models) provides tools for the holistic evaluation of language models, including standardized datasets, a unified API for various models, diverse metrics, robustness and fairness perturbations, a prompt construction framework, and a proxy server for unified model access.
* [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) ![](https://img.shields.io/github/stars/EvolvingLMMs-Lab/lmms-eval.svg?style=social) - lmms-eval is an evaluation suite for large multimodal models.
* [Inspect](https://github.com/UKGovernmentBEIS/inspect_ai) ![](https://img.shields.io/github/stars/UKGovernmentBEIS/inspect_ai.svg?style=social) - Inspect is a framework for large language model evaluations.
* [InterCode](https://github.com/princeton-nlp/intercode) ![](https://img.shields.io/github/stars/princeton-nlp/intercode.svg?style=social) - InterCode is a lightweight, flexible, and easy-to-use framework for designing interactive code environments to evaluate language agents that can code.
* [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) ![](https://img.shields.io/github/stars/EleutherAI/lm-evaluation-harness.svg?style=social) - Language Model Evaluation Harness is a framework to test generative language models on a large number of different evaluation tasks.
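
The newly added lmms-eval entry exposes an lm-evaluation-harness-style command-line interface. Below is a minimal sketch of driving it from Python; the `llava` backend name, the checkpoint identifier, the `mme` task, and the flag spellings are assumptions drawn from the project's README and may differ across lmms-eval versions.

```python
# Minimal sketch: running an lmms-eval benchmark by shelling out to its CLI.
# All flag names, the "llava" backend, the checkpoint, and the "mme" task are
# assumptions; check `python -m lmms_eval --help` for the installed version
# before relying on them.
import subprocess

cmd = [
    "python", "-m", "lmms_eval",
    "--model", "llava",                                     # assumed multimodal backend name
    "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",  # assumed checkpoint identifier
    "--tasks", "mme",                                        # assumed benchmark/task name
    "--batch_size", "1",
    "--output_path", "./logs/",                              # directory for result files
]
subprocess.run(cmd, check=True)  # raises CalledProcessError if the evaluation run fails
```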
