
Add MBPP #2247

Merged
6 commits merged on Jan 15, 2025

Conversation

hjlee1371 (Contributor)

Hi, I added the widely used MBPP benchmark. This partially resolves #1157.

Similar to #1992, the implementation relies on pass@k from the HF evaluate module, so it requires the environment variable HF_ALLOW_CODE_EVAL=1.
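
For reference, here is a minimal sketch of that mechanism (a toy example, not the harness's actual plumbing): the `code_eval` metric from HF `evaluate` executes the generated code against assert-style test cases and refuses to run unless `HF_ALLOW_CODE_EVAL=1` is set.

```python
import os
os.environ["HF_ALLOW_CODE_EVAL"] = "1"  # code_eval refuses to execute generated code otherwise

import evaluate

code_eval = evaluate.load("code_eval")

# One MBPP-style problem: the reference is an assert-based test case,
# and predictions holds the candidate completions for that problem.
test_cases = ["assert add(2, 3) == 5"]
candidates = [["def add(a, b):\n    return a + b"]]

pass_at_k, results = code_eval.compute(
    references=test_cases,
    predictions=candidates,
    k=[1],
)
print(pass_at_k)  # e.g. {'pass@1': 1.0}
```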

Below are results for several widely used pretrained models, along with the scores reported in the Llama 3 and Gemma 2 papers. Note that the prompting follows the original MBPP paper, which differs from bigcode-eval.

| Model | 3-shot MBPP pass@1 (lm-eval) | Reported (Llama 3 paper) | Reported (Gemma 2 paper) |
|---|---|---|---|
| Meta-Llama-3-8B | 46.0 | - | - |
| Meta-Llama-3.1-8B | 47.0 | 47.6 | - |
| gemma-7b | 44.8 | 44.4 | 44.4 |
| Mistral-7b-v0.1 | 37.8 | 47.5 | 40.2 |

@go2ready

Hello! Any blockers for adding MBPP?

@baberabb (Contributor)

Thanks for the PR!

baberabb merged commit 5db23e2 into EleutherAI:main on Jan 15, 2025
7 of 8 checks passed
Successfully merging this pull request may close these issues.

[Discussion] Add Major Code Benchmarks