Actions: EleutherAI/lm-evaluation-harness

Tasks Modified

2,845 workflow runs

add llama3 tasks
Tasks Modified #4100: Pull request #2556 synchronize by baberabb
January 21, 2025 17:27 2m 26s llama
add llama3 tasks
Tasks Modified #4099: Pull request #2556 synchronize by baberabb
January 21, 2025 17:22 2m 3s llama
Fix max_tokens handling in vllm_vlms.py (#2637)
Tasks Modified #4098: Commit 370e2f9 pushed by baberabb
January 21, 2025 16:55 12s main
aggregate by group (total and categories) (#2643)
Tasks Modified #4097: Commit b2c090c pushed by baberabb
January 21, 2025 16:48 7m 16s main
revise mbpp prompt (#2645)
Tasks Modified #4096: Commit ed9c6fc pushed by baberabb
January 21, 2025 16:46 1m 36s main
revise mbpp prompt
Tasks Modified #4095: Pull request #2645 opened by bzantium
January 21, 2025 04:56 1m 35s feature/#2644
aggregate by group (total and categories)
Tasks Modified #4094: Pull request #2643 opened by bzantium
January 21, 2025 01:22 7m 16s feature/#2640
aggregate by group (total and categories)
Tasks Modified #4093: Pull request #2642 opened by bzantium
January 21, 2025 01:17 7m 28s feature/#2640
Easily evaluate models steered by SAEs
Tasks Modified #4092: Pull request #2641 opened by AMindToThink
January 21, 2025 01:05 Action required AMindToThink:sae_steered
MMLU Pro Plus
Tasks Modified #4091: Pull request #2366 synchronize by baberabb
January 20, 2025 21:35 2m 53s asgsaeid:mmlu-pro-plus
fixed mmlu generative response extraction (#2503)
Tasks Modified #4090: Commit 12b6eeb pushed by baberabb
January 20, 2025 21:33 2m 6s main
fix tmlu tmlu_taiwan_specific_tasks tag (#2420)
Tasks Modified #4089: Commit 8814407 pushed by baberabb
January 20, 2025 21:16 1m 35s main
Update KorMedMCQA: ver 2.0 (#2540)
Tasks Modified #4088: Commit ff2c49f pushed by baberabb
January 20, 2025 21:05 1m 54s main
Tasks Modified
Tasks Modified #4087: by baberabb
January 20, 2025 21:04 2h 0m 23s main
fixed mmlu generative response extraction
Tasks Modified #4086: Pull request #2503 synchronize by baberabb
January 20, 2025 21:03 2m 10s RawthiL:mmlu_generative_fix
fixed mmlu generative response extraction
Tasks Modified #4085: Pull request #2503 synchronize by baberabb
January 20, 2025 20:58 1m 50s RawthiL:mmlu_generative_fix
fixed mmlu generative response extraction
Tasks Modified #4084: Pull request #2503 synchronize by baberabb
January 20, 2025 20:57 1m 27s RawthiL:mmlu_generative_fix
fixed mmlu generative response extraction
Tasks Modified #4083: Pull request #2503 synchronize by baberabb
January 20, 2025 20:52 1m 27s RawthiL:mmlu_generative_fix
fixed mmlu generative response extraction
Tasks Modified #4082: Pull request #2503 synchronize by baberabb
January 20, 2025 20:48 1m 29s RawthiL:mmlu_generative_fix
New arabicmmlu (#2541)
Tasks Modified #4081: Commit 6dac8c6 pushed by baberabb
January 20, 2025 20:46 4m 40s main
fix multiple input chat tempalte
Tasks Modified #4080: Pull request #2576 synchronize by baberabb
January 20, 2025 20:43 1m 29s multiple_input
add hrm8k benchmark for both Korean and English (#2627)
Tasks Modified #4079: Commit a5c344c pushed by baberabb
January 20, 2025 20:38 1m 41s main
add hrm8k benchmark for both Korean and English
Tasks Modified #4078: Pull request #2627 synchronize by baberabb
January 20, 2025 20:30 1m 37s feature/#2623
Fix max_tokens handling in vllm_vlms.py
Tasks Modified #4077: Pull request #2637 synchronize by baberabb
January 20, 2025 18:16 14s jkaniecki:Fix_vllm_vlms_max_tokens