Actions: EleutherAI/lm-evaluation-harness

Showing runs from all workflows
5,703 workflow runs

add llama3 tasks
Unit Tests #4072: Pull request #2556 synchronize by baberabb
January 21, 2025 17:27 7m 18s llama
add llama3 tasks
Tasks Modified #4100: Pull request #2556 synchronize by baberabb
January 21, 2025 17:27 2m 26s llama
add llama3 tasks
Tasks Modified #4099: Pull request #2556 synchronize by baberabb
January 21, 2025 17:22 2m 3s llama
add llama3 tasks
Unit Tests #4071: Pull request #2556 synchronize by baberabb
January 21, 2025 17:22 7m 9s llama
Fix max_tokens handling in vllm_vlms.py (#2637)
Unit Tests #4070: Commit 370e2f9 pushed by baberabb
January 21, 2025 16:55 6m 59s main
Fix max_tokens handling in vllm_vlms.py (#2637)
Tasks Modified #4098: Commit 370e2f9 pushed by baberabb
January 21, 2025 16:55 12s main
aggregate by group (total and categories) (#2643)
Tasks Modified #4097: Commit b2c090c pushed by baberabb
January 21, 2025 16:48 7m 16s main
aggregate by group (total and categories) (#2643)
Unit Tests #4069: Commit b2c090c pushed by baberabb
January 21, 2025 16:48 7m 37s main
revise mbpp prompt (#2645)
Tasks Modified #4096: Commit ed9c6fc pushed by baberabb
January 21, 2025 16:46 1m 36s main
revise mbpp prompt (#2645)
Unit Tests #4068: Commit ed9c6fc pushed by baberabb
January 21, 2025 16:46 7m 33s main
revise mbpp prompt
Unit Tests #4067: Pull request #2645 opened by bzantium
January 21, 2025 04:56 6m 59s feature/#2644
revise mbpp prompt
Tasks Modified #4095: Pull request #2645 opened by bzantium
January 21, 2025 04:56 1m 35s feature/#2644
aggregate by group (total and categories)
Tasks Modified #4094: Pull request #2643 opened by bzantium
January 21, 2025 01:22 7m 16s feature/#2640
aggregate by group (total and categories)
Unit Tests #4066: Pull request #2643 opened by bzantium
January 21, 2025 01:22 7m 30s feature/#2640
aggregate by group (total and categories)
Unit Tests #4065: Pull request #2642 opened by bzantium
January 21, 2025 01:17 7m 26s feature/#2640
aggregate by group (total and categories)
Tasks Modified #4093: Pull request #2642 opened by bzantium
January 21, 2025 01:17 7m 28s feature/#2640
Easily evaluate models steered by SAEs
Tasks Modified #4092: Pull request #2641 opened by AMindToThink
January 21, 2025 01:05 Action required AMindToThink:sae_steered
Easily evaluate models steered by SAEs
Unit Tests #4064: Pull request #2641 opened by AMindToThink
January 21, 2025 01:05 Action required AMindToThink:sae_steered
MMLU Pro Plus
Tasks Modified #4091: Pull request #2366 synchronize by baberabb
January 20, 2025 21:35 2m 53s asgsaeid:mmlu-pro-plus
MMLU Pro Plus
Unit Tests #4063: Pull request #2366 synchronize by baberabb
January 20, 2025 21:35 6m 51s asgsaeid:mmlu-pro-plus
fixed mmlu generative response extraction (#2503)
Tasks Modified #4090: Commit 12b6eeb pushed by baberabb
January 20, 2025 21:33 2m 6s main
fixed mmlu generative response extraction (#2503)
Unit Tests #4062: Commit 12b6eeb pushed by baberabb
January 20, 2025 21:33 7m 17s main
fix tmlu tmlu_taiwan_specific_tasks tag (#2420)
Unit Tests #4061: Commit 8814407 pushed by baberabb
January 20, 2025 21:16 7m 9s main
fix tmlu tmlu_taiwan_specific_tasks tag (#2420)
Tasks Modified #4089: Commit 8814407 pushed by baberabb
January 20, 2025 21:16 1m 35s main
Update KorMedMCQA: ver 2.0 (#2540)
Tasks Modified #4088: Commit ff2c49f pushed by baberabb
January 20, 2025 21:05 1m 54s main