Benchmark 3.5

Latest

JosephDavidsonKSWH released this 07 Jun 14:19

b69d945

What's Changed

Added more comparisons for GPT-4o, Llama, Mixtral, and Gemini models.
Added benchmarks with smaller memory spans.
Added the -i option to run a benchmark with isolated tests (i.e., a conversation in which the tests run sequentially rather than interleaved).
General updates to the evaluations to make them more robust.
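To illustrate the difference that the -i option controls, here is a minimal Python sketch of the two orderings. It assumes, hypothetically, that each test is an ordered list of conversation turns and that interleaving is a simple round-robin across tests. The function names and data layout are illustrative only; the benchmark's own scheduler is not described in these notes and may differ.

```python
from itertools import chain, zip_longest
from typing import List

# Hypothetical representation: each test is an ordered list of conversation turns.
Test = List[str]


def interleaved(tests: List[Test]) -> List[str]:
    """Round-robin: emit turn i of every test before turn i+1 of any test."""
    sentinel = object()  # marks positions where a shorter test has run out of turns
    merged: List[str] = []
    for turns in zip_longest(*tests, fillvalue=sentinel):
        merged.extend(t for t in turns if t is not sentinel)
    return merged


def isolated(tests: List[Test]) -> List[str]:
    """Sequential (like -i): all turns of one test, then all turns of the next."""
    return list(chain.from_iterable(tests))


if __name__ == "__main__":
    tests = [["A1", "A2"], ["B1", "B2", "B3"]]
    print(interleaved(tests))  # ['A1', 'B1', 'A2', 'B2', 'B3']
    print(isolated(tests))     # ['A1', 'A2', 'B1', 'B2', 'B3']
```

In the interleaved ordering, turns from other tests sit between the turns of any single test, so the model must keep several tests in memory at once; the isolated ordering removes that overlap.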

Full Changelog: v3-benchmark...v3.5-benchmark
