Benchmark 3.5

JosephDavidsonKSWH released this 07 Jun 14:19
· 2 commits to main since this release
b69d945

What's Changed

  • Added more comparisons for GPT-4o, Llama, Mixtral, and Gemini models.
  • Added benchmarks with smaller memory spans.
  • Added the -i option to run a benchmark with isolated tests (i.e., a conversation in which tests run sequentially rather than interleaved).
  • General updates to the evaluations to make them more robust.
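
The difference between isolated and interleaved tests can be pictured with a small sketch. This is only an illustration and not the benchmark's own scheduler: the function names `isolated_order` and `interleaved_order` are hypothetical, and the round-robin interleaving is an assumption, since the release notes do not say how tests are mixed.

```python
from itertools import zip_longest


def isolated_order(tests):
    """Finish all messages of one test before starting the next (sequential)."""
    return [msg for test in tests for msg in test]


def interleaved_order(tests):
    """Alternate between tests round-robin (an assumed interleaving strategy)."""
    rounds = zip_longest(*tests)  # pads shorter tests with None
    return [msg for rnd in rounds for msg in rnd if msg is not None]


# Two hypothetical tests, each a list of conversation messages.
tests = [["A1", "A2", "A3"], ["B1", "B2"]]
print(isolated_order(tests))     # ['A1', 'A2', 'A3', 'B1', 'B2']
print(interleaved_order(tests))  # ['A1', 'B1', 'A2', 'B2', 'A3']
```

In the isolated ordering, each test sees only its own messages between start and finish, which may make it easier to attribute a failure to a single test.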

Full Changelog: v3-benchmark...v3.5-benchmark