Releases: instructlab/eval

v0.3.1

01 Oct 01:45
c05af4d

What's Changed

  • Remove task logic with lm_eval 0.4.4 for agg_score by @danmcp in #143

Full Changelog: v0.3.0...v0.3.1

v0.3.0

28 Sep 01:07
40cc370

What's Changed

Note: This release contains two changes that are not backward compatible:

  • Remove max_workers and serving_gpus from constructor by @danmcp in #140
  • return overall_score from MTBenchBranch.judge_answers() by @alimaredia in #138

Full Changelog: v0.2.1...v0.3.0

v0.2.1

23 Sep 14:10
53d6abf

What's Changed

  • update README by @sallyom in #108
  • Use single answer file and model list (backport #110) by @mergify in #112
  • mergify: add mergify configuration by @nathan-weinberg in #114
  • Bump step-security/harden-runner from 2.8.1 to 2.9.1 by @dependabot in #94
  • ci: move E2E runner from github to AWS by @nathan-weinberg in #118
  • docs: add initial release strategy doc and CHANGELOG by @nathan-weinberg in #91
  • CI: Fix working directories to be relative by @danmcp in #120
  • Bump actions/setup-python from 5.1.1 to 5.2.0 by @dependabot in #119
  • Bump actions/checkout from 4.1.6 to 4.1.7 by @dependabot in #116
  • build(deps): bump pypa/gh-action-pypi-publish from 1.9.0 to 1.10.0 by @dependabot in #122
  • ci: add AWS tags to show github ref and PR num for all jobs by @nathan-weinberg in #123
  • Bump rojopolis/spellcheck-github-actions from 0.38.0 to 0.41.0 by @dependabot in #96
  • build(deps): bump pypa/gh-action-pypi-publish from 1.10.0 to 1.10.1 by @dependabot in #124
  • build(deps): bump hynek/build-and-inspect-python-package from 2.6.0 to 2.9.0 by @dependabot in #125
  • build(deps): bump DavidAnson/markdownlint-cli2-action from 16.0.0 to 17.0.0 by @dependabot in #126
  • build(deps): bump step-security/harden-runner from 2.9.1 to 2.10.1 by @dependabot in #127
  • Add comment to make it clear how the code is working by @danmcp in #105
  • Allow for external serving to be used with mmlu by @danmcp in #99
  • Better path and string handling by @danmcp in #106
  • Improve logging by @danmcp in #111
  • Cleanup usage of load model answers by @danmcp in #115
  • add option to pass 'api_key' to gen_answers, judge_answers by @sallyom in #128
  • e2e: only run PR job if certain files are changed by @nathan-weinberg in #131
  • Allow max_workers to be passed in after evaluator is created by @danmcp in #107
  • Remove fastchat dependency by @danmcp in #98

Full Changelog: v0.2.0...v0.2.1

v0.1.2

27 Aug 23:30
ff54038

What's Changed

  • Use single answer file and model list by @danmcp in #110

Full Changelog: v0.1.1...v0.1.2

v0.2.0

23 Aug 01:57
ec709c7

What's Changed

  • Changing few_shots default to 5 by @danmcp in #92
  • Don't sleep on last retry attempt by @booxter in #84
  • github: add action to free runner disk space for tox installs by @nathan-weinberg in #93
  • Remove remaining print()s from the library by @booxter in #86
  • Fix e2e by removing old option by @danmcp in #102
  • Default to merge_system_user_message if mistral model detected by @danmcp in #100
  • Don't retry on connection failure by @danmcp in #103
  • Add optional auto tuning for max_workers by @danmcp in #101
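
The retry changes in this release (#84 and #103) describe two behaviors: no sleep after the final attempt, and no retry at all on a connection failure. The sketch below illustrates that pattern in isolation. It is not the library's implementation; `ConnectionFailure`, `call_with_retries`, and all parameter names are hypothetical and chosen only for illustration.

```python
import time


class ConnectionFailure(Exception):
    """Hypothetical stand-in for an error that retrying cannot fix."""


def call_with_retries(fn, max_attempts=3, delay=1.0, sleep=time.sleep):
    """Call fn() up to max_attempts times (illustrative helper, not library code).

    - A ConnectionFailure is re-raised immediately, with no retry.
    - Any other exception triggers a retry, sleeping `delay` seconds
      between attempts but not after the last one.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionFailure:
            # The server is unreachable; waiting and retrying will not help.
            raise
        except Exception as exc:
            last_error = exc
            # Sleep only if another attempt follows.
            if attempt < max_attempts:
                sleep(delay)
    raise last_error
```

Passing `sleep` as a parameter is a design choice for this sketch only: it lets a caller or a test observe how many pauses occurred without actually waiting.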

Full Changelog: v0.1.1...v0.2.0

v0.1.1

01 Aug 20:52
d272c80

What's Changed

  • Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 by @dependabot in #70
  • Revert "Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0" by @alinaryan in #72
  • feat: add new InvalidModelError and handling by @nathan-weinberg in #79
  • small docs update for clarity by @makelinux in #81
  • fix: use the context correctly in mt_bench_branch by @bcrochet in #90
  • fix: catch KeyError in mt_bench_branch by @bcrochet in #89
  • fix: mt_bench_branch should ignore knowledge in generate by @bcrochet in #88

Full Changelog: v0.1.0...v0.1.1

v0.1.0

15 Jul 16:27
ae6097f

What's Changed

  • Fixing up test case after api changes to add error_rate by @danmcp in #63
  • Inherit logging from caller rather than from vLLM by @danmcp in #66
  • Update batch size description and allow for str by @danmcp in #67
  • Don't set basicConfig from libraries by @danmcp in #69
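
The two logging entries above (#66 and #69) follow a common Python convention: a library obtains a named logger and leaves handler and level configuration to the calling application. The sketch below shows that convention in general terms. It is not code from this repository; the logger name `example_lib` and the functions `run_evaluation` and `main` are hypothetical.

```python
import logging

# Library side: take a named logger and do NOT call logging.basicConfig().
# "example_lib" is a hypothetical name used only for this sketch.
logger = logging.getLogger("example_lib")
# A NullHandler keeps the library silent unless the application configures logging.
logger.addHandler(logging.NullHandler())


def run_evaluation():
    """Hypothetical library function that logs through the inherited configuration."""
    logger.info("starting evaluation")
    return "done"


def main():
    """Application side: the caller decides the format, level, and destination."""
    logging.basicConfig(level=logging.INFO, format="%(name)s %(levelname)s %(message)s")
    run_evaluation()


if __name__ == "__main__":
    main()
```

Because the library logger has no level of its own, it inherits the effective level and handlers from the root logger that the application sets up.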

Full Changelog: v0.0.9...v0.1.0

v0.0.9

12 Jul 14:48
5257e23

What's Changed

  • [mmlu] Allow optionally setting a PyTorch device by @alinaryan in #62
  • Error handling with sdg_path not found and invalid by @danmcp in #61
  • Rename sdg_path to tasks_dir by @danmcp in #64

Full Changelog: v0.0.8...v0.0.9

v0.0.8

09 Jul 16:37
2a27715

What's Changed

  • fix: Add specific error handling around git repo input by @danmcp in #52
  • Removing a linting ignore by @danmcp in #58

Full Changelog: v0.0.7...v0.0.8

v0.0.7

08 Jul 23:16
450acaf

What's Changed

  • Bump actions/download-artifact from 4.1.7 to 4.1.8 by @dependabot in #53
  • Add missing license identifiers by @danmcp in #56
  • Parameterize ILAB_EVAL_MERGE_SYS_USR by @danmcp in #57
  • Adding basic logging facilities for eval with a first pass at some useful logging by @danmcp in #55

Full Changelog: v0.0.6...v0.0.7