Releases · instructlab/eval

v0.3.0

What's Changed

build(deps): bump pypa/gh-action-pypi-publish from 1.10.1 to 1.10.2 by @dependabot in #133
build(deps): bump rojopolis/spellcheck-github-actions from 0.41.0 to 0.42.0 by @dependabot in #132
docs: update README with more contextual eval info by @nathan-weinberg in #130
github: add stale bot to eval repo by @nathan-weinberg in #136
ci: fix lint action by @nathan-weinberg in #137
build(deps): bump rhysd/actionlint from 1.7.1 to 1.7.2 in /.github/workflows by @dependabot in #134
Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 by @dependabot in #76
build(deps): bump actions/checkout from 4.1.7 to 4.2.0 by @dependabot in #139
Remove max_workers and serving_gpus from constructor by @danmcp in #140
return overall_score from MTBenchBranch.judge_answers() by @alimaredia in #138

Note: This release contains two changes which aren't backwards compatible:

Remove max_workers and serving_gpus from constructor by @danmcp in #140
return overall_score from MTBenchBranch.judge_answers() by @alimaredia in #138

Full Changelog: v0.2.1...v0.3.0

Contributors

danmcp, alimaredia, and 2 other contributors

v0.2.1

23 Sep 14:10 · danmcp · 53d6abf

What's Changed

update README by @sallyom in #108
Use single answer file and model list (backport #110) by @mergify in #112
mergify: add mergify configuration by @nathan-weinberg in #114
Bump step-security/harden-runner from 2.8.1 to 2.9.1 by @dependabot in #94
ci: move E2E runner from github to AWS by @nathan-weinberg in #118
docs: add initial release strategy doc and CHANGELOG by @nathan-weinberg in #91
CI: Fix working directories to be relative by @danmcp in #120
Bump actions/setup-python from 5.1.1 to 5.2.0 by @dependabot in #119
Bump actions/checkout from 4.1.6 to 4.1.7 by @dependabot in #116
build(deps): bump pypa/gh-action-pypi-publish from 1.9.0 to 1.10.0 by @dependabot in #122
ci: add AWS tags to show github ref and PR num for all jobs by @nathan-weinberg in #123
Bump rojopolis/spellcheck-github-actions from 0.38.0 to 0.41.0 by @dependabot in #96
build(deps): bump pypa/gh-action-pypi-publish from 1.10.0 to 1.10.1 by @dependabot in #124
build(deps): bump hynek/build-and-inspect-python-package from 2.6.0 to 2.9.0 by @dependabot in #125
build(deps): bump DavidAnson/markdownlint-cli2-action from 16.0.0 to 17.0.0 by @dependabot in #126
build(deps): bump step-security/harden-runner from 2.9.1 to 2.10.1 by @dependabot in #127
Add comment to make it clear how the code is working by @danmcp in #105
Allow for external serving to be used with mmlu by @danmcp in #99
Better path and string handling by @danmcp in #106
Improve logging by @danmcp in #111
Cleanup usage of load model answers by @danmcp in #115
add option to pass 'api_key' to gen_answers, judge_answers by @sallyom in #128
e2e: only run PR job if certain files are changed by @nathan-weinberg in #131
Allow max_workers to be passed in after evaluator is created by @danmcp in #107
Remove fastchat dependency by @danmcp in #98

New Contributors

@sallyom made their first contribution in #108
@mergify made their first contribution in #112

Full Changelog: v0.2.0...v0.2.1

Contributors

danmcp, sallyom, and 3 other contributors

v0.1.2

27 Aug 23:30 · danmcp · ff54038

What's Changed

Use single answer file and model list by @danmcp in #110

Full Changelog: v0.1.1...v0.1.2

Contributors

danmcp

v0.2.0

23 Aug 01:57 · danmcp · ec709c7

What's Changed

Changing few_shots default to 5 by @danmcp in #92
Don't sleep on last retry attempt by @booxter in #84
github: add action to free runner disk space for tox installs by @nathan-weinberg in #93
Remove remaining print()s from the library by @booxter in #86
Fix e2e by removing old option by @danmcp in #102
Default to merge_system_user_message if mistral model detected by @danmcp in #100
Dont retry on connection failure by @danmcp in #103
Add optional auto tuning for max_workers by @danmcp in #101

New Contributors

@booxter made their first contribution in #84

Full Changelog: v0.1.1...v0.2.0

Contributors

booxter, danmcp, and nathan-weinberg

v0.1.1

01 Aug 20:52 · nathan-weinberg · d272c80

What's Changed

Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 by @dependabot in #70
Revert "Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0" by @alinaryan in #72
feat: add new InvalidModelError and handling by @nathan-weinberg in #79
small docs update for clarity by @makelinux in #81
fix: use the context correctly in mt_bench_branch by @bcrochet in #90
fix: catch KeyError in mt_bench_branch by @bcrochet in #89
fix: mt_bench_branch should ignore knowledge in generate by @bcrochet in #88

New Contributors

@makelinux made their first contribution in #81
@bcrochet made their first contribution in #90

Full Changelog: v0.1.0...v0.1.1

Contributors

bcrochet, makelinux, and 3 other contributors

v0.1.0

15 Jul 16:27 · nathan-weinberg · ae6097f

What's Changed

Fixing up test case after api changes to add error_rate by @danmcp in #63
Inherit logging from caller rather than from vLLM by @danmcp in #66
Update batch size description and allow for str by @danmcp in #67
Don't set basicConfig from libraries by @danmcp in #69

Full Changelog: v0.0.9...v0.1.0

Contributors

danmcp

v0.0.9

12 Jul 14:48 · danmcp · 5257e23

What's Changed

[mmlu] Allow optionally setting a PyTorch device by @alinaryan in #62
Error handling with sdg_path not found and invalid by @danmcp in #61
Rename sdg_path to tasks_dir by @danmcp in #64

Full Changelog: v0.0.8...v0.0.9

Contributors

danmcp and alinaryan

v0.0.8

09 Jul 16:37 · alinaryan · 2a27715

What's Changed

fix: Add specific error handling around git repo input by @danmcp in #52
Removing a linting ignore by @danmcp in #58

Full Changelog: v0.0.7...v0.0.8

Contributors

danmcp

v0.0.7

08 Jul 23:16 · danmcp · 450acaf

What's Changed

Bump actions/download-artifact from 4.1.7 to 4.1.8 by @dependabot in #53
Add missing license identifiers by @danmcp in #56
Parameterize ILAB_EVAL_MERGE_SYS_USR by @danmcp in #57
Adding basic logging facilities for eval with a first pass at some useful logging by @danmcp in #55

Full Changelog: v0.0.6...v0.0.7

Contributors

danmcp and dependabot
