Releases · Future-House/paper-qa

11 Dec 01:19

sidnarayanan

v5.6.1

b1c01a7

v5.6.1

Full Changelog: v5.6.0...v5.6.1

10 Dec 16:43

mskarlin

v5.8.0

58dbfc0

v5.8.0 Latest

What's Changed

Update all non-major dependencies by @renovate in #745
Created dev extra for convenience by @jamesbraza in #750
Update all non-major dependencies by @renovate in #754
Populated LICENSE by @jamesbraza in #756
Add partitioning func capabilities to allow doc-types-based embedding ranking by @mskarlin in #752
Exposed seeding of LitQA2 read and shuffling by @jamesbraza in #758

Full Changelog: v5.7.0...v5.8.0

Contributors

renovate, jamesbraza, and mskarlin

04 Dec 19:31

jamesbraza

v5.7.0

c36903a

v5.7.0

What's Changed

Moved README to use session over answer by @jamesbraza in #741
Moved Docs.aadd to support str | os.PathLike by @jamesbraza in #742
Cleared up 'Adding Documents Manually' docs by @jamesbraza in #740
Support env states with custom status functions by @mskarlin in #743
Update astral-sh/setup-uv action to v4 by @renovate in #746
Moved JSON summary prompt to mention score is an integer by @jamesbraza in #748

Full Changelog: v5.6.0...v5.7.0

Contributors

renovate, jamesbraza, and mskarlin

02 Dec 21:53

jamesbraza

v5.6.0

0130233

v5.6.0

Highlights

This release consists mainly of bug fixes:

Handles breaking changes in upstream dependencies (e.g. Pydantic 2.10, aviary 0.10.1)
Makes GradablePaperQAEnvironment's evaluations robust to an empty answer or multiple answers

Because #726 introduced Complete.NO_ANSWER_PHRASE, which will impact system performance, it was requested that we treat this release as a minor version bump.

What's Changed

Fixed settings session into EnvironmentState, and suppressing PyMuPDF derived DeprecationWarning by @jamesbraza in #713
Adding assertion gather_evidence doesn't populate session.answer by @jamesbraza in #716
Lock file maintenance by @renovate in #715
Fixes gather_with_concurrency typing by @maykcaldas in #714
Latest tooling dependencies by @jamesbraza in #719
Lock file maintenance by @renovate in #718
Fixed EVAL_PROMPT_TEMPLATE to handle empty string or multiple match answers by @jamesbraza in #724
Address missing GenerateAnswer in trajectories, no answers after Complete tools, and better history by @mskarlin in #726
Pulling in latest aviary for concurrency rename by @jamesbraza in #728
Pulling in latest aviary for dependencies fix, and retrying flaky test_propagate_options more by @jamesbraza in #729
Pulling in latest ldp for Callback.before_rollout by @jamesbraza in #734
Documenting why we don't handle evaluation failures in GradablePaperQAEnvironment.step by @jamesbraza in #738
Created LitQAEvaluation.calculate_accuracy_precision utility by @jamesbraza in #733
Refreshed test cassettes, fixed flaky test test_search, and fixed test type ignores by @jamesbraza in #739
Unpins pydantic >2.10.2 requirement, removes TYPE_CHECKING by @nadolskit in #725
Lock file maintenance by @renovate in #737
Alternative maybe is text by @loesinghaus in #717
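
To illustrate the kind of metrics the LitQAEvaluation.calculate_accuracy_precision utility (#733) is named for, here is a minimal sketch. It is not the library's implementation: the `Outcome` enum and the `accuracy_precision` function are hypothetical names, and the definitions used (accuracy = correct / all questions, precision = correct / questions where an answer other than "unsure" was given) are an assumption, not something stated in these notes.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """Hypothetical per-question grading labels (not the library's enum)."""

    CORRECT = "correct"
    INCORRECT = "incorrect"
    UNSURE = "unsure"


def accuracy_precision(outcomes: list[Outcome]) -> tuple[float, float]:
    """Return (accuracy, precision) under the assumed definitions."""
    counts = Counter(outcomes)
    total = len(outcomes)
    # "Answered" means the agent committed to an option instead of "unsure".
    answered = counts[Outcome.CORRECT] + counts[Outcome.INCORRECT]
    accuracy = counts[Outcome.CORRECT] / total if total else 0.0
    precision = counts[Outcome.CORRECT] / answered if answered else 0.0
    return accuracy, precision


graded = [Outcome.CORRECT, Outcome.CORRECT, Outcome.INCORRECT, Outcome.UNSURE]
accuracy, precision = accuracy_precision(graded)
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```

Under these assumed definitions, an "unsure" answer lowers accuracy but leaves precision unchanged.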

New Contributors

@maykcaldas made their first contribution in #714
@loesinghaus made their first contribution in #717

Full Changelog: v5.5.0...v5.6.0

Contributors

renovate, jamesbraza, and 4 other contributors

03 Dec 01:36

sidnarayanan

v5.5.1

6d3862c

v5.5.1

Full Changelog: v5.5.0...v5.5.1

20 Nov 00:23

jamesbraza

v5.5.0

0b3ef89

v5.5.0

Highlights

In all of v5 before this release, the agent loop ended as soon as 1+ generated answers did not contain the substring "cannot answer". However, this caused the agent loop to terminate early on partial answers like "Based on the sources provided, it appears no one has done x." We have resolved this issue by:

No longer coupling our done condition to the substring "cannot answer" being absent from 1+ generated answers
No longer implicitly depending on clients mentioning this "cannot answer" sentinel in the input qa prompt
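
The failure mode of the old done condition can be reproduced with a few lines of Python. This is a minimal sketch, not the library's code: `old_is_done` is a hypothetical function, and the case-insensitive match is an assumption.

```python
SENTINEL = "cannot answer"  # the substring named in these release notes


def old_is_done(answers: list[str]) -> bool:
    """Hypothetical reconstruction of the pre-v5.5.0 rule.

    The loop is considered done once 1+ generated answers lack the sentinel.
    Lowercasing before the check is an assumption made for this sketch.
    """
    return any(SENTINEL not in answer.lower() for answer in answers)


refusal = "I cannot answer this question."
partial = "Based on the sources provided, it appears no one has done x."

print(old_is_done([refusal]))  # False: the sentinel is present, so keep going
print(old_is_done([partial]))  # True: a partial answer ends the loop early
```

The partial answer contains no sentinel, so a substring check cannot distinguish it from a complete answer.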

We also fixed several (bad) bugs:

We support parallel tool calling (2+ ToolCalls in one action: ToolRequestMessage). However, our tools (notably gather_evidence) are not actually concurrent-safe. Our tool schemas instructed agents not to call certain tools in parallel; nonetheless, we observed agents requesting that gather_evidence be called in parallel. We now force our tools to execute non-concurrently to work around this race condition
When using LitQAEvaluation and the same GradablePaperQAEnvironment 2+ times, we repeatedly added the "unsure" option to the target multiple choice question, degrading performance over time
When using PaperQAEnvironment 2+ times, each reset was not properly wiping the Docs object
The reward distribution of LitQAEvaluation was mixing up "unsure" reward of 0.1 with the "incorrect" reward of -1.0, not properly incentivizing learning
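
The first bug above, a tool that is not concurrent-safe being called in parallel, can be illustrated with a toy asyncio example. This is a sketch under stated assumptions, not paper-qa's code: `EnvState`, the toy `gather_evidence`, `run_parallel`, and `run_ordered` are all hypothetical stand-ins.

```python
import asyncio


class EnvState:
    """Hypothetical stand-in for shared environment state."""

    def __init__(self) -> None:
        self.evidence_count = 0


async def gather_evidence(state: EnvState) -> None:
    """Toy tool: a read-modify-write that spans an await, so it is not concurrent-safe."""
    current = state.evidence_count
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    state.evidence_count = current + 1


async def run_parallel(n: int) -> int:
    """Run n tool calls concurrently; calls can overwrite each other's update."""
    state = EnvState()
    await asyncio.gather(*(gather_evidence(state) for _ in range(n)))
    return state.evidence_count


async def run_ordered(n: int) -> int:
    """Run n tool calls one after another; every update is preserved."""
    state = EnvState()
    for _ in range(n):
        await gather_evidence(state)
    return state.evidence_count


parallel_count = asyncio.run(run_parallel(3))
ordered_count = asyncio.run(run_ordered(3))
print(f"parallel={parallel_count} ordered={ordered_count}")
```

In the parallel run, every call reads the counter before any call writes it back, so updates are lost. Awaiting each call in turn trades throughput for correctness, which is the same trade-off as forcing non-concurrent tool execution.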

This release also includes other minor features, cleanups, and bug fixes; see the full list below.

What's Changed

Deprecation cycle for AgentSettings.should_pre_search by @jamesbraza in #679
Moved agent prompts to prompts.py by @jamesbraza in #681
Refactor to remove skip_system from LLMModel.run_prompt by @jamesbraza in #680
Resolving evidence_detailed_citations and Answer deprecations by @jamesbraza in #682
Fixed agent prompt names and contents after #681 mess up by @jamesbraza in #683
Removed tool_names validation for gen_answer being present by @jamesbraza in #685
Fixing test_evaluation logic bugs by @jamesbraza in #686
Removed GenerateAnswer.FAILED_TO_ANSWER as it's unnecessary by @jamesbraza in #691
Allowing serialized Settings in get_settings by @jamesbraza in #688
Fixed LDP runner's TRUNCATED not calling gen_answer, and documented AgentStatus by @jamesbraza in #690
Removed gen_answer's dead argument question by @jamesbraza in #689
Making sure we copy distractors by @sidnarayanan in #694
Created complete tool to allow unsure answers by @jamesbraza in #684
Added missing test_from_question cassette by @jamesbraza in #696
Moved fake agent to LLM propose complete tool by @jamesbraza in #695
Default to ordered tool calls, w env variable control by @mskarlin in #697
Lock file maintenance by @renovate in #699
Refactored TestGradablePaperQAEnvironment for DRY code by @jamesbraza in #702
Fixing PaperQAEnvironment.reset respecting mmr_lambda and text_hashes by @jamesbraza in #703
Removed "cannot answer" literals and added reset tool by @jamesbraza in #698
Update all non-major dependencies by @renovate in #705
Fixing LitQAEvaluation bugs: incorrect reward indices, not using LLM's native knowledge by @jamesbraza in #708
Adding filters to paper-qa Docs by @whitead in #707
Fixed mutably defaulted NumpyVectorStore.texts by @jamesbraza in #711

Full Changelog: v5.4.0...v5.5.0

Contributors

whitead, renovate, and 3 other contributors

18 Nov 16:33

mskarlin

v5.3.4

f59b3ab

Hotfix to include `ordered=True` in tool exec calls

Prevents parallel tool calls from clobbering the env. state.

15 Nov 20:35

sidnarayanan

v5.3.3

0af021a

v5.3.3

Full Changelog: v5.3.2...v5.3.3

09 Nov 00:59

jamesbraza

v5.4.0

b21d9c2

v5.4.0

What's Changed

Renamed to PQASession type by @whitead in #653
Lock file maintenance by @renovate in #657
Ability to zero-shot gen_answer by @jamesbraza in #658
Lock file maintenance by @renovate in #659
Moving to uv dependency groups by @jamesbraza in #660
Lock file maintenance by @renovate in #664
Convert citation to formatted_citation usage where necessary by @mskarlin in #666
Catch edge case where externalIds field is None by @mskarlin in #668
Made o1 temperature issue a warning instead of a ValueError by @whitead in #669
Added train and eval splits' questions and DOIs by @jamesbraza in #662
Fake agent allowing timeouts or exceptions by @jamesbraza in #672
Optional AnswerSetting.max_answer_attempts to allow a new unsure branch by @jamesbraza in #673
Made it so you do not die on invalid tool by @whitead in #670
Allowing latest pydantic-settings and regenerated cassettes by @jamesbraza in #674
Empty tool calls leading to done condition by @jamesbraza in #671
Changed it to be debug for source quality by @whitead in #675

Full Changelog: v5.3.2...v5.4.0

Contributors

whitead, renovate, and 2 other contributors

29 Oct 19:50

jamesbraza

v5.3.2

99a9e07

v5.3.2

What's Changed

Printing the text in a failed llm_parse_json by @jamesbraza in #629
Change S2 client logic to use arxiv doi if it's defined by @mskarlin in #632
Increased retry count for ClientConnectorDNSError errors by @jamesbraza in #639
Make string similarity case insensitive by default by @mskarlin in #640
Pulling in latest fhaviary, mypy, ruff by @jamesbraza in #647
Add an after model validator ensuring temp=1 for o1 models by @dakoner in #649
Fixing crash due to None author by @jamesbraza in #650
Fixing flaky test test_minimal_fields_filtering by @jamesbraza in #651
Fixing flaky tests test_code and test_minimal_fields_filtering by @jamesbraza in #652
Lock file maintenance by @renovate in #648

New Contributors

@dakoner made their first contribution in #649

Full Changelog: v5.3.1...v5.3.2

Contributors

renovate, dakoner, and 2 other contributors
