-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds more thorough pytest example #442
Conversation
A preview of fb960cb is uploaded and can be seen here: ✨ https://burr.dagworks.io/pull/442 ✨ Changes may take a few minutes to propagate. Since this is a preview of production, content with |
3ced84f
to
a792759
Compare
TODO: show this:
|
This is a WIP. This shows how one might log things they don't want to fail on to a results_bag and then access them in a dataframe...
This shows how to use pytest to test an action. TODO: - how to use burr fixture - how to test agent and use tracker
01c7e7c
to
3b2500f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 0520ad8 in 47 seconds
More details
- Looked at
779
lines of code in8
files - Skipped
1
files when reviewing. - Skipped posting
7
drafted comments based on config settings.
1. examples/pytest/README.md:71
- Draft comment:
Typo: 'acheive' should be 'achieve'. - Reason this comment was not posted:
Confidence changes required:10%
The README.md file contains a typo in the word 'acheive'. It should be corrected to 'achieve'.
2. examples/pytest/README.md:207
- Draft comment:
Typo: 'parameterizeable' should be 'parameterizable'. - Reason this comment was not posted:
Confidence changes required:10%
The README.md file contains a typo in the word 'parameterizeable'. It should be corrected to 'parameterizable'.
3. examples/pytest/README.md:67
- Draft comment:
Typo: 'walkthrough' should be 'walk through'. - Reason this comment was not posted:
Confidence changes required:10%
The README.md file contains a typo in the word 'walkthrough'. It should be corrected to 'walk through'.
4. examples/pytest/e2e_test_cases.json:1
- Draft comment:
The JSON structure looks good and well-formed. - Reason this comment was not posted:
Confidence changes required:0%
The JSON files are well-structured and do not contain any issues. Moving on to the next file.
5. examples/pytest/hypotheses_test_cases.json:1
- Draft comment:
The JSON structure looks good and well-formed. - Reason this comment was not posted:
Confidence changes required:0%
The JSON files are well-structured and do not contain any issues. Moving on to the next file.
6. examples/pytest/requirements.txt:1
- Draft comment:
The requirements.txt file is correctly formatted. - Reason this comment was not posted:
Confidence changes required:0%
The requirements.txt file is simple and correct. No issues found.
7. examples/pytest/some_actions.py:21
- Draft comment:
Ensure that the OpenAI client is correctly configured and the model name 'gpt-4o-mini' is valid. - Reason this comment was not posted:
Confidence changes required:50%
The code in some_actions.py is mostly fine, but there is a potential issue with the OpenAI client instantiation. It should be checked if the client is correctly configured and if the model name is valid.
Workflow ID: wflow_0ceRxpohsZVovriJ
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
With link to mlflow.evaluate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 63e2f4e in 12 seconds
More details
- Looked at
48
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. examples/pytest/README.md:69
- Draft comment:
Numbering error: The list numbering repeats '4'. Consider changing the second '4.' to '5.' for clarity. - Reason this comment was not posted:
Confidence changes required:10%
The README file contains a minor typo in the numbering of the list under 'What kind of "asserts" do we want?'. The number 4 is repeated twice, which should be corrected for clarity.
Workflow ID: wflow_KDoSwz3G5ga4L3t8
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 8e820a3 in 31 seconds
More details
- Looked at
48
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. docs/examples/guardrails/index.rst:2
- Draft comment:
The title change from "Guardrails" to "Guardrails / Tests" is not aligned with the content of the document, which focuses on creating test cases. Consider reverting to the original title or ensuring the content matches the broader scope. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
This is a documentation change. The comment speculates about content mismatch without having access to the actual content of the creating_tests file. Since this is an index file that includes a section about tests, adding "Tests" to the title seems reasonable. Documentation organization is subjective and should be left to the author's discretion unless there's clear evidence of a problem.
I haven't seen the content of creating_tests.rst, so I can't be certain about the document's focus. Maybe there's a real mismatch.
Even without seeing creating_tests.rst, the toctree includes a tests-related file, so the title change appears justified. Documentation structure feedback should have strong evidence of problems.
Delete this comment. It's speculative and questions a reasonable documentation organization choice without clear evidence of a problem.
Workflow ID: wflow_27ILJpSkYltLHrUy
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on b9a21d7 in 17 seconds
More details
- Looked at
57
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. examples/pytest/README.md:17
- Draft comment:
Consider rephrasing for clarity:
An agent or augmented LLM is a combination of LLM calls and logic. But how do we know if it's working? We can test and evaluate it.
- Reason this comment was not posted:
Confidence changes required:10%
The README.md file in the examples/pytest directory contains a minor grammatical error in line 17. The sentence should be more concise and clear.
Workflow ID: wflow_FzcITiRUWE42mLjl
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
b9a21d7
to
84c5b0d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 84c5b0d in 12 seconds
More details
- Looked at
58
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. examples/pytest/README.md:11
- Draft comment:
Typographical error in list numbering. The second '4.' should be '5.' for correct sequential numbering. - Reason this comment was not posted:
Confidence changes required:10%
The README.md file in the examples/pytest directory contains a minor typographical error in the list numbering. The second '4.' should be '5.' to maintain correct sequential numbering.
Workflow ID: wflow_nupTErg5VyCjx1Pu
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
84c5b0d
to
ee7a2e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on ee7a2e2 in 15 seconds
More details
- Looked at
69
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. examples/pytest/README.md:17
- Draft comment:
Consider adding a comma after "Well" for clarity.
Well, we can test & evaluate it.
- Reason this comment was not posted:
Confidence changes required:10%
The README.md file in the examples/pytest directory contains a minor grammatical error in line 17. The sentence "But how do we know if it's working? Well we can test & evaluate it." would be clearer with a comma after "Well".
Workflow ID: wflow_IOF6jRtyHbxge8pQ
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 354fa86 in 16 seconds
More details
- Looked at
36
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
3
drafted comments based on config settings.
1. examples/pytest/some_actions.py:23
- Draft comment:
The f-string is unnecessary here as there are no variables to interpolate. Consider using a regular string. - Reason this comment was not posted:
Confidence changes required:50%
The prompt string concatenation is correct, but the f-string is unnecessary since there are no variables to interpolate.
2. examples/pytest/some_actions.py:26
- Draft comment:
The f-string is unnecessary here as there are no variables to interpolate. Consider using a regular string. - Reason this comment was not posted:
Confidence changes required:50%
The prompt string concatenation is correct, but the f-string is unnecessary since there are no variables to interpolate.
3. examples/pytest/some_actions.py:27
- Draft comment:
The f-string is unnecessary here as there are no variables to interpolate. Consider using a regular string. - Reason this comment was not posted:
Confidence changes required:50%
The prompt string concatenation is correct, but the f-string is unnecessary since there are no variables to interpolate.
Workflow ID: wflow_EPay11lueIPsg1Kk
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 4838d64 in 11 seconds
More details
- Looked at
14
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
0
drafted comments based on config settings.
Workflow ID: wflow_Rq5cXafnqgYOhJv3
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on fb960cb in 12 seconds
More details
- Looked at
15
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. examples/pytest/README.md:86
- Draft comment:
The list numbering is incorrect. There are two items labeled as '4.' in the list of evaluation methods. Please correct the numbering. - Reason this comment was not posted:
Confidence changes required:10%
The README.md file contains a minor issue with the numbering of the list items. There are two items labeled as '4.' in the list of evaluation methods. This issue is not in the changed lines, so I cannot comment directly on it.
Workflow ID: wflow_dgdFco9ZJ3J1jWkZ
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
This shows how one can use pytest and Burr's functionality.
Changes
How I tested this
Notes
Checklist
Important
Adds comprehensive
pytest
examples and documentation for testing agents and LLM applications with Burr.README.md
inexamples/pytest
detailingpytest
usage for testing agents and LLM applications.some_actions.py
with example actions for an augmented LLM application.test_some_actions.py
with tests for actions insome_actions.py
usingpytest
andpytest-harvest
.e2e_test_cases.json
andhypotheses_test_cases.json
for parameterized testing with Burr.conftest.py
with a customResultCollector
fixture for collecting test results.requirements.txt
to includeburr
,pytest
, andpytest-harvest
.validate_examples.py
to includepytest
in the filter list for validation.creating_tests.rst
andindex.rst
to reference newpytest
examples.This description was created by for fb960cb. It will automatically update as commits are pushed.