Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds more thorough pytest example #442

Merged
merged 15 commits into from
Dec 27, 2024
Merged

Adds more thorough pytest example #442

merged 15 commits into from
Dec 27, 2024

Conversation

skrawcz
Copy link
Contributor

@skrawcz skrawcz commented Nov 30, 2024

This shows how one can use pytest and Burr's functionality.

Changes

  • adds example

How I tested this

  • runs locally

Notes

  • README will turn into blog post

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Important

Adds comprehensive pytest examples and documentation for testing agents and LLM applications with Burr.

  • Examples:
    • Adds README.md in examples/pytest detailing pytest usage for testing agents and LLM applications.
    • Introduces some_actions.py with example actions for an augmented LLM application.
    • Adds test_some_actions.py with tests for actions in some_actions.py using pytest and pytest-harvest.
  • Test Cases:
    • Adds e2e_test_cases.json and hypotheses_test_cases.json for parameterized testing with Burr.
  • Fixtures and Utilities:
    • Adds conftest.py with a custom ResultCollector fixture for collecting test results.
    • Updates requirements.txt to include burr, pytest, and pytest-harvest.
  • Validation:
    • Updates validate_examples.py to include pytest in the filter list for validation.
  • Documentation:
    • Updates creating_tests.rst and index.rst to reference new pytest examples.

This description was created by Ellipsis for fb960cb. It will automatically update as commits are pushed.

Copy link

github-actions bot commented Nov 30, 2024

A preview of fb960cb is uploaded and can be seen here:

https://burr.dagworks.io/pull/442

Changes may take a few minutes to propagate. Since this is a preview of production, content with draft: true will not be rendered. The source is here: https://github.com/DAGWorks-Inc/burr/tree/gh-pages/pull/442/

@skrawcz skrawcz force-pushed the add_pytest_example branch from 3ced84f to a792759 Compare December 4, 2024 07:21
@skrawcz
Copy link
Contributor Author

skrawcz commented Dec 4, 2024

TODO: show this:

E.g. run pytest, get dataframe of results (and do something with it), but also see traces for each of those runs in the Burr UI.

This is a WIP. This shows how
one might log things they don't
want to fail on to a results_bag
and then access them in a dataframe...
This shows how to use pytest to test an action.

TODO:
 - how to use burr fixture
 - how to test agent and use tracker
@skrawcz skrawcz marked this pull request as ready for review December 24, 2024 22:11
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 0520ad8 in 47 seconds

More details
  • Looked at 779 lines of code in 8 files
  • Skipped 1 files when reviewing.
  • Skipped posting 7 drafted comments based on config settings.
1. examples/pytest/README.md:71
  • Draft comment:
    Typo: 'acheive' should be 'achieve'.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The README.md file contains a typo in the word 'acheive'. It should be corrected to 'achieve'.
2. examples/pytest/README.md:207
  • Draft comment:
    Typo: 'parameterizeable' should be 'parameterizable'.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The README.md file contains a typo in the word 'parameterizeable'. It should be corrected to 'parameterizable'.
3. examples/pytest/README.md:67
  • Draft comment:
    Typo: 'walkthrough' should be 'walk through'.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The README.md file contains a typo in the word 'walkthrough'. It should be corrected to 'walk through'.
4. examples/pytest/e2e_test_cases.json:1
  • Draft comment:
    The JSON structure looks good and well-formed.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The JSON files are well-structured and do not contain any issues. Moving on to the next file.
5. examples/pytest/hypotheses_test_cases.json:1
  • Draft comment:
    The JSON structure looks good and well-formed.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The JSON files are well-structured and do not contain any issues. Moving on to the next file.
6. examples/pytest/requirements.txt:1
  • Draft comment:
    The requirements.txt file is correctly formatted.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The requirements.txt file is simple and correct. No issues found.
7. examples/pytest/some_actions.py:21
  • Draft comment:
    Ensure that the OpenAI client is correctly configured and the model name 'gpt-4o-mini' is valid.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code in some_actions.py is mostly fine, but there is a potential issue with the OpenAI client instantiation. It should be checked if the client is correctly configured and if the model name is valid.

Workflow ID: wflow_0ceRxpohsZVovriJ


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@skrawcz skrawcz changed the title Adds sketch of how to use pytest Adds more thorough pytest example Dec 24, 2024
With link to mlflow.evaluate.
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 63e2f4e in 12 seconds

More details
  • Looked at 48 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. examples/pytest/README.md:69
  • Draft comment:
    Numbering error: The list numbering repeats '4'. Consider changing the second '4.' to '5.' for clarity.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The README file contains a minor typo in the numbering of the list under 'What kind of "asserts" do we want?'. The number 4 is repeated twice, which should be corrected for clarity.

Workflow ID: wflow_KDoSwz3G5ga4L3t8


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 8e820a3 in 31 seconds

More details
  • Looked at 48 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. docs/examples/guardrails/index.rst:2
  • Draft comment:
    The title change from "Guardrails" to "Guardrails / Tests" is not aligned with the content of the document, which focuses on creating test cases. Consider reverting to the original title or ensuring the content matches the broader scope.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    This is a documentation change. The comment speculates about content mismatch without having access to the actual content of the creating_tests file. Since this is an index file that includes a section about tests, adding "Tests" to the title seems reasonable. Documentation organization is subjective and should be left to the author's discretion unless there's clear evidence of a problem.
    I haven't seen the content of creating_tests.rst, so I can't be certain about the document's focus. Maybe there's a real mismatch.
    Even without seeing creating_tests.rst, the toctree includes a tests-related file, so the title change appears justified. Documentation structure feedback should have strong evidence of problems.
    Delete this comment. It's speculative and questions a reasonable documentation organization choice without clear evidence of a problem.

Workflow ID: wflow_27ILJpSkYltLHrUy


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on b9a21d7 in 17 seconds

More details
  • Looked at 57 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. examples/pytest/README.md:17
  • Draft comment:
    Consider rephrasing for clarity:
An agent or augmented LLM is a combination of LLM calls and logic. But how do we know if it's working? We can test and evaluate it.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The README.md file in the examples/pytest directory contains a minor grammatical error in line 17. The sentence should be more concise and clear.

Workflow ID: wflow_FzcITiRUWE42mLjl


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 84c5b0d in 12 seconds

More details
  • Looked at 58 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. examples/pytest/README.md:11
  • Draft comment:
    Typographical error in list numbering. The second '4.' should be '5.' for correct sequential numbering.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The README.md file in the examples/pytest directory contains a minor typographical error in the list numbering. The second '4.' should be '5.' to maintain correct sequential numbering.

Workflow ID: wflow_nupTErg5VyCjx1Pu


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on ee7a2e2 in 15 seconds

More details
  • Looked at 69 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. examples/pytest/README.md:17
  • Draft comment:
    Consider adding a comma after "Well" for clarity.
Well, we can test & evaluate it.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The README.md file in the examples/pytest directory contains a minor grammatical error in line 17. The sentence "But how do we know if it's working? Well we can test & evaluate it." would be clearer with a comma after "Well".

Workflow ID: wflow_IOF6jRtyHbxge8pQ


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 354fa86 in 16 seconds

More details
  • Looked at 36 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. examples/pytest/some_actions.py:23
  • Draft comment:
    The f-string is unnecessary here as there are no variables to interpolate. Consider using a regular string.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The prompt string concatenation is correct, but the f-string is unnecessary since there are no variables to interpolate.
2. examples/pytest/some_actions.py:26
  • Draft comment:
    The f-string is unnecessary here as there are no variables to interpolate. Consider using a regular string.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The prompt string concatenation is correct, but the f-string is unnecessary since there are no variables to interpolate.
3. examples/pytest/some_actions.py:27
  • Draft comment:
    The f-string is unnecessary here as there are no variables to interpolate. Consider using a regular string.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The prompt string concatenation is correct, but the f-string is unnecessary since there are no variables to interpolate.

Workflow ID: wflow_EPay11lueIPsg1Kk


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 4838d64 in 11 seconds

More details
  • Looked at 14 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_Rq5cXafnqgYOhJv3


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on fb960cb in 12 seconds

More details
  • Looked at 15 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. examples/pytest/README.md:86
  • Draft comment:
    The list numbering is incorrect. There are two items labeled as '4.' in the list of evaluation methods. Please correct the numbering.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The README.md file contains a minor issue with the numbering of the list items. There are two items labeled as '4.' in the list of evaluation methods. This issue is not in the changed lines, so I cannot comment directly on it.

Workflow ID: wflow_dgdFco9ZJ3J1jWkZ


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@skrawcz skrawcz merged commit e149616 into main Dec 27, 2024
11 checks passed
@skrawcz skrawcz deleted the add_pytest_example branch December 27, 2024 06:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants