Adds more thorough pytest example #442

skrawcz · 2024-11-30T00:12:15Z

This shows how one can use pytest and Burr's functionality.

Changes

adds example

How I tested this

runs locally

Notes

README will turn into blog post

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

Important

Adds comprehensive pytest examples and documentation for testing agents and LLM applications with Burr.

Examples:
- Adds README.md in examples/pytest detailing pytest usage for testing agents and LLM applications.
- Introduces some_actions.py with example actions for an augmented LLM application.
- Adds test_some_actions.py with tests for actions in some_actions.py using pytest and pytest-harvest.
Test Cases:
- Adds e2e_test_cases.json and hypotheses_test_cases.json for parameterized testing with Burr.
Fixtures and Utilities:
- Adds conftest.py with a custom ResultCollector fixture for collecting test results.
- Updates requirements.txt to include burr, pytest, and pytest-harvest.
Validation:
- Updates validate_examples.py to include pytest in the filter list for validation.
Documentation:
- Updates creating_tests.rst and index.rst to reference new pytest examples.

^{This description was created by}^{for fb960cb. It will automatically update as commits are pushed.}

github-actions · 2024-11-30T00:14:00Z

A preview of fb960cb is uploaded and can be seen here:

✨ https://burr.dagworks.io/pull/442 ✨

Changes may take a few minutes to propagate. Since this is a preview of production, content with draft: true will not be rendered. The source is here: https://github.com/DAGWorks-Inc/burr/tree/gh-pages/pull/442/

skrawcz · 2024-12-04T07:25:55Z

TODO: show this:

E.g. run pytest, get dataframe of results (and do something with it), but also see traces for each of those runs in the Burr UI.

This is a WIP. This shows how one might log things they don't want to fail on to a results_bag and then access them in a dataframe...

This shows how to use pytest to test an action. TODO: - how to use burr fixture - how to test agent and use tracker

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 0520ad8 in 47 seconds

More details

Looked at 779 lines of code in 8 files
Skipped 1 files when reviewing.
Skipped posting 7 drafted comments based on config settings.

1. examples/pytest/README.md:71

Draft comment:
Typo: 'acheive' should be 'achieve'.
Reason this comment was not posted:
Confidence changes required: 10%
The README.md file contains a typo in the word 'acheive'. It should be corrected to 'achieve'.

2. examples/pytest/README.md:207

Draft comment:
Typo: 'parameterizeable' should be 'parameterizable'.
Reason this comment was not posted:
Confidence changes required: 10%
The README.md file contains a typo in the word 'parameterizeable'. It should be corrected to 'parameterizable'.

3. examples/pytest/README.md:67

Draft comment:
Typo: 'walkthrough' should be 'walk through'.
Reason this comment was not posted:
Confidence changes required: 10%
The README.md file contains a typo in the word 'walkthrough'. It should be corrected to 'walk through'.

4. examples/pytest/e2e_test_cases.json:1

Draft comment:
The JSON structure looks good and well-formed.
Reason this comment was not posted:
Confidence changes required: 0%
The JSON files are well-structured and do not contain any issues. Moving on to the next file.

5. examples/pytest/hypotheses_test_cases.json:1

Draft comment:
The JSON structure looks good and well-formed.
Reason this comment was not posted:
Confidence changes required: 0%
The JSON files are well-structured and do not contain any issues. Moving on to the next file.

6. examples/pytest/requirements.txt:1

Draft comment:
The requirements.txt file is correctly formatted.
Reason this comment was not posted:
Confidence changes required: 0%
The requirements.txt file is simple and correct. No issues found.

7. examples/pytest/some_actions.py:21

Draft comment:
Ensure that the OpenAI client is correctly configured and the model name 'gpt-4o-mini' is valid.
Reason this comment was not posted:
Confidence changes required: 50%
The code in some_actions.py is mostly fine, but there is a potential issue with the OpenAI client instantiation. It should be checked if the client is correctly configured and if the model name is valid.

Workflow ID: wflow_0ceRxpohsZVovriJ

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

With link to mlflow.evaluate.

ellipsis-dev

👍 Looks good to me! Incremental review on 63e2f4e in 12 seconds

More details

Looked at 48 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. examples/pytest/README.md:69

Draft comment:
Numbering error: The list numbering repeats '4'. Consider changing the second '4.' to '5.' for clarity.
Reason this comment was not posted:
Confidence changes required: 10%
The README file contains a minor typo in the numbering of the list under 'What kind of "asserts" do we want?'. The number 4 is repeated twice, which should be corrected for clarity.

Workflow ID: wflow_KDoSwz3G5ga4L3t8

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 8e820a3 in 31 seconds

More details

Looked at 48 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. docs/examples/guardrails/index.rst:2

Draft comment:
The title change from "Guardrails" to "Guardrails / Tests" is not aligned with the content of the document, which focuses on creating test cases. Consider reverting to the original title or ensuring the content matches the broader scope.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
This is a documentation change. The comment speculates about content mismatch without having access to the actual content of the creating_tests file. Since this is an index file that includes a section about tests, adding "Tests" to the title seems reasonable. Documentation organization is subjective and should be left to the author's discretion unless there's clear evidence of a problem.
I haven't seen the content of creating_tests.rst, so I can't be certain about the document's focus. Maybe there's a real mismatch.
Even without seeing creating_tests.rst, the toctree includes a tests-related file, so the title change appears justified. Documentation structure feedback should have strong evidence of problems.
Delete this comment. It's speculative and questions a reasonable documentation organization choice without clear evidence of a problem.

Workflow ID: wflow_27ILJpSkYltLHrUy

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on b9a21d7 in 17 seconds

More details

Looked at 57 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. examples/pytest/README.md:17

Draft comment:
Consider rephrasing for clarity:

An agent or augmented LLM is a combination of LLM calls and logic. But how do we know if it's working? We can test and evaluate it.

Reason this comment was not posted:
Confidence changes required: 10%
The README.md file in the examples/pytest directory contains a minor grammatical error in line 17. The sentence should be more concise and clear.

Workflow ID: wflow_FzcITiRUWE42mLjl

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 84c5b0d in 12 seconds

More details

Looked at 58 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. examples/pytest/README.md:11

Draft comment:
Typographical error in list numbering. The second '4.' should be '5.' for correct sequential numbering.
Reason this comment was not posted:
Confidence changes required: 10%
The README.md file in the examples/pytest directory contains a minor typographical error in the list numbering. The second '4.' should be '5.' to maintain correct sequential numbering.

Workflow ID: wflow_nupTErg5VyCjx1Pu

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on ee7a2e2 in 15 seconds

More details

Looked at 69 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. examples/pytest/README.md:17

Draft comment:
Consider adding a comma after "Well" for clarity.

Well, we can test & evaluate it.

Reason this comment was not posted:
Confidence changes required: 10%
The README.md file in the examples/pytest directory contains a minor grammatical error in line 17. The sentence "But how do we know if it's working? Well we can test & evaluate it." would be clearer with a comma after "Well".

Workflow ID: wflow_IOF6jRtyHbxge8pQ

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 354fa86 in 16 seconds

More details

Looked at 36 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. examples/pytest/some_actions.py:23

Draft comment:
The f-string is unnecessary here as there are no variables to interpolate. Consider using a regular string.
Reason this comment was not posted:
Confidence changes required: 50%
The prompt string concatenation is correct, but the f-string is unnecessary since there are no variables to interpolate.

2. examples/pytest/some_actions.py:26

Draft comment:
The f-string is unnecessary here as there are no variables to interpolate. Consider using a regular string.
Reason this comment was not posted:
Confidence changes required: 50%
The prompt string concatenation is correct, but the f-string is unnecessary since there are no variables to interpolate.

3. examples/pytest/some_actions.py:27

Draft comment:
The f-string is unnecessary here as there are no variables to interpolate. Consider using a regular string.
Reason this comment was not posted:
Confidence changes required: 50%
The prompt string concatenation is correct, but the f-string is unnecessary since there are no variables to interpolate.

Workflow ID: wflow_EPay11lueIPsg1Kk

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 4838d64 in 11 seconds

More details

Looked at 14 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_Rq5cXafnqgYOhJv3

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on fb960cb in 12 seconds

More details

Looked at 15 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. examples/pytest/README.md:86

Draft comment:
The list numbering is incorrect. There are two items labeled as '4.' in the list of evaluation methods. Please correct the numbering.
Reason this comment was not posted:
Confidence changes required: 10%
The README.md file contains a minor issue with the numbering of the list items. There are two items labeled as '4.' in the list of evaluation methods. This issue is not in the changed lines, so I cannot comment directly on it.

Workflow ID: wflow_dgdFco9ZJ3J1jWkZ

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

skrawcz force-pushed the add_pytest_example branch from 3ced84f to a792759 Compare December 4, 2024 07:21

skrawcz added 8 commits December 24, 2024 12:53

Adds sketch of how to use pytest

06f05bb

This is a WIP. This shows how one might log things they don't want to fail on to a results_bag and then access them in a dataframe...

Remove file

406aba4

Adds more to pytest README

027de99

Updates README

cf080ea

Adds example to test

f128411

Adds some tests

6027774

This shows how to use pytest to test an action. TODO: - how to use burr fixture - how to test agent and use tracker

Fixes test to not complain

38726a7

Add file for parameterization for pytest example

3b2500f

skrawcz force-pushed the add_pytest_example branch from 01c7e7c to 3b2500f Compare December 24, 2024 20:54

Completes pytest example

0520ad8

skrawcz marked this pull request as ready for review December 24, 2024 22:11

ellipsis-dev bot reviewed Dec 24, 2024

View reviewed changes

skrawcz changed the title ~~Adds sketch of how to use pytest~~ Adds more thorough pytest example Dec 24, 2024

Adds more clarification on what to evaluate

63e2f4e

With link to mlflow.evaluate.

ellipsis-dev bot reviewed Dec 26, 2024

View reviewed changes

skrawcz requested a review from elijahbenizzy December 26, 2024 23:09

Adds docs reference for pytest example

8e820a3

ellipsis-dev bot reviewed Dec 26, 2024

View reviewed changes

ellipsis-dev bot reviewed Dec 27, 2024

View reviewed changes

skrawcz force-pushed the add_pytest_example branch from b9a21d7 to 84c5b0d Compare December 27, 2024 00:33

ellipsis-dev bot reviewed Dec 27, 2024

View reviewed changes

Adds image and SDLC section in overview docs.

ee7a2e2

skrawcz force-pushed the add_pytest_example branch from 84c5b0d to ee7a2e2 Compare December 27, 2024 00:37

ellipsis-dev bot reviewed Dec 27, 2024

View reviewed changes

Fixing some typos

354fa86

ellipsis-dev bot reviewed Dec 27, 2024

View reviewed changes

Adds UI screencap to pytest example

4838d64

ellipsis-dev bot reviewed Dec 27, 2024

View reviewed changes

Adds link to Hamel's post on figuring out metrics for evals

fb960cb

ellipsis-dev bot reviewed Dec 27, 2024

View reviewed changes

elijahbenizzy approved these changes Dec 27, 2024

View reviewed changes

skrawcz merged commit e149616 into main Dec 27, 2024
11 checks passed

skrawcz deleted the add_pytest_example branch December 27, 2024 06:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds more thorough pytest example #442

Adds more thorough pytest example #442

skrawcz commented Nov 30, 2024 •

edited by ellipsis-dev bot

Loading

github-actions bot commented Nov 30, 2024 •

edited

Loading

skrawcz commented Dec 4, 2024

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

Adds more thorough pytest example #442

Adds more thorough pytest example #442

Conversation

skrawcz commented Nov 30, 2024 • edited by ellipsis-dev bot Loading

Changes

How I tested this

Notes

Checklist

github-actions bot commented Nov 30, 2024 • edited Loading

skrawcz commented Dec 4, 2024

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

skrawcz commented Nov 30, 2024 •

edited by ellipsis-dev bot

Loading

github-actions bot commented Nov 30, 2024 •

edited

Loading