Evaluate the cost of running tests #1350
etr2460 pushed a commit that referenced this issue on Mar 25, 2024:
It's often useful to know the token expenditure of running an eval, especially as the number of evals in this repo grows. Example [feature request](#1350); we also rely on this, e.g., [here](https://github.com/openai/evals/tree/main/evals/elsuite/bluff#token-estimates). Computing this manually is cumbersome, so this PR suggests simply logging the [usage](https://platform.openai.com/docs/api-reference/chat/object#chat/object-usage) receipts (for token usage) of each API call in `record.sampling`. This makes it easy to sum up the token cost of an eval given a logfile of the run.

Here is an example of a resulting `sampling` log line after this change (we add the `data.model` and `data.usage` fields, marked `# NEW`):

```json
{
  "run_id": "240103035835K2NWEEJC",
  "event_id": 1,
  "sample_id": "superficial-patterns.dev.8",
  "type": "sampling",
  "data": {
    "prompt": [
      {
        "role": "system",
        "content": "If the red key goes to the pink door, and the blue key goes to the green door, but you paint the green door to be the color pink, and the pink door to be the color red, and the red key yellow, based on the new colors of everything, which keys go to what doors?"
      }
    ],
    "sampled": [
      "Based on the new colors, the yellow key goes to the pink door (previously red), and the blue key goes to the red door (previously pink)."
    ],
    "model": "gpt-3.5-turbo-0613",  # NEW
    "usage": {                      # NEW
      "completion_tokens": 33,
      "prompt_tokens": 70,
      "total_tokens": 103
    }
  },
  "created_by": "",
  "created_at": "2024-01-03 03:58:37.466772+00:00"
}
```
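Summing the token cost from such a logfile is then straightforward. A minimal sketch (not part of the PR itself): read the run's JSONL logfile line by line, keep only `sampling` events, and accumulate their `data.usage` counters. Events without a usage receipt are skipped rather than treated as errors.

```python
import json

def total_token_usage(logfile_path):
    """Sum token usage across all sampling events in an evals run logfile.

    Assumes one JSON event per line (as in the example log line above).
    Non-sampling events and events without a usage receipt are skipped.
    """
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    with open(logfile_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            if event.get("type") != "sampling":
                continue
            usage = event.get("data", {}).get("usage") or {}
            for key in totals:
                totals[key] += usage.get(key, 0)
    return totals
```

For the example event above, this would accumulate 70 prompt tokens, 33 completion tokens, and 103 total tokens.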
varad-newtuple pushed a commit that referenced this issue to varad-newtuple/openai_eval on Oct 4, 2024, with the same commit message as above.
Describe the feature or improvement you're requesting
In many production scenarios it is important to do cost-benefit analysis, so it would be great if the `oaieval` command could also return the total cost of running the test. Specifically, this would involve two parts: capturing token usage from the `CompletionFn`, and using it to calculate the cost of running that completion.

Additional context

No response
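Going from token counts to a dollar figure is a simple multiplication by per-token prices. A minimal sketch of what such a cost calculation could look like, assuming hypothetical per-1K-token prices (the numbers below are placeholders; real prices vary by model and change over time):

```python
# Hypothetical per-1K-token prices, keyed by model name. These are
# placeholder values for illustration, not authoritative pricing.
PRICE_PER_1K = {
    "gpt-3.5-turbo-0613": {"prompt": 0.0015, "completion": 0.002},
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Rough dollar cost of one completion call from its usage receipt."""
    prices = PRICE_PER_1K[model]
    return (prompt_tokens / 1000) * prices["prompt"] + \
           (completion_tokens / 1000) * prices["completion"]
```

Summed over every sampling event in a run's logfile, this would give the total cost the request asks `oaieval` to report.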