Commit

reverted local changes
changliu2 committed Nov 20, 2024
1 parent f428342 commit 03f29fb
Showing 7 changed files with 33 additions and 57 deletions.
8 changes: 1 addition & 7 deletions src/api/evaluate/data/dataset_images.jsonl

Large diffs are not rendered by default.

14 changes: 3 additions & 11 deletions src/api/evaluate/eval_data.jsonl

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion src/api/evaluate/eval_results.jsonl

Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions src/api/evaluate/eval_results.md
@@ -1,12 +1,12 @@
| | groundedness.groundedness | groundedness.gpt_groundedness | relevance.relevance | relevance.gpt_relevance | fluency.fluency | fluency.gpt_fluency | coherence.coherence | coherence.gpt_coherence | violence.violence_defect_rate | hate_unfairness.hate_unfairness_defect_rate | self_harm.self_harm_defect_rate | sexual.sexual_defect_rate |
|---:|----------------------------:|--------------------------------:|----------------------:|--------------------------:|------------------:|----------------------:|----------------------:|--------------------------:|--------------------------------:|----------------------------------------------:|----------------------------------:|----------------------------:|
| 0 | 1 | 1 | 1 | 1 | 3.81818 | 3.81818 | 1 | 1 | 0 | 0 | 0 | 0 |
| | relevance.relevance | relevance.gpt_relevance | fluency.fluency | fluency.gpt_fluency | coherence.coherence | coherence.gpt_coherence | groundedness.groundedness | groundedness.gpt_groundedness | friendliness.score | violence.violence_defect_rate | hate_unfairness.hate_unfairness_defect_rate | self_harm.self_harm_defect_rate | sexual.sexual_defect_rate |
|---:|----------------------:|--------------------------:|------------------:|----------------------:|----------------------:|--------------------------:|----------------------------:|--------------------------------:|---------------------:|--------------------------------:|----------------------------------------------:|----------------------------------:|----------------------------:|
| 0 | 4.66667 | 4.66667 | 4.66667 | 4.66667 | 4.66667 | 4.66667 | 5 | 5 | 4 | 0 | 0 | 0 | 0 |

Average scores:

| | 0 |
|:------------------------------|--------:|
| relevance.gpt_relevance | 1 |
| fluency.gpt_fluency | 3.81818 |
| coherence.gpt_coherence | 1 |
| groundedness.gpt_groundedness | 1 |
| relevance.gpt_relevance | 4.66667 |
| fluency.gpt_fluency | 4.66667 |
| coherence.gpt_coherence | 4.66667 |
| groundedness.gpt_groundedness | 5 |
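
The averages table above collects per-metric means from two evaluation runs. A minimal sketch of how such column averages can be computed from JSONL-style result rows (the row shape and metric keys mirror the tables above but are assumptions, not the repository's actual schema):

```python
# Hypothetical rows shaped like eval_results.jsonl entries:
# one dict per evaluated article, one key per metric.
rows = [
    {"relevance.gpt_relevance": 5, "fluency.gpt_fluency": 4},
    {"relevance.gpt_relevance": 4, "fluency.gpt_fluency": 5},
    {"relevance.gpt_relevance": 5, "fluency.gpt_fluency": 5},
]

def average_scores(rows):
    """Average each metric column across all rows."""
    totals = {}
    for row in rows:
        for metric, score in row.items():
            totals.setdefault(metric, []).append(score)
    return {metric: sum(v) / len(v) for metric, v in totals.items()}

print(average_scores(rows))
```

The same aggregation is what a pandas `DataFrame(rows).mean()` would give; the plain-dict version avoids the dependency for illustration.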
36 changes: 13 additions & 23 deletions src/api/evaluate/evaluate.py
@@ -146,31 +146,20 @@ def evaluate_orchestrator(model_config, project_scope, data_path):
print(f"\n===== Creating articles to evaluate using data provided in {data_path}")
print("")
num_retries = 3
run_status = {}
for run_id in range(1):
print("repeated run", run_id+1)
error_df = []
with open(data_path) as f:
for num, line in enumerate(f):
row = json.loads(line)
data.append(row)
print(f"generating article {num +1}")
for try_id in range(num_retries):
try:
eval_data.append(run_orchestrator(row["research_context"], row["product_context"], row["assignment_context"]))
break
except Exception as e:
print("Agents failed to produce an article. Examine trace for details. Top layer error message: " + str(e) + f"\nRetrying {try_id+1}/{num_retries} times.")
error_df += [pd.DataFrame({f"Try {try_id+1}": [str(e)]}, index=[f"Article {num+1}"])]
continue
run_status[run_id+1] = pd.concat(error_df)

with open(data_path) as f:
for num, line in enumerate(f):
row = json.loads(line)
data.append(row)
print(f"generating article {num +1}")
for i in range(num_retries):
try:
eval_data.append(run_orchestrator(row["research_context"], row["product_context"], row["assignment_context"]))
break
except Exception as e:
print("Agents failed to produce an article. Examine trace for details. Error message: " + str(e) + f"\nRetrying {i+1}/{num_retries} times.")
continue
end = time.time()
print(f"Agent finished writing articles in {end-start} seconds.")
run_status = pd.concat(run_status)
timestamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
run_status.to_csv(f'./data/trace_{timestamp}.csv')

# write out eval data to a file so we can re-run evaluation on it
with jsonlines.open(folder + '/eval_data.jsonl', 'w') as writer:
for row in eval_data:
@@ -460,6 +449,7 @@ def make_image_message(url_path):

start=time.time()
print(f"Starting evaluate...")

eval_result = evaluate_orchestrator(model_config, project_scope, data_path=folder +"/eval_inputs.jsonl")
evaluate_remote(data_path=folder +"/eval_data.jsonl")

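Both versions of the loop above retry `run_orchestrator` up to `num_retries` times before giving up on an article. That pattern can be factored into a small reusable helper; this is a sketch, not the repository's code, and `retry_call` and its parameters are illustrative:

```python
def retry_call(fn, *args, num_retries=3, on_error=print):
    """Call fn(*args), retrying on any exception up to num_retries times.

    Returns fn's result, or re-raises the last exception once
    every attempt has failed.
    """
    last_exc = None
    for attempt in range(1, num_retries + 1):
        try:
            return fn(*args)
        except Exception as exc:  # broad catch mirrors the loop above
            last_exc = exc
            on_error(f"Attempt {attempt}/{num_retries} failed: {exc}")
    raise last_exc

# Example with a flaky function that succeeds on the third try.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "article"

print(retry_call(flaky, num_retries=3))  # succeeds on attempt 3
```

Factoring the retry out would also remove the duplicated `try`/`except` block that the reverted change had copied.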
2 changes: 1 addition & 1 deletion src/api/evaluate/image_eval_results.jsonl

Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions src/api/evaluate/image_eval_results.md
@@ -1,17 +1,17 @@
Protected Material Presence:


| | 0 |
|:----------------------------------------------|---------:|
| protected_material.fictional_characters_label | 0 |
| protected_material.logos_and_brands_label | 0.142857 |
| protected_material.artwork_label | 0.142857 |
| | 0 |
|:----------------------------------------------|----:|
| protected_material.fictional_characters_label | 1 |
| protected_material.logos_and_brands_label | 0 |
| protected_material.artwork_label | 0 |

Content Safety Scores:

| | 0 |
|:----------------------------------------------|----:|
| outputs.violence.violence_score | 0 |
| outputs.self_harm.self_harm_score | 0 |
| outputs.violence.violence_score | 4 |
| outputs.self_harm.self_harm_score | 4 |
| outputs.hate_unfairness.hate_unfairness_score | 0 |
| outputs.sexual.sexual_score | 0 |
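
The defect rates reported in eval_results.md (e.g. `violence.violence_defect_rate`) are, by convention, the fraction of evaluated rows whose safety score reaches a harm threshold. A sketch under that assumption (the helper name, threshold value, and sample scores are illustrative, not taken from the repository):

```python
def defect_rate(scores, threshold=3):
    """Fraction of scores at or above the harm threshold."""
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)

# Hypothetical per-article violence scores (0 = no harm detected).
violence_scores = [0, 0, 4, 0, 0, 0, 0]
print(defect_rate(violence_scores))  # 1 of 7 rows at/above threshold
```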
