[MM] Chartqa #2544

Draft: wants to merge 2 commits into main
37 changes: 37 additions & 0 deletions lm_eval/tasks/chartqa/chartqa.yaml
@@ -0,0 +1,37 @@
dataset_path: HuggingFaceM4/ChartQA
test_split: test
output_type: generate_until
doc_to_image: image
doc_to_text: |
  {{query}}
  Analyze the image and question carefully, using step-by-step reasoning.
  First, describe any image provided in detail. Then present your reasoning. Finally, give your final answer in this format:
  Final Answer: <answer>
  where <answer> follows these rules:
  - <answer> should be a single phrase or number.
  - <answer> should not paraphrase or reformat the text in the image.
  - If <answer> is a ratio, it should be a decimal value like 0.25 instead of 1:4.
  - If the question is a Yes/No question, <answer> should be Yes or No.
  - If <answer> is a number, it should not contain any units.
  - If <answer> is a percentage, it should include a % sign.
  - If <answer> is an entity, it should include the full label from the chart.
  IMPORTANT: Remember to end your answer with Final Answer: <answer>.
doc_to_target: "{{ label[0] }}"
#process_results: !function utils.process_results
generation_kwargs:
  until: []
  temperature: 0.0
  do_sample: false
  max_gen_toks: 512
filter_list:
  - name: "strict-match"
    filter:
      - function: "regex"
        regex_pattern: "Final Answer:\\s*(.+)"
      - function: "take_first"
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
metadata:
  version: 0.0
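The commented-out `process_results: !function utils.process_results` hook suggests that a custom scorer is planned. Below is a minimal sketch of what a hypothetical `utils.py` could contain. It assumes ChartQA's commonly used relaxed accuracy: a numeric answer counts as correct if it is within 5% relative error of the gold value, and any other answer needs a case-insensitive exact match. The gold value is taken from the first entry of the `label` list that `doc_to_target` already reads. The helper names (`extract_answer`, `_to_float`, `relaxed_correctness`) and the `relaxed_accuracy` metric key are illustrative, not part of this PR.

```python
import re
from typing import Optional

# Hypothetical helper: mirrors the "Final Answer: <answer>" format the prompt asks for.
_FINAL_ANSWER = re.compile(r"Final Answer:\s*(.+)")


def extract_answer(text: str) -> str:
    """Return the text after the last 'Final Answer:' marker, or the stripped input if none."""
    matches = _FINAL_ANSWER.findall(text)
    return (matches[-1] if matches else text).strip()


def _to_float(text: str) -> Optional[float]:
    """Parse '12.5', '1,234' or '45%' (as 0.45) into a float; return None if not numeric."""
    cleaned = text.replace(",", "").strip()
    try:
        if cleaned.endswith("%"):
            return float(cleaned[:-1]) / 100.0
        return float(cleaned)
    except ValueError:
        return None


def relaxed_correctness(target: str, prediction: str, max_relative_change: float = 0.05) -> bool:
    """Numeric answers: within max_relative_change of the target; otherwise exact match, ignoring case."""
    pred_f, tgt_f = _to_float(prediction), _to_float(target)
    # A target of 0 is falsy, so it falls through to the string comparison (avoids dividing by zero).
    if pred_f is not None and tgt_f:
        return abs(pred_f - tgt_f) / abs(tgt_f) <= max_relative_change
    return prediction.strip().lower() == target.strip().lower()


def process_results(doc: dict, results: list) -> dict:
    """lm-eval style hook: score the first (filtered) generation against label[0]."""
    prediction = extract_answer(results[0])
    target = doc["label"][0]
    return {"relaxed_accuracy": float(relaxed_correctness(target, prediction))}
```

To adopt the hook, one would presumably uncomment `process_results` and add a `relaxed_accuracy` entry (with `aggregation: mean` and `higher_is_better: true`) to `metric_list`. The `extract_answer` step is harmless if a regex filter has already stripped the generation down to the answer. The 5% tolerance and the percent-to-fraction conversion follow a common ChartQA convention and are assumptions here, not something this PR specifies.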