Arxiv preprint | DSPy Implementation
AvaTaR is a novel, automatic framework that optimizes an LLM agent to use the provided tools effectively and improve its performance on a given task or domain. During optimization, a comparator module iteratively provides insightful, holistic prompts to the LLM agent by contrasting positive and negative examples sampled from the training data.
[July 2024] 🔥 AvaTaR is integrated into DSPy - credit to Herumb Shandilya! You can try out the example in a Jupyter notebook.
Avatar is now integrated with DSPy as the `Avatar` module for agent execution and `AvatarOptimizer` for Actor optimization. To use Avatar, you'll need:

- **Task Signature**: a `dspy.Signature` class defining the structure of your task. For a QA-style task, you can create a signature with a `question` input field and an `answer` output field.
- **Tools**: a list of `Tool` objects (imported from `dspy.predict.avatar`) wrapping tools in the LangChain tool format.
Here is an example:

```python
from dspy.predict.avatar import Tool, Avatar
from langchain_community.utilities import GoogleSerperAPIWrapper, ArxivAPIWrapper

tools = [
    Tool(
        tool=GoogleSerperAPIWrapper(),
        name="WEB_SEARCH",
        desc="If you have a question, you can use this tool to search the web for the answer.",
    ),
]

agent = Avatar(
    tools=tools,
    signature="question->answer",
    verbose=True,
)
```
You can execute it like any other DSPy module by passing the inputs you specified in your task signature as keyword arguments:

```python
answer = agent(question=question)
```
You can optimize the Actor for better tool usage with `AvatarOptimizer`, which performs the optimization via the comparator module:
```python
from dspy.teleprompt import AvatarOptimizer

def metric(example, prediction, trace=None):
    ...

teleprompter = AvatarOptimizer(
    metric=metric,
    max_iters=10,
    max_negative_inputs=10,
    max_positive_inputs=10,
)

optimized_arxiv_agent = teleprompter.compile(
    student=agent,
    trainset=trainset
)
```
For a detailed walkthrough, refer to the notebook in the DSPy repo.
```bash
conda create -n avatar python=3.11
pip install stark-qa typeguard
```
- Specify API keys in the command line:

```bash
export ANTHROPIC_API_KEY=YOUR_API_KEY
export OPENAI_API_KEY=YOUR_API_KEY
export OPENAI_ORG=YOUR_ORGANIZATION
```
- Embeddings: Download all embeddings by running the following script:

```bash
sh scripts/emb_download_all.sh
```
- Raw data:
STaRK data will be downloaded automatically when running the code.
For Flickr30k Entities, submit form at Flickr 30k & Denotation Graph data to request access. Then organize the data as follows:
```
data
├── flickr30k_entities
│   ├── raw
│   │   ├── Annotations
│   │   │   ├── 36979.xml
│   │   │   ├── ...
│   │   ├── flickr30k-images
│   │   │   ├── 36979.jpg
│   │   │   ├── ...
│   ├── split
│   │   ├── test.index
│   │   ├── train.index
│   │   ├── val.index
│   ├── qa.csv
├── ...
```
We already include the VSS results locally under `output/eval` and the grouping (for STaRK only) under `output/agent`. With these files, you should be able to optimize actor actions directly following the AvaTaR pipeline.
- Optimization: Following the default settings in `config/default_args.json`, run one of the following commands to optimize the actor actions for a group of queries:

```bash
sh scripts/run_avatar_stark.sh
```
or
```bash
sh scripts/run_avatar_flickr30k_entities.sh
```

You can specify the dataset name and group in `scripts/run_avatar_stark.sh`.
- Evaluation: Run the following command to evaluate the optimized actor actions:

```bash
sh scripts/eval_avatar_stark.sh
```
or
```bash
sh scripts/eval_avatar_flickr30k_entities.sh
```
We provide an implementation of the ReAct baseline on STaRK and Flickr30k Entities. The function lists provided to ReAct are under `avatar/tools/react`.
- Evaluation: Run the following command to evaluate ReAct:

```bash
sh scripts/eval_react_stark.sh
```
or
```bash
sh scripts/eval_react_flickr30k_entities.sh
```
By default, we store the logs of ReAct's reasoning and acting process at `logs/`.
```bibtex
@inproceedings{wu24avatar,
  title     = {AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning},
  author    = {
    Shirley Wu and Shiyu Zhao and
    Qian Huang and Kexin Huang and
    Michihiro Yasunaga and Kaidi Cao and
    Vassilis N. Ioannidis and Karthik Subbian and
    Jure Leskovec and James Zou
  },
  booktitle = {NeurIPS},
  year      = {2024}
}
```