Moatless Tree Search

Note: The original development code can be found at github.com/a-antoniades/swe-search; it is intended only for reproducing the results in the paper. This repository is a clean refactor with a modular design, and it is the version that will be maintained and extended.

Figure: Overview of SWE-Search, showing the tree search process where states (nodes) and actions (edges) are evaluated using contextual information and value-function feedback to guide expansion.
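
In pseudocode, the loop the diagram depicts looks roughly like the following self-contained sketch. This is illustrative only, not the moatless API (runnable library examples follow below):

import random

# Illustrative sketch of the search loop in the diagram (not the moatless API).
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.value = 0.0
        self.visits = 0

def select(root):
    # Best-first: walk down to the child with the highest mean value.
    node = root
    while node.children:
        node = max(node.children, key=lambda n: n.value / max(n.visits, 1))
    return node

def backpropagate(node, reward):
    # Propagate the reward up the path to the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

root = Node()
for _ in range(100):                   # max_iterations
    node = select(root)                # pick a promising state (node)
    child = Node(parent=node)          # an action (edge) leads to a new state
    node.children.append(child)
    reward = random.random()           # stand-in for value-function feedback
    backpropagate(child, reward)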

Installation

Install the package:

pip install moatless-tree-search
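
After installing, the package's top-level module (moatless, as the examples below assume) should import cleanly:

import moatless  # raises ImportError if the installation failed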

Environment Setup

Before running the evaluation, you'll need:

  1. At least one LLM provider API key (OpenAI, Anthropic, etc.)
  2. A Voyage AI API key from voyageai.com to use the pre-embedded vector stores for SWE-Bench instances.
  3. (Optional) Access to a testbed environment - see moatless-testbeds for setup instructions

You can configure these settings in one of two ways:

  1. Create a .env file in the project root (copy from .env.example):

    cp .env.example .env
    # Edit .env with your values
  2. Export the variables directly:

    # Directory for storing vector index store files  
    export INDEX_STORE_DIR="/tmp/index_store"    
    
    # Directory for storing cloned repositories
    export REPO_DIR="/tmp/repos"
    
    # Required: At least one LLM provider API key
    export OPENAI_API_KEY="<your-key>"
    export ANTHROPIC_API_KEY="<your-key>"
    export HUGGINGFACE_API_KEY="<your-key>"
    export DEEPSEEK_API_KEY="<your-key>"
    
    # ...or Base URL for custom LLM API service (optional)
    export CUSTOM_LLM_API_BASE="<your-base-url>"
    export CUSTOM_LLM_API_KEY="<your-key>"
    
    # Required: API Key for Voyage Embeddings
    export VOYAGE_API_KEY="<your-key>"
    
    # Optional: Configuration for testbed environment (https://github.com/aorwall/moatless-testbeds)
    export TESTBED_API_KEY="<your-key>"
    export TESTBED_BASE_URL="<your-base-url>"
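
If you use a .env file and drive the library from your own script, one way to load it is the python-dotenv package (a sketch, assuming python-dotenv is installed; this may be unnecessary if moatless loads .env itself):

import os
from dotenv import load_dotenv  # assumption: pip install python-dotenv

load_dotenv()  # reads .env from the working directory into the process environment
assert os.environ.get("VOYAGE_API_KEY"), "VOYAGE_API_KEY is required for the vector stores"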

Streamlit

To launch the Streamlit app, run:

# Launch with direct file loading
moatless-streamlit path/to/trajectory.json

# Launch interactive UI (file can be selected in browser)
moatless-streamlit

The following badges are used to indicate the status of a node:

Shape       Color    Description
Star        Green    Node is marked as resolved
X           Red      Invalid edits or failed tests
🟢 Circle   Green    Correct code spans are present in the context
🟡 Circle   Yellow   Found the right files but not the spans, or found spans in the wrong files

Evaluation

To run the evaluation script:

moatless-evaluate \
    --model "gpt-4o-mini" \
    --repo_base_dir /tmp/repos \
    --eval_dir "./evaluations" \
    --eval_name mts \
    --temp 0.7 \
    --num_workers 1 \
    --use_testbed \
    --feedback \
    --max_iterations 100 \
    --max_expansions 5

You can optionally set --instance_ids to evaluate a specific instance or a list of instances.
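
For example, adding --instance_ids "django__django-16379" to the command above restricts the run to the Django instance used in the code examples below.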

Pass --use_testbed if you have access to a testbed environment; otherwise, tests will not be run.

Examples

Example: Basic Flow

Basic setup similar to the moatless-tools agent.

from moatless.agent import CodingAgent
from moatless.agent.code_prompts import SIMPLE_CODE_PROMPT
from moatless.benchmark.swebench import create_repository
from moatless.benchmark.utils import get_moatless_instance
from moatless.completion import CompletionModel
from moatless.file_context import FileContext
from moatless.index import CodeIndex
from moatless.search_tree import SearchTree
from moatless.actions import FindClass, FindFunction, FindCodeSnippet, SemanticSearch, RequestMoreContext, RequestCodeChange, Finish, Reject

index_store_dir = "/tmp/index_store"
repo_base_dir = "/tmp/repos"
persist_path = "trajectory.json"

instance = get_moatless_instance("django__django-16379")

completion_model = CompletionModel(model="gpt-4o", temperature=0.0)

# Clone the instance's repository and load its pre-built vector index
repository = create_repository(instance, repo_base_dir=repo_base_dir)

code_index = CodeIndex.from_index_name(
    instance["instance_id"], index_store_dir=index_store_dir, file_repo=repository
)

actions = [
    FindClass(code_index=code_index, repository=repository),
    FindFunction(code_index=code_index, repository=repository),
    FindCodeSnippet(code_index=code_index, repository=repository),
    SemanticSearch(code_index=code_index, repository=repository),
    RequestMoreContext(repository=repository),
    RequestCodeChange(repository=repository, completion_model=completion_model),
    Finish(),
    Reject()
]

file_context = FileContext(repo=repository)
agent = CodingAgent(actions=actions, completion=completion_model, system_prompt=SIMPLE_CODE_PROMPT)

search_tree = SearchTree.create(
    message=instance["problem_statement"],
    agent=agent,
    file_context=file_context,
    max_expansions=1,
    max_iterations=50,
    persist_path=persist_path,
)

node = search_tree.run_search()
print(node.observation.message)
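
Since persist_path is passed to SearchTree.create, the finished search tree is saved to trajectory.json and can be inspected in the Streamlit app described above (moatless-streamlit trajectory.json).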

Example: MCTS Flow

How to set up the evaluation flow with MCTS and testbeds.

from moatless.agent import CodingAgent
from moatless.benchmark.swebench import create_repository
from moatless.benchmark.utils import get_moatless_instance
from moatless.completion import CompletionModel
from moatless.discriminator import AgentDiscriminator
from moatless.feedback import FeedbackGenerator
from moatless.file_context import FileContext
from moatless.index import CodeIndex
from moatless.search_tree import SearchTree
from moatless.selector import BestFirstSelector
from moatless.actions import FindClass, FindFunction, FindCodeSnippet, SemanticSearch, RequestMoreContext, RequestCodeChange, Finish, Reject, RunTests
from moatless.value_function import ValueFunction
from testbeds.sdk import TestbedSDK
from moatless.runtime.testbed import TestbedEnvironment

index_store_dir = "/tmp/index_store"
repo_base_dir = "/tmp/repos"
persist_path = "trajectory.json"

instance = get_moatless_instance("django__django-16379")

completion_model = CompletionModel(model="gpt-4o-mini", temperature=0.7)

repository = create_repository(instance, repo_base_dir=repo_base_dir)

code_index = CodeIndex.from_index_name(
    instance["instance_id"], index_store_dir=index_store_dir, file_repo=repository
)

file_context = FileContext(repo=repository)

# Best-first selector decides which node in the tree to expand next
selector = BestFirstSelector()

# Value function scores newly expanded nodes to guide the search
value_function = ValueFunction(completion=completion_model)

# Discriminator selects the final solution among candidate trajectories
discriminator = AgentDiscriminator(
    completion=completion_model,
    n_agents=5,
    n_rounds=3,
)

# Generates feedback that is passed to the agent during the search
feedback = FeedbackGenerator()

# Testbed-backed runtime so the RunTests action can execute the instance's test suite
runtime = TestbedEnvironment(
    testbed_sdk=TestbedSDK(),
    repository=repository,
    instance=instance
)

actions = [
    FindClass(code_index=code_index, repository=repository),
    FindFunction(code_index=code_index, repository=repository),
    FindCodeSnippet(code_index=code_index, repository=repository),
    SemanticSearch(code_index=code_index, repository=repository),
    RequestMoreContext(repository=repository),
    RequestCodeChange(repository=repository, completion_model=completion_model),
    RunTests(code_index=code_index, repository=repository, runtime=runtime),
    Finish(),
    Reject()
]

agent = CodingAgent(actions=actions, completion=completion_model)

search_tree = SearchTree.create(
    message=instance["problem_statement"],
    agent=agent,
    file_context=file_context,
    selector=selector,
    value_function=value_function,
    discriminator=discriminator,
    feedback_generator=feedback,
    max_iterations=100,
    max_expansions=3,
    max_depth=25,
    persist_path=persist_path,
)

node = search_tree.run_search()
print(node.observation.message)
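
This is roughly the setup that the moatless-evaluate command configures for you. The selector, value function, discriminator, and feedback generator are the pieces that turn the basic flow above into a Monte Carlo tree search.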

Citation

@misc{antoniades2024swesearchenhancingsoftwareagents,
      title={SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement}, 
      author={Antonis Antoniades and Albert Örwall and Kexun Zhang and Yuxi Xie and Anirudh Goyal and William Wang},
      year={2024},
      eprint={2410.20285},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.20285}, 
}
