
add support for run-bug-run runbugrun #39

Open

monperrus opened this issue Apr 4, 2023 · 9 comments

@monperrus (Contributor)

RunBugRun -- An Executable Dataset for Automated Program Repair
https://github.com/giganticode/run_bug_run

@andre15silva (Member)

https://github.com/giganticode/run_bug_run_data/releases/tag/v0.0.1

Seems like the first release is out

@monperrus monperrus changed the title add support for run-run-bugs add support for run-bug-run runbugrun May 3, 2023
@cadddr commented Oct 22, 2024

happy to take on that

@andre15silva (Member)

> happy to take on that

sounds good, let me know if you have any questions!

@cadddr commented Oct 23, 2024

> happy to take on that

> sounds good, let me know if you have any questions!

I started looking into this yesterday. A few things about RunBugRun:

  • It ships its own tool for managing the bug data and execution, rbugr, written in Ruby. I find it easier to work with the jsonl files directly, so I added download commands to the setup script.
  • While implementing the benchmark and bug subclasses for the Python bugs, I realized elle-elle-aime is hardwired for Java. Is it OK to carry on with Python? I have not worked with the RunBugRun Java subset.
  • RunBugRun programs receive inputs via standard input rather than function arguments, and print results to standard output rather than returning them. I made a utility (for Python) that programmatically pipes the input in and captures the output into a variable.
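The piping utility described in the last bullet could look roughly like this. This is a minimal sketch, not the actual PR code; the function name and parameters are illustrative:

```python
import subprocess
import sys

def run_with_stdin(source_path: str, test_input: str, timeout: int = 10) -> str:
    """Run a Python program, feeding `test_input` on stdin and capturing stdout.

    RunBugRun programs read from standard input and print their result,
    so the candidate program is executed as a subprocess with the test
    input piped in, and its stdout is returned as a string.
    """
    result = subprocess.run(
        [sys.executable, source_path],
        input=test_input,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

The captured stdout can then be compared against the expected output of each test case.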

@andre15silva (Member)

> While implementing the benchmark and bug subclasses for the Python bugs, I realized elle-elle-aime is hardwired for Java. Is it OK to carry on with Python? I have not worked with the RunBugRun Java subset.

When you integrate the benchmark you define which commands to run when compiling/testing each bug/patch.

The only part that is currently hard-coded for Java is the extraction of single functions, removal of comments, etc.
These are used when generating prompts.

See https://github.com/ASSERT-KTH/elle-elle-aime/tree/master/elleelleaime/core/utils/java

To integrate a Python benchmark you'll need to implement similar functions for Python (or even better, using tree-sitter to support more languages).
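For the Python side, a single-function extractor can be built on the stdlib `ast` module (tree-sitter would generalize this across languages, as suggested above). This is a hedged sketch, not the actual elle-elle-aime API; the function name is an assumption:

```python
import ast
from typing import Optional

def extract_function_source(source: str, name: str) -> Optional[str]:
    """Return the source text of the function `name`, or None if absent.

    Parses the module, walks the tree, and uses ast.get_source_segment
    (Python 3.8+) to recover the exact source slice of the matching
    function definition. Comments attached to the body are preserved
    because the slice is taken from the original text, not re-printed.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return ast.get_source_segment(source, node)
    return None
```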

@cadddr commented Oct 30, 2024

Sharing progress so far: #166
(Please don't merge as it is still missing a few things.)

I'm a little unclear on Bug.failing_tests -- does it map test methods to the resulting error messages? In RunBugRun there are simply test inputs and expected outputs, and the buggy code is not always a self-contained function.

Also, does the ground_truth diff only come into play when evaluating the LLM-generated fix? Why not simply check whether the tests pass?

Similarly, I'm not sure if I'm using the checkout logic correctly -- it seems like a drag to have to make a copy each time, so instead I simply read from the original buggy file.

Any feedback/corrections welcome!

@andre15silva (Member)

> I'm a little unclear on Bug.failing_tests -- does it map test methods to the resulting error messages?

Exactly, it maps fully qualified test method names to the error messages.

> In RunBugRun there are simply test inputs and expected outputs, and the buggy code is not always a self-contained function.

I see the solution you came up with, and that seems reasonable.

The only problem will be in extracting the test case (see https://github.com/ASSERT-KTH/elle-elle-aime/blob/master/elleelleaime/core/utils/java/java.py#L269). This means that we need to add a special case for RunBugRun here.
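One way to bridge the gap is to synthesize test names from the I/O pairs and store a comparison message under each. A minimal sketch under assumed names (`bug_id`, `cases`, `run` are all hypothetical, not elle-elle-aime's interface):

```python
def io_tests_to_failing_tests(bug_id, cases, run):
    """Map synthetic test names to error messages for failing I/O cases.

    `cases` is a list of (stdin_text, expected_stdout) pairs; `run`
    executes the buggy program on one input and returns its stdout.
    The result mimics Bug.failing_tests: test name -> error message.
    """
    failing = {}
    for i, (stdin_text, expected) in enumerate(cases):
        actual = run(stdin_text)
        if actual.strip() != expected.strip():
            failing[f"{bug_id}::test_{i}"] = (
                f"expected {expected!r}, got {actual!r} for input {stdin_text!r}"
            )
    return failing
```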

> Also, does the ground_truth diff only come into play when evaluating the LLM-generated fix?

The ground_truth diff is used in two places right now:

  1. In extracting the buggy function (see https://github.com/ASSERT-KTH/elle-elle-aime/blob/master/elleelleaime/core/utils/java/java.py#L143), during the generation of prompts
  2. In evaluating the generated fixed functions.

> Why not simply check whether the tests pass?

Executing tests to check is great, but there is a known problem in program repair called patch overfitting. The problem lies in patches that pass the tests but differ from what the developer intended (see e.g., "Is the cure worse than the disease? Overfitting in automated program repair").

For this reason, we use the ground-truth patch as a reference in some evaluation metrics like exact-match or ast-match.

> Similarly, I'm not sure if I'm using the checkout logic correctly -- it seems like a drag to have to make a copy each time, so instead I simply read from the original buggy file.

It's important to have that logic (every checkout copies the files from an untouched source) due to the parallelism. We want to be able to evaluate hundreds/thousands of patches at the same time, and this requires them to be in different locations.
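The checkout pattern amounts to copying the pristine sources into a fresh directory per evaluation. A minimal sketch (the function name is illustrative, not the actual checkout implementation):

```python
import shutil
import tempfile
from pathlib import Path

def checkout(buggy_dir: str) -> Path:
    """Copy the untouched buggy sources into a fresh working directory.

    Each candidate patch gets its own copy, so hundreds of evaluations
    can run in parallel without clobbering each other's files.
    """
    workdir = Path(tempfile.mkdtemp(prefix="elleelleaime-"))
    dest = workdir / Path(buggy_dir).name
    shutil.copytree(buggy_dir, dest)
    return dest
```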

> Any feedback/corrections welcome!

Could you rebase your PR? I changed the CI config to enable it on PRs. That way we can check if the tests are green. Thanks :)

@cadddr commented Nov 4, 2024

> The only problem will be in extracting the test case (see https://github.com/ASSERT-KTH/elle-elle-aime/blob/master/elleelleaime/core/utils/java/java.py#L269). This means that we need to add a special case for RunBugRun here.

> The ground_truth diff is used in two places right now:
>
>   1. In extracting the buggy function (see https://github.com/ASSERT-KTH/elle-elle-aime/blob/master/elleelleaime/core/utils/java/java.py#L143), during the generation of prompts

So, the test cases right now are simple asserts about the returned value.
I've overridden the instruct strategy for Python here, to circumvent having to extract the test case source:

febe8e4#diff-3f4ea3e207b6866ea3514390ef0148073207b05d1a8ca4da933d8f926e1be2d5

Got all the other points, will rebase PR.

@andre15silva (Member)

Looks like a good solution, thanks!

Let me know if you have any problems with the CI.
