This is the Redemption of False Positives project.
'Redemption' Automated Code Repair Tool Copyright 2023, 2024 Carnegie Mellon University. NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN 'AS-IS' BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. Licensed under a MIT (SEI)-style license, please see License.txt or contact [email protected] for full terms. [DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution. This Software includes and/or makes use of Third-Party Software each subject to its own license. DM23-2165The Redemption tool makes repairs to C/C++ source code based on alerts produced by certain static-analysis tools. Here is a screenshot of some repairs that the Redemption tool suggests:
For more details of the background and capabilities of the Redemption tool, see this presentation from the SEI Research Review 2023.
Redemption is currently able to identify alerts from the following static-analysis tools:
Tool | Version | License | Dockerfile | Original Container | Origin | Instructions |
---|---|---|---|---|---|---|
Clang-tidy | 16.0.6 | LLVM Release | Dockeffile.prereq | silkeh/clang:latest | Clang-tidy | Clang-tidy |
Cppcheck | 2.4.1 | GPL v3 | Dockerfile.codechecker | facthunder/cppcheck:latest | Cppcheck | Cppcheck |
Rosecheckers | CMU (OSS) | Dockerfile.rosecheckers | ghcr.io/cmu-sei/cert-rosecheckers/rosebud:latest | Rosecheckers | Rosecheckers |
For each tool, you can use the appropriate Dockerfile
to create a container. For example, to run Cppcheck, you can build the codechecker
container which will contain it:
docker build -f Dockerfile.codechecker -t ghcr.io/cmu-sei/redemption-codechecker .
Rather than build the container, you could pull the Original Container
as in:
docker run -it --rm facthunder/cppcheck:latest /bin/bash
Note that clang-tidy is part of the prereq
Redemption container, so it is already installed in the same container as Redemption.
You may run other tools, or create alerts manually; however Redemption has not been tested with alerts produced by other tools.
Redemption can currently repair the following categories of alerts. These alerts will often have a MITRE CWE number associated with them, or a rule in the SEI CERT C Coding Standard.
Category | CERT Rule ID | CWE ID |
---|---|---|
Null Pointer Dereference | EXP34-C | 476 |
Uninitialized Value Read | EXP33-C | 908 |
Ineffective Code | MSC12-C | 561 |
We hope to add more categories soon.
The repairs we make are very simple:
For a null pointer dereference, we use a null_check()
macro to protect any pointer that might be null. That is, we replace any expression x
that might represent a null pointer, with the value null_check(x)
. This macro is defined in acr.h. and it effectively invokes error-handling code if x
is null. Here x
can be any expression; it need not be a pointer variable. By default, the error-handling code aborts the program (via abort()
), but you can override it to have more specific behaviors, such as return NULL
, if you are inside a function that returns NULL on an error.
For an uninitialized value read, we simply go to the variable's declaration, and set it to 0 (or the equivalent of 0 for the variable's type...it could be 0.0 for floats for example).
For ineffective code, the simplest solution is to remove the portion of the statement or expression that is ineffective. For example, evaluating some expression, and assigning the result to a variable that is never read, such as:
x = foo(...);
can be repaired by preserving the expression and removing the assignment.
(void) foo(...);
(Casting a function's return value to (void)
is a common indication that the function's return value is to be ignored. See Exception 1 of CERT rule EXP12-C for more information.
Note that this code is ineffective only if the assignment can be proved not to invoke a C++ constructor, destructor, overloaded assignment operator or overloaded conversion or cast operators, which might have side effects.
Also note that Ineffective Code (MSC12-C) is a recommendation not a rule, and this presented additional complications, including the REPAIR_MSC12
environment variable that we currently use. For more information, see the Ineffective Code document, and the Environment Variables section of this document.
The code is designed to run inside a Docker container, which means you will need Docker. In particular, once you can run containers, you will need to share filesystems between your host and the various containers. For that you want the -v
switch in your docker run
command, its documentation is available here:
https://docs.docker.com/reference/cli/docker/container/run/#volume
We provide three useful containers for working with Redemption.
This image contains all the dependencies needed to run the repair tool, but it does not actually contain the repair tool itself. This container is useful if you wish to debug or extend the tool itself, as you can edit Redemption files on your host and immediately access them in the container.
To build this Docker container:
docker build -f Dockerfile.prereq -t ghcr.io/cmu-sei/redemption-prereq .
This command starts a Bash shell in the container: (Note that to run Redemption, you must make your Redemption folder available to the container...if you omit the -v
argument and its value, the container cannot access or run the repair code.)
docker run -it --rm -v ${PWD}:/host -w /host ghcr.io/cmu-sei/redemption-prereq bash
This image is just like the prereq
image, but it also contains the Redemption code. We recommend using this container if you wish to run Redemption without modifying or extending it.
The build command is:
docker build -f Dockerfile.distrib -t ghcr.io/cmu-sei/redemption-distrib .
And the run command is:
docker run -it --rm ghcr.io/cmu-sei/redemption-distrib bash
Unlike the distrib
container, the prereq
container does not contain the Redemption code; it just contains dependencies necessary to run Redemption. But the Redemption code lives outside the container, on a shared volume. This allows you to modify the Redemption code on the host, while accessing it within the container.
There is a test
Docker image that you can use to extensively test Redemption. It downloads and builds the git
and zeek
OSS projects. Building zeek takes about 40 minutes on one machine. Like the prereq
image, the test
image does not actually contain the Redemption code, so you will need to explicitly share the Redemption volume when you launch the container. To build and run the test
container:
docker build -f Dockerfile.test -t ghcr.io/cmu-sei/redemption-test .
docker run -it --rm -v ${PWD}:/host -w /host ghcr.io/cmu-sei/redemption-test bash
The tool has a simple sanity test that you can run in the distrib
container. It uses pytest
to run all the tests in the /host/code/acr/test
directory of the container (all in functions with names starting with test_
) After launching the distrib
container, the following commands will run a few sanity tests:
pushd /host/code/acr/test
pytest
All tests should pass.
As with any Docker containers, you may share other folders with the Redemption container by mounting other volumes. We recommend mounting a volume that contains the code you wish to repair. For example if your codebase to repair lives in /opt/code
, you can mount it in any container:
docker run -it --rm -v ${PWD}:/host -w /host -v /opt/code:/codebase ghcr.io/cmu-sei/redemption-prereq bash
docker run -it --rm -v /opt/code:/codebase ghcr.io/cmu-sei/redemption-distrib bash
Many codebases are configured to build in a particular location. If your code will not build properly unless it lives in a directory like /opt/codebase
, then you should mount it in the container in the same directory:
docker run -it --rm -v ${PWD}:/host -w /host -v /opt/code:/opt/code ghcr.io/cmu-sei/redemption-prereq bash
docker run -it --rm -v /opt/code:/opt/code ghcr.io/cmu-sei/redemption-distrib bash
In the doc/examples
directory, there are several demos, each living in its own directory. The following table lists each demo; its title is the same as the folder containing the demo. The demos differ in the properties of the code they repair, and this is reflected in the Codebase
column:
Demo Title | Codebase |
---|---|
simple |
Simple C source file |
codebase |
Multi-file OSS codebase |
separate_build |
OSS codebase requiring separate build environment |
If you are new to the Redemption tool, we recommend going through at least one of these demos. If you wish to repair a single C file, you should study the simple
demo. If you wish to repair a multi-file codebase, and you can build the codebase in the Redemption container, you should study the codebase
demo. If your code does not build in the Redemption container, you should study the separate_build
demo.
Inputs to the Redemption tool include:
- Codebase: This is a path to one or more C/C++ source code files. For multiple source files, the
compile_commands.json
file (frombear
, discussed more below) identifies the path for each file, while the-b
or--base-dir
argument tosup.py
(discussed more below) specifies the base directory of the project). - Build command: This can be a compile command or a build system like "make". Redemption needs this in order to learn which macros are defined and what switches are necessary to let Clang parse a source code file). Each alert contains a CERT coding rule or CWE ID, location, message and more - see doc/alert_data_format.md.
- SA tool alerts file: The Redemption tool produces outputs (for each SA alert from input, either a patch to repair the alert OR explanation in text why it cannot be repaired).
The command line tool takes the inputs listed above. The command-line tool's ear
module creates an enhanced abstract syntax tree (AST), its brain
module augments information about each alert and creates a more-enhanced AST and applies alerts and patches, and its glove
module produces repaired source code. Depending on arguments provided to the end-to-end script or superscript (discussed below), some or all of the steps are done and the way that is done can be specified per the arguments.
The base_dir
sets the base directory of the project being analyzed. When repairing a whole codebase, the base_dir
is the directory where the source code to be repaired is located. (When repairing a single file, specifying the base_dir
is not necessary.)
Our tool has three top-level Python scripts:
end_to_end_acr.py
: Runs on a single translation unit (a single.c
or.cpp
file, perhaps also repairing#include
-ed header files)sup.py
: Runs on all the C/C++ source files in a whole codebase (That is, it runs on a whole project, with all translation units specified in acompile_commands.json
file generated by thebear
OSS tool.)make_run_clang.py
: Runs on a single translation unit or all the files in a codebase. Produces a shell script containing Clang commands that Redemption would need to repair it. See the Codebases that cannot be built within the Redemption container section for more information.
This script runs automated code repair or parts of it, depending on the arguments specified.
Since the script and its arguments are subject to change, the best way to identify the current arguments and way to run it involves running the script with no arguments or with --help
as an argument. E.g., in a bash terminal, in the directory code/acr
, run:
./end_to_end_acr.py --help
The STEP_DIR
is a directory the tool uses to put intermediate files of the steps of the process, including output from brain
with enhanced AST and enhanced alerts, plus repairs that may be applied. This information is useful for understanding more about a particular repair, for instance if the repair result is not as you expected. The *.nulldom.json
intermediate files are stored there even if environment variable pytest_keep
is set to false
. Output of other individual modules (ear, brain, etc.) are stored in STEP_DIR
only if environment variable pytest_keep
is set to true
.
This script runs automated code repair or parts of it, depending on the arguments specified.
Since the script and its arguments are subject to change, the best way to identify the current arguments and way to run it involves running the script with no arguments or with --help
as an argument. E.g., in a bash terminal, in the directory code/acr
, run:
./sup.py --help
As specified above, the STEP_DIR
is a directory the tool uses to put intermediate files of the steps of the process, including output from brain
with enhanced AST and enhanced alerts, plus repairs that may be applied. This information is useful for understanding more about a particular repair, for instance if the repair result is not as you expected. The *.nulldom.json
intermediate files are stored there even if environment variable pytest_keep
is set to false
. Output of other individual modules (ear, brain, etc.) are stored in STEP_DIR
only if environment variable pytest_keep
is set to true
.
export acr_default_lang_std=foo # Adds "--std=foo" to the beginning of the arguments given to Clang.
export acr_emit_invocation=true # Show subprogram invocation information
export acr_gzip_ear_out=false # Whether to compress the AST (the output of the ear module); true by default.
export acr_ignore_ast_id=true # Tests pass even if AST IDs are different
export acr_parser_cache=/host/code/acr/test/cache/ # Cache the output of the ear module, else set to ""
export acr_parser_cache_verbose=true # Print messages about the cache, for debugging/troubleshooting
export acr_show_progress=true # Show progress and timing
export acr_skip_dom=false # Skip dominator analysis
export acr_warn_unlocated_alerts=true # Warn when alerts cannot be located in AST
export pytest_keep=true # Keep output of individual modules (ear, brain, etc.). Regardless, the *.nulldom.json intermediate file is kept.
export pytest_no_catch=true # Break into debugger with "-m pdb" instead of catching exception
export REPAIR_MSC12=true # Repair MSC12-C alerts (By default, the system DOES do the repair. The system does not do this repair if this variable is set to `false`)
You must indicate the compile commands that are used to build your project. The bear
command can be used to do this; it takes your build command and builds the project, recording the compile commands in a local compile_commands.json
file.
The following command, when run in the test
container, creates the compile_commands.json
file for git. (Note that this file already exists in the container, running this command would overwrite the file.)
cd /oss/git
bear -- make
The Redemption Tool presumes that you have static-analysis (SA) tool output. It currently supports three SA tools: clang_tidy
or cppcheck
or rosecheckers
. Each SA tool should produce a file with the alerts it generated. If $TOOL
represents your tool, instructions for generating the alerts file live in data/$TOOL/$TOOL.md
. We will assume you have run the tool, and created the alerts file, which we will call alerts.txt
. (The actual file need not be a text file). Finally, when you produced your SA output, the code that you analyzed was in a directory which we will call the $BASE_DIR
.
Next, you must convert the alerts.txt
format into a simple JSON format that the redemption tool understands. The alerts2input.py
file produces suitable JSON files. So you must run this script first; it will create the alerts.json
file with the alerts you will use.
python3 /host/code/analysis/alerts2input.py $BASE_DIR clang_tidy alerts.txt alerts.json
For example, the test
Docker container contains the source code for Git, as well as Cppcheck output. So you can convert Cppcheck's output to alerts using this script:
python3 /host/code/analysis/alerts2input.py /oss/git cppcheck /host/data/cppcheck/git/cppcheck.xml ./alerts.json
Here is an example of how to run a built-in end-to-end automated code repair test, within the container (you can change the out
directory location or directory name, but you must create that directory before running the command):
pushd /host/code/acr
python3 ./end_to_end_acr.py /oss/git/config.c /host/data/compile_commands.git.json \
--alerts /host/data/test/sample.alerts.json --repaired-src test/out \
--base-dir /oss/git --repair-includes true
You can see the repairs made using this command:
diff -u /oss/git/hash.h /host/code/acr/test/out/hash.h
To test a single C file that needs no fancy compile commands, you can use the autogen
keyword instead of a compile_commands.json
file:
pushd /host/code/acr
python3 ./end_to_end_acr.py test/test_errors.c autogen \
--alerts test/test_errors.alerts.json \
--base-dir test --repaired-src test/out
You may need to share a volume, as discussed below.
Our tool requires that you have a single command to build the entire codebase. (There are some exceptions involving single files and autogen
, as mentioned below.) This command could be a call to clang
or some other compiler. It could be a build command like make
or ninja
. It could even be a shell script. To be precise, this can be any command that can be passed to bear
, a tool for generating a compilation database for clang
tooling. For more info on bear
, see: https://github.com/rizsotto/Bear ).
Run bear
on the single-command build system (e.g., run bear
on the makefile). Then, run the superscript code/acr/sup.py
, giving it the compile_commands.json
file created by bear
and specifying code, alerts, etc. as discussed in the above section on sup.py
.
The alerts.json
file is a straightforward JSON file, and one can be created manually. This file consists of a list of alerts, each list element describes one alert. Each alert is a map with the following keys:
tool
: The static-analysis tool reporting the alertfile
: The source file to repair. May be a header (.h
) fileline
: Line number reported by toolcolumn
: (Optional) Column number reported by toolmessage
: String message reported by toolchecker
: The checker (that is, component of the SA tool) that reported the alertrule
: The CERT rule or CWE that the alert indicates (alert about a rule being violated or a weakness instance in the code)
The file, line, and rule fields are the only strictly required fields. However, the column and message fields are helpful if the file and line are not sufficient to identify precisely which code is being reported. Be warned that due to inconsistencies between the way different SA tools report alerts, our tool may misinterpret a manually-generated alert. The best way to ensure your alert is interpreted correctly is to fashion it to be as similar as possible to an alert generated by clang_tidy or cppcheck.
See the Alert Data Format document for more details about fields that may be used when describing alerts.
To enable the Redemption container to run repairs on your local code directories, you should volume-share them with -v
when you launch the container. For example, to volume-share a local directory ./code
:
docker run -it --rm -v ./code:/myCode ghcr.io/cmu-sei/redemption-distrib bash
See https://docs.docker.com/storage/volumes/ for more information on volume-sharing.
See https://docs.docker.com/reference/cli/docker/container/run/ for more information about options using the docker run
command.
Normally, Clang is invoked from within the Redemption tool. For codebases that cannot be built within the Redemption container, we also provide an alternative method: The Redemption tool can be split into three phases: The first phase is managed by the make_run_clang.py
program, which generates a shell script that uses Clang to generate syntactic data about the code. In the second phase, you then run this script on the platform where your code can be built. After running the script, you can run the ACR process in the final phase to use the files generated by Clang to repair the code.
This does presume that your platform that builds the code can run Clang with Redemption's patch. It can also run bear
and that you can produce static-analysis alerts for the code (independently of running Redemption). Solving the problem of building Clang with Redemption's patch on older platforms will be idiosyncratic to each platform. We have provided some Dockerfiles that illustrate how to do this for some platforms; they all live in the dockerfiles
directory:
Dockerfile | Platform |
---|---|
focal.prereq.dockerfile | Ubuntu 20.04 |
See the Separate Build Demo for an example of repairing such a codebase.
This is a short case that demonstrates the command-line arguments used for running Clang separately from the rest of Redemption: The call of make_run_clang.py
produces a shell script that runs Clang to generate the data necessary for Redemption to repair the source code. The call to end_to_end_acr.py
does the actual repair, using the data generated by Clang. In this case ${SOURCE}
is an absolute path which indicates a single C file to repair, and ${ALERTS}
indicates an alerts.json
file containing the alerts to repair; this file should have been created by alerts2input.py
from a static-analysis tool output file.
mkdir -p /host/cache
cd /host/code/acr
./make_run_clang.py -c autogen -s ${SOURCE} \
--output-clang-script /host/cache/run_clang.sh
bash /host/cache/run_clang.sh /host/cache
./end_to_end_acr.py --repaired-src /host/cache/out ${SOURCE} autogen \
--alerts ${ALERTS} --raw-ast-dir /host/cache
-
On the host (i.e., outside the Redemption docker container): Run bear to generate
compile_commands.json
. -
Mount the codebase in the Redemption container so that the base directory in the container is the same as it is in the host. (Making a symbolic link (
ln -s
) instead of directly mounting in the right place probably won't work, becauseos.path.realpath
resolves symlinks.) Then, inside the Redemption container: Again, in this case${SOURCE}
is an absolute path which indicates a single C file to repair, and${ALERTS}
indicates analerts.json
file containing the alerts to repair; this file should have been created byalerts2input.py
from a static-analysis tool output file.
mkdir -p /host/cache
cd /host/code/acr
./make_run_clang.py -c autogen -s ${SOURCE} \
--output-clang-script /host/cache/run_clang.sh
- Copy
/host/cache/run_clang.sh
to the host and run:
ast_out_dir=/tmp/ast_out
mkdir -p $ast_out_dir
bash run_clang.sh $ast_out_dir
cd $ast_out_dir
tar czf raw_ast_files.tar.gz *
- Copy
raw_ast_files.tar.gz
to the Redemption container and then:
cd /host/cache
tar xf raw_ast_files.tar.gz
cd /host/code/acr
./end_to_end_acr.py --repaired-src /host/cache/out ${SOURCE} autogen \
--alerts ${ALERTS} --raw-ast-dir /host/cache
Redemption currently uses a single processor to do its work. However, the sup.py
script supports parallelization. Passing it -j <NUM>
or --threads <NUM>
will cause it to distribute its tasks among <NUM>
processors. If <NUM>
is 0, then it uses all of the processors available on the platform.
See the Codebase Demo Instructions for an example of using parallelization to quickly repair a codebase
Documentation detail about useful environment variables and further testing is in doc/regression_tests.md. Also, a lot more detail about our system that may be of interest to others interested in extending it or just understanding it better is in the doc directory. The data/test directory contains test input and answers used by our regression and sample alert tests.
There is also some documentation out side the doc directory. Documentation about running our static analysis tools all lives inside the data directory, at: Clang-tidy, Cppcheck, and Rosecheckers. Likewise, we have test data from running recurrence tests in data/recurrence...see that directory's README file for details.
When building a Docker container, if you get an error message such as:
error running container: from /usr/bin/crun creating container for [/bin/sh -c apt-get update]: sd-bus call: Transport endpoint is not connected: Transport endpoint is not connected : exit status 1
then try:
unset XDG_RUNTIME_DIR
unset DBUS_SESSION_BUS_ADDRESS
For more information, see containers/buildah#3887 (comment)
Contributions to the Redemption codebase are welcome! If you have code repairs or other contributions, we welcome that - you could submit a pull request via GitHub, or contact us if you'd prefer a different way.