Taint analysis improvements #123

fxshlein · 2023-12-07T00:03:16Z

Hi! I've been implementing a static analysis that is similar to taint analysis, but with a somewhat more complex propagated state. Along the way I made some improvements to the components involved, which vastly improved my results. I wanted to share them in case they are relevant to you 🙂

I wrote them in Kotlin in our internal code, but I'd be happy to open a PR if the changes would be useful to you.

Note that some of these changes are based on the assumption that the heap is not tracked.

1. Replacing virtual calls with calls to implementations

This greatly improved the results because I have a lot of cases like this:

class Taint {
    static String source() { ... }

    static void sink(String value) { ... }
}

interface Interface {
    void takeTheArgument(String argument);
}

class Implementation implements Interface {
    @Override
    public void takeTheArgument(String argument) {
        Taint.sink(argument);
    }
}

class Main {
    // And somewhere:
    static void main(Interface value) {
        value.takeTheArgument(Taint.source());
    }
}

The main method calls takeTheArgument on an Interface with a tainted argument, but the call is ignored because it targets the interface method rather than the implementation.

Instead, when building the CFA, I take every invokevirtual and also add calls to the implementations of the method in each subclass that implements it. In the resulting CFA, the single invokevirtual then has multiple outgoing calls instead of one. The existing algorithms seem to handle this fine, and taint analysis then also looks at the implementations of the methods.
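The lookup of implementations can be sketched as follows. This is a minimal, self-contained illustration and not the code from our project: ToyHierarchy and its methods are hypothetical stand-ins rather than ProGuardCORE types, and the hierarchy is modelled with plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a class hierarchy; not a ProGuardCORE type.
final class ToyHierarchy {
    // Type name -> direct subtypes (classes extending or implementing it).
    private final Map<String, List<String>> subtypes = new HashMap<>();
    // "Type.method" entries for every method that has a body.
    private final Set<String> concreteMethods = new HashSet<>();

    void addSubtype(String parent, String child) {
        subtypes.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    void addConcreteMethod(String type, String method) {
        concreteMethods.add(type + "." + method);
    }

    // Collects every implementation that a virtual call on `type` may dispatch to.
    List<String> resolveTargets(String type, String method) {
        List<String> targets = new ArrayList<>();
        collect(type, method, new HashSet<>(), targets);
        return targets;
    }

    private void collect(String type, String method, Set<String> seen, List<String> out) {
        if (!seen.add(type)) {
            return; // already visited, e.g. reachable through two interfaces
        }
        if (concreteMethods.contains(type + "." + method)) {
            out.add(type + "." + method);
        }
        for (String child : subtypes.getOrDefault(type, Collections.emptyList())) {
            collect(child, method, seen, out);
        }
    }
}

final class VirtualCallResolutionDemo {
    public static void main(String[] args) {
        ToyHierarchy hierarchy = new ToyHierarchy();
        hierarchy.addSubtype("Interface", "Implementation");
        hierarchy.addConcreteMethod("Implementation", "takeTheArgument");
        // Each resolved target would become an extra outgoing call edge in the CFA.
        System.out.println(hierarchy.resolveTargets("Interface", "takeTheArgument"));
    }
}
```

The sketch only walks downwards from the declared type, so an implementation inherited from a superclass that sits outside that subtree would be missed. A real version would need to handle that case.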

2. Starting from taint sources instead of the main entrypoint

Instead of directly using a BamCpaRun, I implemented a top level CpaRun based on a custom transfer relation, which will:

If the current position is an exit node, it backtracks to the method entry, uses the reduce operator to create the return state, and then uses all the callers as the successors.
If the current position is a call, it uses a BamCpaRun to analyze that call. The BamCpaRun will have the called method as the main method.
Otherwise, it just uses the regular JvmModelTrackingTransferRelation.

As this sometimes starts analyzing from the middle of a method, the stack is modified to return an empty state instead of throwing when someone tries to pop an item from an empty stack.
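A minimal sketch of such a stack, assuming a generic wrapper with a supplier for the "empty" value. LenientStack is a hypothetical name and not the class that was modified in the library.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical helper, not a ProGuardCORE class.
final class LenientStack<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final Supplier<T> emptyValue;

    LenientStack(Supplier<T> emptyValue) {
        this.emptyValue = emptyValue;
    }

    // Note: ArrayDeque rejects null items.
    void push(T item) {
        items.push(item);
    }

    // Returns the top item, or the "empty" value when nothing is left to pop.
    T pop() {
        return items.isEmpty() ? emptyValue.get() : items.pop();
    }
}
```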

When, after backtracking, the return value of a function is not tainted, the algorithm stops, since the only thing that could cause the state to be tainted again is another call to a taint source, and that is analyzed separately.
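The three cases of the transfer relation, together with the stopping rule, can be sketched roughly as follows. All types here (ToyLocation, ToyState, SourceFirstTransfer) are hypothetical stand-ins, the taint state is reduced to a single boolean, and the nested BamCpaRun is replaced by a pluggable predicate, so this only illustrates the control flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// All types below are hypothetical stand-ins, not ProGuardCORE classes.
enum LocationKind { EXIT, CALL, OTHER }

final class ToyLocation {
    final String name;
    final LocationKind kind;
    final List<ToyLocation> next = new ArrayList<>();        // successors inside the same method
    final List<ToyLocation> returnSites = new ArrayList<>(); // EXIT only: where each caller resumes

    ToyLocation(String name, LocationKind kind) {
        this.name = name;
        this.kind = kind;
    }
}

final class ToyState {
    final ToyLocation location;
    final boolean tainted;

    ToyState(ToyLocation location, boolean tainted) {
        this.location = location;
        this.tainted = tainted;
    }
}

final class SourceFirstTransfer {
    // Stand-in for the nested run: given a call site and the taint before the call,
    // answers whether the state after the call is tainted.
    private final BiPredicate<ToyLocation, Boolean> nestedCallAnalysis;

    SourceFirstTransfer(BiPredicate<ToyLocation, Boolean> nestedCallAnalysis) {
        this.nestedCallAnalysis = nestedCallAnalysis;
    }

    List<ToyState> successors(ToyState state) {
        List<ToyState> result = new ArrayList<>();
        ToyLocation location = state.location;
        switch (location.kind) {
            case EXIT:
                // Stopping rule: an untainted return value is not followed into the callers.
                if (state.tainted) {
                    for (ToyLocation site : location.returnSites) {
                        result.add(new ToyState(site, true));
                    }
                }
                break;
            case CALL:
                boolean taintedAfterCall = nestedCallAnalysis.test(location, state.tainted);
                for (ToyLocation successor : location.next) {
                    result.add(new ToyState(successor, taintedAfterCall));
                }
                break;
            default:
                // Regular intraprocedural step: in this toy model taint is carried along unchanged.
                for (ToyLocation successor : location.next) {
                    result.add(new ToyState(successor, state.tainted));
                }
                break;
        }
        return result;
    }
}
```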

This allows me to drastically reduce the amount of code that is analyzed, since, instead of starting from a main method, I can start the analysis from inside my source method. It also circumvented some issues I had where some pieces of code were not reached from the main method. The program I'm analyzing is very large, so it's to be expected that coverage would not be perfect.

3. Nested Call Filter

This one was again really useful for reducing the amount of code analyzed. It is a simple predicate that lets the implementation decide whether to actually analyze a function call. The predicate is called in an if statement in BamTransferRelation. If the method should not be entered, it is treated like an unknown method.

Given that:

The run is directly starting from the sources as entry points (see above)
The heap is not being tracked (i.e. a forgetful heap model is being used)

A call where none of the operands is tainted can be filtered out, as there is no way for any code in that call to be tainted. If that code calls a taint source again, it will also be an entry point, and that will be analyzed separately.
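As a sketch, the filter for this case boils down to a predicate over the operands of the call. ToyCall and CallFilters are hypothetical names used for illustration only, and each operand is reduced to a tainted/untainted flag.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical description of a call site, not a ProGuardCORE type.
final class ToyCall {
    final String target;
    final List<Boolean> operandTaint; // one flag per operand, including the receiver if there is one

    ToyCall(String target, List<Boolean> operandTaint) {
        this.target = target;
        this.operandTaint = operandTaint;
    }
}

final class CallFilters {
    private CallFilters() {}

    // Enter a call only if at least one operand is tainted. Calls that are
    // rejected would be treated like calls to unknown methods.
    static final Predicate<ToyCall> ANY_OPERAND_TAINTED =
            call -> call.operandTaint.contains(Boolean.TRUE);
}
```

Keeping the filter as a plain predicate means that other analyses can plug in their own condition without touching the transfer relation.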

4. Taint analysis from field access

This one is fairly straightforward and probably an intended functionality, but it seems that the taint analysis algorithm doesn't implement it currently:
By overriding the JvmForgetfulHeapAbstractState, and returning a tainted state from getFieldOrDefault depending on the passed fqn, it's possible to use a field as a taint source, even without requiring one of the more complex heap models.
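The idea can be sketched like this. ToyForgetfulHeap is a hypothetical stand-in for the forgetful heap state: the method name mirrors getFieldOrDefault, but the signature and the format of the fqn strings are assumptions made for the example, not the library's actual API.

```java
import java.util.Set;

// Hypothetical stand-in for a forgetful heap state; the signature is assumed.
class ToyForgetfulHeap<S> {
    // A forgetful heap remembers nothing, so every read yields the default value.
    <T> S getFieldOrDefault(T object, String fqn, S defaultValue) {
        return defaultValue;
    }
}

// Treats reads of selected fields as taint sources.
final class FieldSourceHeap<S> extends ToyForgetfulHeap<S> {
    private final Set<String> sourceFields;
    private final S taintedValue;

    FieldSourceHeap(Set<String> sourceFields, S taintedValue) {
        this.sourceFields = sourceFields;
        this.taintedValue = taintedValue;
    }

    @Override
    <T> S getFieldOrDefault(T object, String fqn, S defaultValue) {
        // Reads of a configured field return the tainted state, everything else the default.
        return sourceFields.contains(fqn) ? taintedValue : defaultValue;
    }
}
```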
