Example uses

fabianbs96 edited this page Aug 26, 2019 · 8 revisions

Running analysis and interpreting the result

After choosing the type of analysis, we run it on some code. The analysis does not operate on the original source code but on its LLVM IR.

Running analysis on a simple code

We start by analyzing a simple example. Consider the following short C/C++ program, saved in a file called main.cpp, that a user would like to analyze:

int main() {
    int i = 1;
    int j = 2;
    int k = i + j;
    return 0;
}

Since all analyses operate on LLVM IR, we show the IR representation of the above C++ code below. (The LLVM Language Reference Manual can be found here). It is well worth familiarizing yourself with the LLVM intermediate representation, since the entire infrastructure is based on it!

The following command calls the clang compiler and asks it to emit the LLVM IR that corresponds to main.cpp.

$ clang++ -emit-llvm -S main.cpp

After running this command a file named main.ll can be found within the current directory. The main.ll should contain code similar to:

; ModuleID = 'main.cpp'
source_filename = "main.cpp"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: noinline norecurse nounwind optnone uwtable
define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 1, i32* %2, align 4
  store i32 2, i32* %3, align 4
  %5 = load i32, i32* %2, align 4
  %6 = load i32, i32* %3, align 4
  %7 = add nsw i32 %5, %6
  store i32 %7, i32* %4, align 4
  ret i32 0
}

attributes #0 = { noinline norecurse nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 5.0.1 (tags/RELEASE_501/final 332326)"}

The file to be analyzed by our framework can be specified using the -m flag. PhASAR starts the analysis at the very first instruction of the main() function by default.

An example call to an analysis is:

$ phasar -m path/to/your/main.ll -D IFDS_SolverTest

The above command runs an IFDS solver test on the IR code contained in main.ll.

The LLVM infrastructure supports multiple passes that can be run on the intermediate representation in order to optimize or simplify it. "The compiler front-end typically generates very stupid code" that becomes high-quality code once various compiler passes are applied to it. One very important pass is the so-called memory-to-register pass (mem2reg). LLVM follows a register-based design (unlike the stack-based Java bytecode). Conceptually, LLVM assumes that it has an infinite number of registers at its disposal. (It is up to the code generator to produce code that uses only the registers available on the target platform.)

As you can see in the above code example, i, j, and k are stack variables. In the LLVM IR they appear as the results of alloca instructions, which allocate the desired amount of stack memory. The mem2reg pass makes use of the infinite number of registers in order to eliminate as many 'memory cells' as possible and places values into registers instead. Due to complex pointer arithmetic, LLVM is often unable to eliminate all alloca instructions. Reducing the number of memory cells makes writing an analysis much easier, but it is important to understand the conceptual step that mem2reg performs. PhASAR runs the mem2reg pass automatically by default on the code under analysis. For beginners and for debugging purposes, it is recommended to change this default behavior using the --mem2reg option.
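To make the limits of mem2reg more concrete, the following sketch contrasts locals that are only read and written directly with a local whose address is handed to another function. The function names are made up for this illustration and are not part of PhASAR or LLVM. mem2reg only promotes an alloca whose uses are all plain loads and stores, so the stack slot for j in sumWithEscape should still be present after running the pass, while the locals in sumPromotable can be promoted.

```cpp
// Hypothetical example functions; the names are assumptions made for
// this illustration and do not come from PhASAR or LLVM.

// All locals are only read and written directly, so each of their
// allocas is a candidate for promotion to a register by mem2reg.
int sumPromotable() {
  int i = 1;
  int j = 2;
  int k = i + j;
  return k;
}

// Writes through a pointer, so the pointed-to variable must live in memory.
void increment(int *p) { *p += 1; }

// The address of j is passed to a call. The alloca for j is therefore
// used by something other than a plain load or store, and mem2reg
// leaves it in place.
int sumWithEscape() {
  int i = 1;
  int j = 2;
  increment(&j);
  return i + j;
}
```

Compiling this file to IR and running opt with mem2reg on it, as described in the text that follows, is one way to compare the two functions side by side.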

Let us have a look at the result of applying the mem2reg pass to the above IR code. We run the pass using the opt tool provided by the compiler toolchain.

$ opt -mem2reg -S main.ll

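The output of this command should look similar to the listing below. Note that opt skips functions that carry the optnone attribute, which clang adds at -O0 (as in the IR shown earlier); passing -Xclang -disable-O0-optnone to clang when emitting the IR avoids that attribute.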

; ModuleID = 'main-mem2reg.ll'
source_filename = "main.cpp"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: noinline norecurse nounwind uwtable
define dso_local i32 @main() #0 {
  %1 = add nsw i32 1, 2
  ret i32 0
}

attributes #0 = { noinline norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 8.0.0 "}

As expected, LLVM is able to place all values into registers and to eliminate all of the alloca instructions. PhASAR behaves the same whether you provide unoptimized or optimized LLVM IR via the -m command-line option. It does, however, make sense to compile your source code into LLVM IR using your production flags in order to ensure that PhASAR analyzes the code "as your machine sees it".

Running analysis on a more complex code

Let us consider this slightly more complex C++ program (one that involves a function call).

int function(int x) {
    return x + 1;
}

int main() {
    int i = 42;
    int j = function(i);
    return 0;
}

The above program translates into the following IR code:

; ModuleID = 'main.cpp'
source_filename = "main.cpp"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: noinline nounwind optnone uwtable
define i32 @_Z8functioni(i32) #0 {
  %2 = alloca i32, align 4
  store i32 %0, i32* %2, align 4
  %3 = load i32, i32* %2, align 4
  %4 = add nsw i32 %3, 1
  ret i32 %4
}

; Function Attrs: noinline norecurse nounwind optnone uwtable
define i32 @main() #1 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 42, i32* %2, align 4
  %4 = load i32, i32* %2, align 4
  %5 = call i32 @_Z8functioni(i32 %4)
  store i32 %5, i32* %3, align 4
  ret i32 0
}

attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { noinline norecurse nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 5.0.1 (tags/RELEASE_501/final 332326)"}

Running the IFDS_SolverTest analysis on the IR code that has not been transformed by mem2reg produces the following IFDS/IDE results. These look quite different from the results of the intra-/inter-procedural monotone framework, which are self-explanatory; for that reason, we omit an explanation of the latter here.

### DUMP LLVMIFDSSolver results
--- IFDS START RESULT RECORD ---
N: store i32 %0, i32* %2, align 4, !phasar.instruction.id !3, ID: 1 in function: _Z8functioni
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: %2 = alloca i32, align 4, !phasar.instruction.id !2, ID: 0 in function: _Z8functioni
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: %3 = load i32, i32* %2, align 4, !phasar.instruction.id !4, ID: 2 in function: _Z8functioni
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: ret i32 %4, !phasar.instruction.id !6, ID: 4 in function: _Z8functioni
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: %3 = alloca i32, align 4, !phasar.instruction.id !4, ID: 7 in function: main
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: store i32 0, i32* %1, align 4, !phasar.instruction.id !5, ID: 8 in function: main
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: store i32 42, i32* %2, align 4, !phasar.instruction.id !6, ID: 9 in function: main
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: %4 = load i32, i32* %2, align 4, !phasar.instruction.id !7, ID: 10 in function: main
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: %5 = call i32 @_Z8functioni(i32 %4), !phasar.instruction.id !8, ID: 11 in function: main
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: store i32 %5, i32* %3, align 4, !phasar.instruction.id !9, ID: 12 in function: main
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: ret i32 0, !phasar.instruction.id !10, ID: 13 in function: main
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: %4 = add nsw i32 %3, 1, !phasar.instruction.id !5, ID: 3 in function: _Z8functioni
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: %1 = alloca i32, align 4, !phasar.instruction.id !2, ID: 5 in function: main
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM
--- IFDS START RESULT RECORD ---
N: %2 = alloca i32, align 4, !phasar.instruction.id !3, ID: 6 in function: main
D:	@zero_value = constant i2 0, align 4, ID: -1 	V:  BOTTOM

In the IFDS/IDE results, each program statement N is followed by all data-flow facts D that hold at that program point. In addition, the value V from the value domain is printed. Note: when running an IFDS analysis, only BOTTOM is shown, since TOP represents data-flow facts that do not hold and are thus irrelevant to the analysis user.
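When post-processing such a dump, it can help to split each N line into its parts. The following is a minimal sketch that assumes the line layout shown in the dump above ("N: <instruction>, ID: <number> in function: <name>"). The struct and function names are hypothetical and not part of PhASAR, and the dump format may differ between PhASAR versions.

```cpp
#include <exception>
#include <string>

// Hypothetical helper type, not part of PhASAR: the parts of one "N:" line.
struct StatementInfo {
  std::string instruction; // IR text including the phasar.instruction.id metadata
  int id = 0;              // number printed after "ID: "
  std::string function;    // name printed after "in function: "
};

// Parses a line of the assumed form
//   "N: <instruction>, ID: <number> in function: <name>".
// Returns false (leaving 'out' untouched) if the line does not match.
bool parseStatementLine(const std::string &line, StatementInfo &out) {
  const std::string prefix = "N: ";
  const std::string idTag = ", ID: ";
  const std::string fnTag = " in function: ";

  if (line.compare(0, prefix.size(), prefix) != 0)
    return false;

  // Search from the back, because the instruction text itself contains commas.
  const std::string::size_type fnPos = line.rfind(fnTag);
  if (fnPos == std::string::npos)
    return false;
  const std::string::size_type idPos = line.rfind(idTag, fnPos);
  if (idPos == std::string::npos || idPos < prefix.size())
    return false;

  const std::string::size_type idStart = idPos + idTag.size();
  StatementInfo info;
  try {
    info.id = std::stoi(line.substr(idStart, fnPos - idStart));
  } catch (const std::exception &) {
    return false; // the text after "ID: " is not a number
  }
  info.instruction = line.substr(prefix.size(), idPos - prefix.size());
  info.function = line.substr(fnPos + fnTag.size());
  out = info;
  return true;
}
```

Feeding every line of the dump through parseStatementLine and grouping the matches by their function field gives a per-function view of the statements; D lines are rejected because they do not start with "N: ".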

In addition to the results, PhASAR is able to record all edges of the exploded super-graph that the computation is based on. The edges reside in two edge recorders (one for intra-procedural and one for inter-procedural edges) inside the IDESolver implementation.
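Conceptually, a node of the exploded super-graph pairs a program statement with a data-flow fact, and an edge connects two such nodes. The sketch below models the idea of keeping two separate recorders. All type and member names are assumptions made for this illustration; the actual data structures inside PhASAR's IDESolver are different.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; PhASAR's own classes differ.

// A node of the exploded super-graph: a statement paired with a fact.
struct ESGNode {
  std::string statement;
  std::string fact;
};

// A directed edge between two exploded super-graph nodes.
struct ESGEdge {
  ESGNode source;
  ESGNode target;
};

// Collects edges in the order in which they are reported.
class EdgeRecorder {
public:
  void record(ESGEdge edge) { recorded.push_back(std::move(edge)); }
  const std::vector<ESGEdge> &edges() const { return recorded; }

private:
  std::vector<ESGEdge> recorded;
};

// One recorder per edge kind, mirroring the split described above.
struct RecordedGraph {
  EdgeRecorder intra; // edges that stay within one function
  EdgeRecorder inter; // edges that cross a call or a return
};
```

In this model, an edge from the call site in main to the first statement of the callee would go into inter, whereas an edge between two consecutive statements of the same function would go into intra.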

Here is a visualization of the exploded super-graph of a different analysis to give you an impression of what it looks like.

[Figure: visualization of an exploded super-graph]

More examples can be found here.