Replies: 14 comments 1 reply
-
Summarize what you did.
I chose to work in SSA-form Bril.

Explain how you know your implementation works—how did you test it? Which test inputs did you use? Do you have any quantitative results to report?
I tested with Turnt and measured performance with Brench. As suspected, LICM optimizes programs in SSA form as well as their roundtrip (out-of-SSA) forms. One interesting observation is that the execution overhead of SSA-form Bril outweighed that of the original program, as shown in both sets of results.

What was the hardest part of the task? How did you solve this problem?
-
This repo contains the code for loop-invariant code motion. The experiments cover benchmarks from the Bril benchmark suite, though I did not cover all of them. I have reported statistics for the relative and absolute improvements (over the benchmarks) for the two experiments, where the relative improvement normalizes the absolute improvement by the baseline count. An interesting point was seeing a performance degradation for one of the benchmarks. The challenges I faced were mainly oriented around fixing stupid bugs; the algorithm itself was fairly smooth. The only ambiguity was in identifying the body of a natural loop, for which I used some notes from the Cornell CS 4120 edition of the compilers course.
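Identifying the body of a natural loop (the one ambiguity mentioned above) can be sketched in a few lines. This is a generic illustration, not this repo's code; the CFG encoding and function names are mine. A back edge t -> h where h dominates t defines a natural loop, and the body is gathered by walking predecessors from t back up to h.

```python
# Sketch: natural-loop detection via dominators and back edges.
# Hypothetical CFG encoding: {block_name: [successor_names]}.

def dominators(cfg, entry):
    """Iterative dataflow: dom[b] = {b} | intersection of dom over preds."""
    preds = {b: [] for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].append(b)
    dom = {b: set(cfg) for b in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            if b == entry:
                continue
            new = set(cfg)
            for p in preds[b]:
                new &= dom[p]
            new |= {b}
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom, preds

def natural_loops(cfg, entry):
    """For each back edge t -> h (h dominates t), collect the loop body."""
    dom, preds = dominators(cfg, entry)
    loops = []
    for t, succs in cfg.items():
        for h in succs:
            if h in dom[t]:               # back edge: header dominates tail
                body = {h, t}
                stack = [t]
                while stack:
                    b = stack.pop()
                    if b == h:
                        continue
                    for p in preds[b]:
                        if p not in body: # walk preds back to the header
                            body.add(p)
                            stack.append(p)
                loops.append((h, body))
    return loops

# Example: entry -> header -> body -> header (back edge), header -> exit
cfg = {"entry": ["header"], "header": ["body", "exit"],
       "body": ["header"], "exit": []}
```

The loop body here is exactly the set of blocks that can reach the back-edge tail without passing through the header, plus the header itself.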
-
For this assignment, we (Stephen and Benny) implemented induction variable elimination in LLVM.

Implementation
There were certain parts of the assignment that were much easier using LLVM and certain parts that were a lot more challenging; those are covered in the challenges section. One case we handled was a nested loop with a shared induction variable:

```c
for (int i = 0; i < LEN2; i++) {
  for (int j = 1; j < LEN2; j++) {
    printf("%d %d\n", j - 1, k - 1);
    ++k;
  }
  ++k;
}
```

We also had to consider cases of nested loops, particularly cases where doing strength reduction in an inner loop creates more opportunities for strength reduction in the outer loop. Using LLVM (and being in SSA form) made it extremely simple to check that operands were loop invariant and to move instructions around. Probably the most useful aspect of working in LLVM was its existing loop analysis infrastructure.

Testing
We ran our implementation against a number of LLVM TSVC tests/benchmarks, and also handwrote about 30 tests to hit corner cases. We took our performance-evaluation benchmarks from the LLVM TSVC benchmarks and the LLVM SingleSource benchmarks; from these, we chose Linpack, the induction variable test suite, and the restructured-loops benchmarks. The following metrics were collected from two trials, each performing 20,000 repetitions per benchmark, except for Linpack, which performed 400. We used an Ubuntu 22.04.3 LTS machine (Linux kernel 6.2.0-33) with an Intel i7-13620H and 16 GB of memory. For the baseline, we ran (in this exact order) the mem2reg, sroa, early-cse, loop-simplify, instcombine, and adce passes. For the optimized version, we ran this sequence plus our pass, followed by the same cleanup passes. Overall we found our optimization... had no significant impact. We measured an average speedup of 1.0053x +/- 0.0121 across all benchmarks, with a standard deviation of 0.05008 and a median of 1.0017x.

Challenges
The most significant challenge was simply getting our pass into a state where we could run it. If we ran it at the beginning of the default opt pipeline, mem2reg hadn't run yet and everything was memory accesses that we couldn't optimize; if we ran it later, LLVM's internal induction variable elimination would already have run. Instead, we needed to have clang emit LLVM IR and then optimize it manually with opt. Another challenge is that one of the best opportunities for induction variable optimization is array accesses, but LLVM expresses these with its getelementptr instructions. Finally, determining when we can replace loop conditions with derived variables is not exactly straightforward; we realized it only works when the factor is positive. I had written a pretty robust interval analysis for the last assignment, but we ended up just using LLVM's LazyValueInfo (which isn't as powerful) because some of our test cases revealed non-termination behavior in the interval analysis, which I admittedly didn't feel like fixing right now.
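The inner-loop strength reduction described above can be illustrated with a toy example. This is my own sketch in Python rather than the posters' LLVM pass: a derived induction variable j = c * i is maintained with repeated addition instead of a multiply per iteration.

```python
# Toy illustration of strength reduction. Real passes rewrite IR;
# here the "before" and "after" forms are just two Python functions.

def run_naive(n, c):
    out = []
    for i in range(n):
        j = c * i              # multiply on every iteration
        out.append(j)
    return out

def run_reduced(n, c):
    out = []
    j = 0                      # invariant: j == c * i at the loop head
    for i in range(n):
        out.append(j)
        j += c                 # strength-reduced update
    return out
```

Both produce the same sequence. Note that rewriting the loop *condition* in terms of j (e.g. j < c * n instead of i < n) is only sound when c is positive, which matches the posters' observation about the factor's sign.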
-
Summary
I implemented loop-invariant code motion for standard (non-SSA) Bril.

Details
I followed the steps outlined in lecture: detect natural loops, insert preheaders, detect loop-invariant instructions, and move instructions to the preheader when it can be done safely. I used the weaker version of the requirement that the to-be-moved instruction must dominate all loop exits, which performed significantly better than the stronger requirement.

Testing
To test correctness and performance impact, I used brench over the Bril benchmarks. Across all the benchmarks, 26 saw some speedup, 41 saw no change, and 2 saw a slowdown. Of the two slowdowns, one was caused by the additional jumps added when inserting the preheader; the other I'm not sure about.

Difficulties
Using Bril made this relatively straightforward, and I didn't run into any particularly tricky edge cases this time around (a pleasant surprise). That said, there are still a lot of things to check in order to safely move instructions around, so there were still bugs to work out.
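The "detect loop-invariant instructions" step in the recipe above can be sketched as a fixed-point computation. This is a generic illustration with a made-up instruction encoding, not this post's code, and it deliberately omits the safety conditions the post describes (unique definition, dominating the loop exits):

```python
# Sketch: mark loop-invariant instructions to a fixed point.
# Hypothetical instruction encoding: {"dest": str, "op": str, "args": [str]}.
# An instruction is invariant if every argument is defined outside the
# loop or by an instruction already marked invariant.

def loop_invariant(loop_instrs, defined_outside):
    invariant = set()
    changed = True
    while changed:
        changed = False
        for instr in loop_instrs:
            if instr["dest"] in invariant:
                continue
            if all(a in defined_outside or a in invariant
                   for a in instr["args"]):
                invariant.add(instr["dest"])
                changed = True
    return invariant

loop = [
    {"dest": "x", "op": "add", "args": ["a", "b"]},    # a, b defined outside
    {"dest": "y", "op": "mul", "args": ["x", "c"]},    # invariant via x
    {"dest": "i", "op": "add", "args": ["i", "one"]},  # induction variable
]
```

The fixed-point loop is what lets y be discovered after x; the induction variable i is never marked because it depends on its own in-loop definition.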
-
I worked with @obhalerao on this project.

Summary
Implementation Details
We implemented an LLVM pass to perform loop-invariant code motion. Since I've implemented this particular optimization on a different IR in CS 4120, we decided to take advantage of LLVM's built-in loop analyses.

Testing
We first ran our code on some small test programs. We also needed to force the compiler to emit an actual multiplication instruction.

Difficulties
The main difficulty we encountered was running the LLVM pass.
-
Summary
Our LICM implementation is under the lesson8 folder in Skeleton.c. We use a ModuleAnalysisManager for our pass, then iterate through the functions inside the module and obtain the natural loops for each function. While this is really easy to use, it imposes a constraint on our optimizer: during testing we found that our algorithm has trouble optimizing when a function A is called within a loop in function B.

Test
We have 3 simple tests to exercise the pass (loop.c, foo.c, and test.c). In loop.c, our pass successfully identifies the only loop invariant (a variable repeatedly assigned a constant inside a loop). In test.c, we test whether our algorithm can identify common non-invariant instructions, such as an instruction that doesn't dominate the loop exit or a load from memory. foo.c exposes the limitation of only obtaining loop information within a single function.

Hardest Part
We encountered a lot of problems due to the recent upgrade of the LLVM library, which made many of the manager classes legacy and hard to use. Meanwhile, we also faced problems using the latest library functions, and much of the documentation covers only the legacy APIs, which made the new ones hard to learn. We first tried to use the latest loop manager to obtain the natural loops but didn't succeed, so we took a step back and used the legacy one, which works in a constrained way.

Handling Load and Store Instructions
We found that load and store instructions can't simply be handled by the algorithm given in the course notes; however, LLVM relies heavily on loads and stores. As a result, we extended the algorithm to handle them. We copied the pseudocode from the course notes and prefixed our changes with ">"; our code also has comments explaining the algorithm.
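A minimal sketch of the extra condition loads need, assuming a toy IR encoding of my own (a real LLVM pass would consult AliasAnalysis instead of the crude name-equality check used here, which is optimistic): a load is a hoisting candidate only if nothing in the loop may write the memory it reads.

```python
# Toy check: a load from pointer p inside the loop can be considered for
# hoisting only if the loop contains no store to p and no call (a call
# is conservatively assumed to clobber any memory). "May alias" is
# approximated by pointer-name equality, which real alias analysis
# would replace; distinct names here are optimistically assumed disjoint.

def load_is_hoistable(load_ptr, loop_instrs):
    for instr in loop_instrs:
        if instr["op"] == "store" and instr["ptr"] == load_ptr:
            return False          # the loop may overwrite what we load
        if instr["op"] == "call":
            return False          # conservatively treat calls as clobbers
    return True

loop = [
    {"op": "load", "ptr": "p", "dest": "x"},
    {"op": "store", "ptr": "q", "val": "x"},
]
```

Stores themselves are even harder to hoist (they are observable side effects), which is consistent with this post's decision to extend the course-notes algorithm rather than reuse it unchanged.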
-
Summary
@rcplane, @Enochen, and @zachary-kent collaborated on this project. We implemented loop-invariant code motion as an LLVM pass.

Implementation
Testing
Difficulties
-
Will and I worked on lesson 8 together.

Summary
Implementation details
What was the hardest part of the task? How did you solve this problem?
-
Summary
Details
Testing/Evaluation
To convince myself that the pass was indeed hoisting loop-invariant code out of the natural loop, I wrote a small test example and ran it through the pass. The test example and the corresponding LLVM IR are shown below; the loop-invariant code is the assignment `j = 2` inside the loop.

```c
#include <stdio.h>
#define N 3
int main() {
  int x = 0;
  for (int i = 0; i < N; i++) {
    int j = 2; // loop invariant
    x = x + j;
  }
  printf("x = %d\n", x);
  return 0;
}
```

```llvm
define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  store i32 0, ptr %1, align 4
  store i32 0, ptr %2, align 4
  store i32 0, ptr %3, align 4
  br label %5

5:                                ; preds = %12, %0
  %6 = load i32, ptr %3, align 4
  %7 = icmp slt i32 %6, 3
  br i1 %7, label %8, label %15

8:                                ; preds = %5 (loop body)
  store i32 2, ptr %4, align 4    ; loop invariant (j = 2)
  %9 = load i32, ptr %2, align 4
  %10 = load i32, ptr %4, align 4
  %11 = add nsw i32 %9, %10
  store i32 %11, ptr %2, align 4
  br label %12

12:                               ; preds = %8
  %13 = load i32, ptr %3, align 4
  %14 = add nsw i32 %13, 1
  store i32 %14, ptr %3, align 4
  br label %5, !llvm.loop !5

15:                               ; preds = %5
  %16 = load i32, ptr %2, align 4
  %17 = call i32 (ptr, ...) @printf(ptr noundef @.str, i32 noundef %16)
  ret i32 0
}
```

I opted to test on the TSVC benchmark suite to measure the performance improvement of my optimizations. I had a lot of trouble getting this benchmark suite to run, but after reading Stephen, Sanjit, and Matt's post above, I was able to get something running. However, I was only able to measure the wall-clock time of the pre- and post-optimization programs; I was not able to measure the number of instructions executed. Factoring in the slight variance from startup, I found that the speed was the same. This was a bit disheartening; perhaps with more time (and after I have recovered) I will look into this more.

Difficulties
-
Summary
@20ashah, @AliceSzzze, and I implemented a loop-invariant code motion pass. The pass hoists loop-invariant instructions (those determined to be safe to remove from the loop) to the loop preheader.

Implementation Details
Our pass maintains an unordered set called loopInvariants, which stores the LLVM values that are loop-invariant. For each instruction I, we check whether the instruction can be hoisted (moved outside the loop) safely. Among other conditions, it uses the isSafeToSpeculativelyExecute function to ensure that it's safe to speculatively execute the instruction, e.g. that it is not a call and has no side effects. The run function is the main entry point of our pass; it iterates through the blocks of a loop and their instructions. For each instruction, it checks whether the instruction is loop-invariant and safe to hoist. If it is, the instruction is moved into the loop's preheader (i.e., outside the loop). The loop is processed repeatedly until no more hoisting can be done (controlled by a changed flag).

Evaluation
We ran our optimization on a few small programs with loops and verified that loop-invariant instructions are hoisted out of the loop and that the optimized program behaves as before. For example, in one test program with a @squareit function called from @main, the instruction %5 was hoisted out of the loop. We tried (and are still trying) to run the LLVM test suite, but ran into a lot of build issues, and none of us really knows how CMake works :(

Anything Hard or Interesting?
Getting acclimated to LLVM syntax in general was challenging. As another post mentioned, much of the documentation online is based on the legacy pass manager, so getting the new version to function properly was a challenge.
Handling side effects was another obstacle we ran into. There were many loads and stores in the unoptimized LLVM IR; since these touch memory, they cannot be safely hoisted. To resolve this, we ran a mem2reg pass prior to our LICM pass, which allowed more instructions to be designated loop-invariant.
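The run loop described above (hoist until a changed flag stays false) can be sketched on a toy IR. This is my own simplified encoding, not the actual LLVM pass, and it omits the safety checks the post relies on (speculatability, side effects):

```python
# Toy fixed-point hoisting driver: repeatedly move instructions whose
# arguments are all defined outside the loop (params or already-hoisted
# values) into the preheader, until nothing changes.
# Hypothetical instruction encoding: {"dest": str, "op": str, "args": [str]}.

def hoist(preheader, loop_body, params):
    changed = True
    while changed:
        changed = False
        outside = params | {i["dest"] for i in preheader}
        for instr in list(loop_body):          # copy: we mutate loop_body
            if all(a in outside for a in instr["args"]):
                loop_body.remove(instr)
                preheader.append(instr)        # move out of the loop
                changed = True
    return preheader, loop_body

pre = []
body = [
    {"dest": "y", "op": "mul", "args": ["x", "c"]},    # hoistable on pass 2
    {"dest": "x", "op": "add", "args": ["a", "b"]},    # hoistable on pass 1
    {"dest": "i", "op": "add", "args": ["i", "one"]},  # induction variable
]
hoist(pre, body, {"a", "b", "c", "one"})
```

The outer while loop is exactly why y eventually moves: it depends on x, which only becomes "outside the loop" after the first round of hoisting.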
-
Summary
Implementation
Testing
Evaluation
I suspect the reason is that, in a normal program, the hot code is usually loop-variant (if it weren't, why would we need a loop there?). I inspected the optimized IR of one of the test programs.
Challenges
The difficult part was navigating the LLVM documentation and finding the correct utility functions to use. It required some trial and error, but once you find the right ones, the pass becomes fairly robust. However, I avoided using the legacy pass-manager APIs.
-
@xalbt and I worked together on lesson 8.

Summary
Implementation
Difficulties
Testing
-
I worked with @emwangs and @he-andy for this assignment.

Summary
Implementation
We attempted to implement this assignment in LLVM and ran into many issues. First, getting LLVM to build a loop pass was pretty difficult; I think loop passes are either deprecated or there isn't a ton of support for them, so we had to work a bit to get anything to run. Once we got it building, the code wouldn't run for a while because LLVM could not find our pass. This was also weird, because we registered it just like we registered any other pass. Eventually we got this to work by forcing a directive into our run script. We also made a run script to automate this. Then we tried to actually implement LICM. I believe it works for super-trivial test cases, but the analysis is far too conservative; upon further inspection, it appears that the function we used to check hoistability is much too conservative. So overall, we're still working on this one. I believe that to do this properly we might want to find the loops ourselves, so that we can also identify the preheaders and modify instructions as needed, but this will take a lot more time, so I plan to keep working on it over the weekend.

Difficulties
Everything I mentioned above! (Building, registering the pass, running the pass, testing, etc.)

Testing
We have not been able to fully test this one yet, as we were quite busy this week. We know that it doesn't work very well though, and will keep working as the weekend goes on :)
-
Which loop optimization did you pick?