| Category | Difficulty |
|---|---|
| Labs | 6 |
| Midterm | 6 |
This class takes pieces of 18-240 and 18-213 and extends them, giving a higher-level view of computer architecture as a whole. It is a great option to take before 447: 344 covers many topics (the CPU pipeline, branch prediction, the memory hierarchy, parallelism) from a "why are they interesting and how do they affect software" perspective, and 447 then goes into depth on how they are actually implemented.
The course starts where the last part of 240 leaves off, building up from a single-cycle CPU into a 5-stage pipelined RISC-V CPU with a memory hierarchy and a branch predictor. Unlike 240/341/447, however, you won't write any SystemVerilog in this course: all the labs are done by writing code that simulates and analyzes different components of a computer architecture. The later parts of the course focus more on the software side, covering the topics that 213 can't quite go into depth on: more advanced caches, virtual memory, different techniques for parallelism, and how to optimize different types of code.
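To give a feel for what "writing code to simulate a component" can look like, here is a minimal sketch of a 2-bit saturating-counter branch predictor run over a synthetic trace. It is written in Python for brevity (the labs themselves use C/C++), and the function name, table size, and trace format are assumptions made for illustration, not the course's actual infrastructure.

```python
def simulate_two_bit_predictor(trace, table_bits=10):
    """Return prediction accuracy over a trace of (pc, taken) pairs.

    Each table entry is a 2-bit saturating counter:
    0-1 predict not taken, 2-3 predict taken.
    """
    table_size = 1 << table_bits
    counters = [1] * table_size  # start every entry at "weakly not taken"
    correct = 0
    total = 0
    for pc, taken in trace:
        # Drop the low 2 bits (4-byte instructions), then index the table.
        index = (pc >> 2) & (table_size - 1)
        predicted_taken = counters[index] >= 2
        correct += predicted_taken == taken
        total += 1
        # Train the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            counters[index] = min(3, counters[index] + 1)
        else:
            counters[index] = max(0, counters[index] - 1)
    return correct / total if total else 0.0


# Synthetic trace (made up): one loop branch, taken 9 times and then
# not taken once, repeated 100 times.
loop_trace = [(0x400, i % 10 != 9) for i in range(1000)]
accuracy = simulate_two_bit_predictor(loop_trace)
print(f"accuracy: {accuracy:.3f}")
```

On this trace the predictor mispredicts the very first branch and then each loop exit, which is the kind of behavior the first lab asks you to measure and explain on real workloads.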
There are 5 labs during the semester (plus lab 0, which is just infrastructure setup), and they are completed in pairs. Unlike in almost any course you will have taken before this one (such as 213 and 240), the labs have a very "scientific" focus. Each lab has an implementation component, but the majority of the lab consists of running experiments with your implementation, collecting data, and analyzing that data. In the lab writeup, you have to show graphs and statistical analyses that justify your results and support your conclusions. One slightly tricky part is that there are no provided test cases (unlike in 213), so you have to test your code yourself on the input data. If you are interested in systems or computer architecture research, you will likely find this part of the class very useful, since each project report feels like a mini research paper. The implementation portion of every lab is done in C/C++, and you will likely end up using Python or some other scripting language to automate running benchmarks and analyzing results.
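As a rough sketch of what that automation might look like, the script below enumerates every combination of parameters for a simulator and records the output of each run in a CSV file. The simulator path, flag names, and trace names are all hypothetical; the real labs define their own command lines.

```python
import csv
import itertools
import subprocess


def build_commands(simulator, traces, param_grid):
    """Return one (config, argv) pair per trace and parameter combination."""
    names = sorted(param_grid)
    jobs = []
    for trace in traces:
        for values in itertools.product(*(param_grid[n] for n in names)):
            config = dict(zip(names, values), trace=trace)
            argv = [simulator, "--trace", trace]  # hypothetical flag
            for name, value in zip(names, values):
                argv += [f"--{name}", str(value)]
            jobs.append((config, argv))
    return jobs


def run_sweep(jobs, out_path):
    """Run every job and store its raw stdout next to its configuration."""
    fieldnames = sorted(jobs[0][0]) + ["stdout"]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for config, argv in jobs:
            result = subprocess.run(
                argv, capture_output=True, text=True, check=True
            )
            writer.writerow({**config, "stdout": result.stdout.strip()})


# Hypothetical sweep: 2 traces x 3 associativities x 2 L1 sizes = 12 runs.
jobs = build_commands(
    "./cache_sim",
    ["a.trace", "b.trace"],
    {"l1-size": [16, 32], "assoc": [1, 2, 4]},
)
print(len(jobs), "runs planned")
# run_sweep(jobs, "results.csv")  # uncomment once the simulator exists
```

Separating "plan the runs" from "execute the runs" makes it easy to check the experiment list before committing hours of machine time to it.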
As of Fall 2021, the five labs are:
- The first lab involves implementing a few different types of branch predictors and then comparing their prediction accuracy on a variety of workloads (the SPEC2017 benchmark suite). This lab is relatively straightforward in terms of implementation, but it provides a good baseline for the kind of work the rest of the labs will involve.
- The second lab is about comparing different cache hierarchies: you implement a 2-level cache hierarchy and then explore the design space of cache size vs. power/area. This lab introduces the idea of Pareto frontiers and sensitivity studies, which are quite important in any kind of research. This is probably the most time-consuming lab in the course, both because the implementation (especially the automation/scripting needed to run a large number of experiments) is difficult and because the experiments take many, many hours to run.
- The third lab has you implement a simulated virtual-memory and TLB system. This is not a particularly interesting lab from a scientific perspective: it is mostly implementation-focused, with just a few sensitivity studies on TLB organization and how it affects cache access time.
- The fourth lab is a bit different from the first three. Instead of implementing a simulated component of a CPU, you are given an implementation of a graph-manipulation algorithm and implement optimizations (primarily the Propagation Blocking method) to make it more cache-friendly. You then explore the trade-offs of a few tunable parameters in the algorithm and how they affect performance on different workloads.
- In the fifth lab, you implement several different multi-core synchronization mechanisms and then test them on a few workloads, studying how the mechanisms perform as the workload size, thread count, and contention level (how often two threads try to access the same data) change. This is another lab that you will want to start early, since the experiments primarily involve measuring the runtime of the different workloads, and other heavy users of the same machines may interfere with your measurements.
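The Pareto frontier idea from the cache-hierarchy lab can be illustrated with a short sketch. Given a set of candidate designs, each described by a cost (say, area or power) and a miss rate, the frontier keeps only the designs that no other design beats on both axes. The function name and the numbers below are made up for illustration.

```python
def pareto_frontier(points):
    """Return the non-dominated (cost, miss_rate) points, sorted by cost.

    Lower is better on both axes. A point is dominated if some other
    point is no worse on both axes and differs from it.
    """
    frontier = []
    best_miss_rate = float("inf")
    # After sorting by cost, a point is on the frontier only if it has a
    # strictly lower miss rate than every cheaper (or equal-cost) point.
    for cost, miss_rate in sorted(points):
        if miss_rate < best_miss_rate:
            frontier.append((cost, miss_rate))
            best_miss_rate = miss_rate
    return frontier


# Made-up design points: (relative area, miss rate).
designs = [(1.0, 0.20), (2.0, 0.12), (2.5, 0.15), (4.0, 0.05), (5.0, 0.05)]
print(pareto_frontier(designs))
```

Here the (2.5, 0.15) design is dropped because (2.0, 0.12) is both cheaper and better, and (5.0, 0.05) is dropped because (4.0, 0.05) achieves the same miss rate at lower cost.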
There is one midterm exam, which primarily covers the CPU pipeline and performance analysis, including superscalar, out-of-order, and VLIW machines and how their performance differs. These topics are not really explored by any of the labs. There is no final exam.
- Start the labs early; experiments can take a long time to run
- Figure out how to automate experiment-running as early as possible (labs 2, 4, and 5 involve running a huge number of experiments, so automation will be very useful)
- Make sure you understand the lecture on performance comparisons; it will be crucial for the labs
- For the assignments with simpler implementation portions, it may be worthwhile for both partners to implement it separately and then ensure the results are the same (this makes it less likely that an implementation bug will sneak through)
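Because runtime measurements on shared machines can be noisy, one common approach is to repeat each measurement several times and compare medians rather than single runs. The sketch below shows that idea; it is not necessarily the method taught in the performance-comparison lecture, and the timing numbers are made up.

```python
import statistics


def summarize(samples):
    """Summarize repeated runtime measurements (in seconds)."""
    return {
        "median": statistics.median(samples),
        "mean": statistics.mean(samples),
        # Sample standard deviation needs at least two measurements.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


def speedup(baseline_samples, optimized_samples):
    """Speedup of the optimized version, using medians to resist outliers."""
    return statistics.median(baseline_samples) / statistics.median(optimized_samples)


# Made-up timings: one baseline run was disturbed by another user.
baseline = [10.2, 10.0, 10.1, 13.9, 10.0]
optimized = [5.1, 5.0, 5.0, 5.2, 5.0]

print(summarize(baseline))
print(f"speedup: {speedup(baseline, optimized):.2f}x")
```

The outlier of 13.9 seconds pulls the baseline mean up noticeably but leaves the median unchanged, which is why the median is often the safer statistic to report for this kind of data.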
Who should take this class:
- If you are interested in CPU design and intend to take 18-447 after this class
- If you have already taken 447, the first 1/3 of the course will seem a bit redundant
- If you plan to go into computer architecture research and want hands-on experience with the scientific aspect
- If you enjoyed 213 and want to learn more about caches, memory, synchronization, and such