To build and run the benchmark natively on the host:
git clone https://github.com/doug65536/membw.git
cd membw
make
make run
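To cross-compile for Windows on Debian/Ubuntu, install the MinGW-w64 toolchain:
sudo apt install mingw-w64-common mingw-w64-tools binutils-mingw-w64 g++-mingw-w64-x86-64
To cross-compile for 64-bit ARM, install the AArch64 GNU toolchain:
sudo apt install gcc-aarch64-linux-gnu \
    g++-aarch64-linux-gnu binutils-aarch64-linux-gnu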
Then build for the target you want:
make build-windows
make build-arm
make build-linux
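To see how much vectorization matters on x86_64, force a full rebuild (-B) with SSE disabled and run the benchmark again:
make VECFLAGS=-mno-sse -B run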
You can build for Linux and Windows on x86_64, and for Linux on ARM (using NEON). It probably builds on almost any other architecture too, but there it falls back to horrible scalar instructions for the measurement loops. That really does make a big difference to the measured bandwidth.
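A minimal sketch of the kind of compile-time selection described above, assuming the GCC/Clang predefined macros __SSE2__ (x86) and __ARM_NEON (ARM). The names sum_words and simd_path are hypothetical and this is not membw's actual code: it reads 64-bit words two at a time through 128-bit vector loads when SIMD is available, and otherwise compiles only the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86_64)
#elif defined(__ARM_NEON)
#include <arm_neon.h>   // NEON intrinsics (ARM)
#endif

// Hypothetical helper: sums n 64-bit words starting at p.
// With SSE2 or NEON, each iteration issues one 128-bit load (two words).
std::uint64_t sum_words(const std::uint64_t* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    std::size_t i = 0;
    std::uint64_t total = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_epi64(acc, _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i)));
    std::uint64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);  // spill both 64-bit lanes
    total = lanes[0] + lanes[1];
#elif defined(__ARM_NEON)
    uint64x2_t acc = vdupq_n_u64(0);
    for (; i + 2 <= n; i += 2)
        acc = vaddq_u64(acc, vld1q_u64(p + i));
    total = vgetq_lane_u64(acc, 0) + vgetq_lane_u64(acc, 1);
#endif
    // Scalar tail for odd n; this is the whole loop when no SIMD path was compiled in.
    for (; i < n; ++i)
        total += p[i];
    return total;
}

// Hypothetical helper: reports which path the preprocessor selected.
const char* simd_path() {
#if defined(__SSE2__)
    return "SSE2";
#elif defined(__ARM_NEON)
    return "NEON";
#else
    return "scalar";
#endif
}
```

Building the same file on x86_64 with -mno-sse should leave __SSE2__ undefined, so simd_path() reports "scalar" and only the one-word-per-iteration loop runs.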
This code lets the CPU take full advantage of the prefetcher: the memory access pattern is very predictable, so the memory controller has a lot of information to work with.
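To illustrate why a predictable pattern helps, here is a minimal sketch (not membw's code; timed_read, sequential_order and shuffled_order are hypothetical names) that reads the same buffer once in sequential order and once in a fixed pseudo-random order. Both runs read every word exactly once through an index array, so the only difference is whether the next data address is predictable. A real benchmark would also need warm-up, repeated runs and a buffer far larger than the last-level cache.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Result of one timed pass over a buffer.
struct ReadResult {
    std::uint64_t checksum;  // sum of all words read; using it keeps the loop from being optimized away
    double gbps;             // data bytes read per second, in units of 1e9
};

// Hypothetical helper: reads buf[order[0]], buf[order[1]], ... and times the pass.
// Only the data words count as bytes moved, not the index array.
ReadResult timed_read(const std::vector<std::uint64_t>& buf,
                      const std::vector<std::size_t>& order) {
    assert(order.size() == buf.size());
    auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::size_t idx : order)
        sum += buf[idx];
    auto stop = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(stop - start).count();
    double bytes = double(buf.size()) * sizeof(std::uint64_t);
    return {sum, bytes / seconds / 1e9};
}

// Order 0, 1, 2, ...: a unit-stride stream the hardware prefetcher can follow.
std::vector<std::size_t> sequential_order(std::size_t n) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    return order;
}

// The same indices shuffled with a fixed seed: every word is still read once,
// but the next address cannot be predicted from the previous ones.
std::vector<std::size_t> shuffled_order(std::size_t n, unsigned seed) {
    std::vector<std::size_t> order = sequential_order(n);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}
```

With a buffer of a few hundred MiB, the sequential order should report noticeably higher bandwidth than the shuffled one on typical hardware, while both checksums match because the same words are summed.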