
# Theoretical TFLOPS ≠ Real-world Performance

Testing the maximum achievable FLOPS on GPUs.

This project measures the maximum FLOPS (floating-point operations per second) actually achievable on various GPU models, which is typically well below the vendor's theoretical peak. Please see the original work by Stas Bekman.

## Key Features

1. **Optimized search**: unlike the original implementation, which uses a brute-force approach, this version leverages Optuna for efficient parameter optimization.
2. **Visualization**: Optuna provides insightful visualizations of the optimization process (see the Optuna optimization plots in the repository).
3. **Data collection**: an optional feature allows submitting results to a remote API for data collection and analysis.
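The structure of such a search can be sketched as follows. This is a simplified, CPU-only stand-in — a pure-Python matmul, plain random search instead of Optuna's sampler, and made-up size ranges — intended only to show the shape of the objective, not the project's actual code:

```python
import random
import time


def matmul_flops(m: int, n: int, k: int) -> int:
    # An (m, k) @ (k, n) matmul performs m*n*k multiplies and m*n*k adds.
    return 2 * m * n * k


def time_matmul(m: int, n: int, k: int) -> float:
    # Toy CPU matmul so the sketch is runnable anywhere; the real
    # benchmark times GPU matmuls instead.
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    start = time.perf_counter()
    _ = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
         for i in range(m)]
    return time.perf_counter() - start


def random_search(trials: int = 10):
    # Stand-in for the Optuna study: sample (M, N, K), measure
    # throughput, keep the best shape seen so far.
    best_shape, best_tflops = None, 0.0
    for _ in range(trials):
        m, n, k = (random.randrange(8, 33, 8) for _ in range(3))
        seconds = time_matmul(m, n, k)
        tflops = matmul_flops(m, n, k) / seconds / 1e12
        if tflops > best_tflops:
            best_shape, best_tflops = (m, n, k), tflops
    return best_shape, best_tflops
```

Optuna replaces the uniform sampling above with a model-based sampler, so promising regions of the (M, N, K) space get explored with far fewer trials than a brute-force sweep.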

## Stats

| GPU Model | Best Shape (MxNxK) | TFLOPS |
|---|---|---|
| NVIDIA RTX 4000 SFF Ada Generation | 2304x5120x1536 | 59.0 |
| NVIDIA A10G | 20480x18112x19712 | 69.7 |
| NVIDIA GeForce RTX 3090 | 5248x15040x1024 | 78.0 |
| NVIDIA RTX 4000 Ada Generation | 14464x5312x20480 | 82.7 |
| NVIDIA GeForce RTX 3090 Ti | 10752x15488x10752 | 86.0 |
| NVIDIA L4 | 1024x6016x1792 | 91.4 |
| NVIDIA RTX A5000 | 17856x17024x3584 | 93.9 |
| Tesla V100-SXM2-32GB | 17216x20480x4096 | 94.0 |
| Tesla V100-SXM2-16GB | 2048x17920x1216 | 96.1 |
| Radeon RX 7900 XTX | 11008x3392x9216 | 113.3 |
| DCU K100_AI | 9344x3968x6592 | 126.3 |
| NVIDIA RTX A6000 | 9856x12480x13248 | 131.2 |
| AMD Instinct MI210 | 17536x7360x2304 | 142.8 |
| NVIDIA L40 | 3712x2624x11136 | 170.3 |
| NVIDIA GeForce RTX 4090 | 14336x4096x4096 | 178.8 |
| NVIDIA L40S | 4416x3776x3072 | 252.0 |
| NVIDIA A100 PCIe | 2304x5120x1536 | 256.4 |
| NVIDIA A100 SXM | 6912x16384x2048 | 267.9 |
| NVIDIA RTX 6000 Ada Generation | 2624x5632x3328 | 278.5 |
| NVIDIA H100 NVL* | 2560x2176x8192 | 488.5 |
| NVIDIA H100 PCIe | 6912x16384x2048 | 499.5 |
| AMD Instinct MI300X | 4096x8448x4864 | 788.2 |
| NVIDIA H100 SXM 96GB | 16896x15680x1024 | 807.1 |
| NVIDIA H100 SXM 80GB | 6144x17920x2816 | 821.2 |
| NVIDIA GH200 96GB | 7616x17664x4480 | 852.5 |
| NVIDIA GH200 144G HBM3e | 7616x17664x4480 | 853.8 |

\*For the H100 NVL, only a single card is benchmarked, as multi-GPU is not supported.
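As a sanity check on the numbers above: an MxNxK matmul costs 2·M·N·K FLOPs (one multiply and one add per inner-product term), so the reported throughput follows directly from the wall-clock time of a timed call. The arithmetic, with the H100 SXM 80GB row as an example:

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    # An (m, k) @ (k, n) matmul performs m*n*k multiplies and m*n*k adds.
    return 2 * m * n * k


def tflops(m: int, n: int, k: int, seconds: float) -> float:
    # Throughput in TFLOPS given the wall-clock time of one matmul call.
    return matmul_flops(m, n, k) / seconds / 1e12


# The H100 SXM 80GB row (6144x17920x2816 at 821.2 TFLOPS) implies
# roughly 0.62 TFLOP of work per call, i.e. ~0.75 ms of kernel time.
per_call_tflop = matmul_flops(6144, 17920, 2816) / 1e12
```

The real benchmark averages many timed iterations on the GPU; this snippet only shows the throughput arithmetic behind the table.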

## Install

For a faster and smoother installation experience, we recommend `uv`, an extremely fast Python package installer written in Rust. It is a drop-in replacement for pip:

```bash
pip install uv
git clone https://github.com/mag-/gpu_benchmark
cd gpu_benchmark
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
./mamf-finder.py
```

## TODO

## Acknowledgements

Thanks to Bernhard from GPTshop.ai for providing access to a GH200.

Special thanks to Stas Bekman for the original implementation and research.
