Profile your PyTorch model with model-level, layer-level, and operator-level metrics.
In deployment, identifying a model's bottleneck is crucial. We typically analyze the cost from the model level down to the operator level. This tutorial is a step-by-step guide to profiling your PyTorch models.
.
├── README.md # main documentation
├── requirements.txt # dependencies
├── assets # temp files (images, logs, etc)
├── quickstart.ipynb # custom model profiling
├── resnet.ipynb # resnet50 profiling (TBD)
└── vit.ipynb # vision transformer profiling (TBD)
image source: Basics of Neural Networks (MIT 6.5940, Fall 2023)
- Memory-Related
  - #Parameters: the number of parameters (weights and biases) in the given neural network
  - Model Size: the storage required for the weights of the given neural network
  - Peak #Activations: the largest number of intermediate outputs (activations) held in memory at any point
- Computation-Related
  - MAC: multiply-accumulate operations
  - FLOP, FLOPS: floating-point operations; floating-point operations per second
  - Latency: the delay between feeding an input and receiving the output
  - Throughput: the number of inputs processed per unit of time
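To make these definitions concrete, here is a minimal sketch that derives several of the metrics by hand for a single convolution layer. `Conv2dSpec` and the helper functions are hypothetical names introduced only for illustration. The sketch assumes stride 1, batch size 1, 32-bit weights, and the common convention that one MAC counts as two FLOPs; profiling tools may count slightly differently (for example, in how they treat the bias).

```python
from dataclasses import dataclass


@dataclass
class Conv2dSpec:
    """Hypothetical description of a square-kernel 2D convolution (stride 1)."""
    in_channels: int
    out_channels: int
    kernel_size: int
    bias: bool = True


def num_params(spec: Conv2dSpec) -> int:
    """#Parameters: kernel weights plus one optional bias per output channel."""
    weights = spec.out_channels * spec.in_channels * spec.kernel_size ** 2
    return weights + (spec.out_channels if spec.bias else 0)


def model_size_bytes(n_params: int, bits_per_param: int = 32) -> int:
    """Model size: parameter count times bit width, converted to bytes."""
    return n_params * bits_per_param // 8


def output_activations(spec: Conv2dSpec, out_h: int, out_w: int) -> int:
    """Number of output elements the layer produces for one input."""
    return spec.out_channels * out_h * out_w


def macs(spec: Conv2dSpec, out_h: int, out_w: int) -> int:
    """MACs: each output element needs in_channels * k * k multiply-accumulates."""
    per_output = spec.in_channels * spec.kernel_size ** 2
    return per_output * output_activations(spec, out_h, out_w)


# Assumed example: 3 -> 3 channels, 3x3 kernel, 32x32 output feature map.
layer = Conv2dSpec(in_channels=3, out_channels=3, kernel_size=3)
params = num_params(layer)
size_bytes = model_size_bytes(params)
acts = output_activations(layer, 32, 32)
layer_macs = macs(layer, 32, 32)
layer_flops = 2 * layer_macs  # assumption: 1 MAC = 2 FLOPs (bias add ignored)

print(f"params={params}, size={size_bytes} B, activations={acts}")
print(f"MACs={layer_macs}, FLOPs={layer_flops}")
```

Hand-derived numbers like these are mainly useful as a sanity check against what a profiling tool reports for the same layer.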
If you are not familiar with common profiling tools, please refer to the following tutorials:
- pytorch-benchmark - model-level
- Flops Profiler - layer-level
- pytorch_memlab - layer-level
- torch.fx - layer-level
- PyTorch Profiler - operator-level
conda create -n pytorch_profiler python=3.9 -y
conda activate pytorch_profiler
pip install -r requirements.txt
Go through the quickstart notebook to learn how to profile a custom model.
# custom model
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # four identical 3x3 convolutions; padding=1 keeps the spatial size unchanged
        self.conv1 = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x1):
        # apply the four convolutions in sequence
        x1 = self.conv1(x1)
        x1 = self.conv2(x1)
        x1 = self.conv3(x1)
        x1 = self.conv4(x1)
        return x1
Model-Level | Layer-Level | Operator-Level |
---|---|---|
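As a rough illustration of how model-level latency and throughput can be measured, the sketch below times an arbitrary callable with warm-up runs. The `measure` helper and `dummy_workload` are hypothetical names, not part of any of the tools listed above, and the workload is plain Python so the example runs without PyTorch.

```python
import time
from statistics import mean
from typing import Callable, Dict


def measure(fn: Callable[[], object], batch_size: int = 1,
            warmup: int = 3, runs: int = 10) -> Dict[str, float]:
    """Return mean latency (seconds) and throughput (items per second) of fn."""
    # Warm-up runs are discarded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        fn()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)

    latency = mean(timings)
    return {"latency_s": latency, "throughput_per_s": batch_size / latency}


def dummy_workload() -> int:
    """Stand-in for a model forward pass (assumption: any callable works)."""
    return sum(i * i for i in range(10_000))


result = measure(dummy_workload, batch_size=8)
print(f"latency: {result['latency_s'] * 1e3:.3f} ms")
print(f"throughput: {result['throughput_per_s']:.1f} items/s")
```

To time a PyTorch model, one could pass something like `lambda: model(x)` as `fn`, ideally under `torch.no_grad()`. On a GPU, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels launch asynchronously.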
Go through the resnet notebook and the vit notebook to see the profiling results for ResNet50 and Vision Transformer.
Modify the quickstart notebook to profile your own model.
For a line-by-line analysis of non-PyTorch Python scripts, please refer to line_profiler and memory-profiler. Basic usage is as follows:
"""
test_profile.py
"""
from line_profiler import profile
# from memory_profiler import profile  # swap in to profile memory instead


@profile
def test_profile(x: int) -> int:
    """test function for profile"""
    total = 0  # renamed from `sum` to avoid shadowing the built-in
    for _ in range(x):
        total += 1
    return total


def main():
    test_profile(1000)


if __name__ == "__main__":
    main()
Latency | Memory |
---|---|
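If neither package is installed, a coarser function-level view of the same two quantities can be sketched with the standard library alone. This is a swapped-in technique, not what line_profiler or memory-profiler do internally: it reports one elapsed time and one peak allocation per call rather than per line. `profile_call` and `count_up` are hypothetical names used only here.

```python
import time
import tracemalloc


def count_up(x: int) -> int:
    """Same toy workload as test_profile above, without the decorator."""
    total = 0
    for _ in range(x):
        total += 1
    return total


def profile_call(fn, *args):
    """Return (result, elapsed seconds, peak traced bytes) for one call of fn."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    # get_traced_memory() returns (current, peak) sizes in bytes.
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


result, elapsed, peak = profile_call(count_up, 1000)
print(f"result={result}, latency={elapsed * 1e6:.1f} us, peak memory={peak} B")
```

Tracing allocations adds overhead, so the elapsed time measured this way is inflated; for timing alone, it may be better to measure without `tracemalloc` running.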