This tutorial gives an introduction to GPUs and to profiling deep learning models with PyTorch Profiler. Further details are provided in the attached slides (see presentation_HA.pdf).
The application is adapted from the PyTorch tutorial and is stored in the folder /examples. The Python script to be profiled is "resnet18_api.py", which is set up for 4 batches and can be adapted to a larger number of batches.
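For orientation, the following is a minimal sketch of what a profiled training loop of this kind might look like. It is not the content of "resnet18_api.py". The function names, the synthetic input data, the optimizer settings and the schedule values (wait=1, warmup=1, active=2, i.e. 4 batches in total) are assumptions made for illustration. The output directory ./out matches the one passed to TensorBoard later in this tutorial.

```python
def schedule_steps(wait: int, warmup: int, active: int, repeat: int = 1) -> int:
    """Number of batches consumed by a profiler schedule with these settings."""
    return (wait + warmup + active) * repeat


def profile_resnet18(log_dir: str = "./out", wait: int = 1, warmup: int = 1,
                     active: int = 2, batch_size: int = 32) -> None:
    """Profile a few ResNet-18 training steps (illustrative sketch only)."""
    # Imported here so that schedule_steps() can be used without PyTorch installed.
    import torch
    import torchvision
    from torch.profiler import (ProfilerActivity, profile, schedule,
                                tensorboard_trace_handler)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torchvision.models.resnet18().to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    model.train()

    # Always record CPU activity; add CUDA kernels only when a GPU is present.
    activities = [ProfilerActivity.CPU]
    if device.type == "cuda":
        activities.append(ProfilerActivity.CUDA)

    with profile(
        activities=activities,
        schedule=schedule(wait=wait, warmup=warmup, active=active, repeat=1),
        on_trace_ready=tensorboard_trace_handler(log_dir),  # writes traces to log_dir
        record_shapes=True,
        profile_memory=True,
    ) as prof:
        for _ in range(schedule_steps(wait, warmup, active)):
            # Synthetic batch: an assumption that keeps the sketch self-contained.
            inputs = torch.randn(batch_size, 3, 224, 224, device=device)
            labels = torch.randint(0, 1000, (batch_size,), device=device)

            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()

            prof.step()  # tell the profiler that one batch has finished
```

Calling profile_resnet18() in an environment with PyTorch and torchvision should leave trace files in ./out. Increasing "active" would be one way to profile a larger number of batches under these assumptions.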
Here we describe how to set up PyTorch using a Singularity container.
- Step 0: Pull a PyTorch container image, e.g. from the NVIDIA NGC container registry (note that the host system must have the CUDA driver installed and the container must include CUDA):
singularity pull docker://nvcr.io/nvidia/pytorch:22.12-py3
- Step 1: Run the application inside the Singularity container:
singularity exec --nv -B ${MyEx} pytorch_22.12-py3.sif python ${MyEx}/resnet18_api.py
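Here the directory ${MyEx}, where the Python application is located, is bind-mounted into the container with the option -B, and the option --nv makes the host's NVIDIA GPUs and driver available inside the container.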
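The launch command can also be assembled from a Python driver script, which makes the role of each option explicit. The sketch below is only an illustration: the function name singularity_exec_command and the directory /path/to/examples (standing in for ${MyEx}) are hypothetical.

```python
import shlex


def singularity_exec_command(image: str, bind_dir: str, script: str,
                             use_gpu: bool = True) -> list:
    """Build the argument list for running a Python script inside a container."""
    cmd = ["singularity", "exec"]
    if use_gpu:
        cmd.append("--nv")  # make the host NVIDIA driver and GPUs visible
    # -B bind-mounts bind_dir at the same path inside the container.
    cmd += ["-B", bind_dir, image, "python", f"{bind_dir}/{script}"]
    return cmd


# Hypothetical directory standing in for ${MyEx}.
cmd = singularity_exec_command("pytorch_22.12-py3.sif", "/path/to/examples",
                               "resnet18_api.py")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where Singularity is installed.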
To run this example on an HPC system, we provide a batch job script "job.slurm", stored in the folder "/Jobs".
To view the output data generated by the profiling process, one needs to install TensorBoard, which can be done, for instance, in a virtual environment:
- Step 0: Load a Python module, then create and activate a virtual environment.
  - Find a Python module: module avail python
  - Load a Python module, e.g.: module load python/3.9.6-GCCcore-11.2.0
  - mkdir Myenv
  - python -m venv Myenv
  - source Myenv/bin/activate
- Step 1: Install the TensorBoard plugin for PyTorch Profiler from pip wheel packages using the following command:
python -m pip install torch_tb_profiler
- Step 2: Run TensorBoard using the command:
tensorboard --logdir=./out --bind_all
This prints a local address with a specific registered or private port (6006 by default). Note that on HPC systems, direct navigation to the generated address is typically blocked by firewalls. Connecting to an internal network from outside can therefore be done via a mechanism called local port forwarding. As stated in the SSH documentation, “Local forwarding is used to forward a port from the client machine to the server machine”.
The syntax for local forwarding, which is configured using the option -L, can be written as, e.g.:
ssh -L 6009:localhost:6006 <username>@<server-address>
Here <username>@<server-address> is a placeholder for the user's login on the server. This command opens a connection to the jump server <username>@<server-address> and forwards any connection to port 6009 on the local machine to port 6006 on the server.
Finally, the local address http://localhost:6009/ can be viewed in a browser such as Chrome or Firefox.
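If the page does not load, a quick check from the local machine can show whether anything is listening on the forwarded port. The helper below is a minimal sketch using only the Python standard library. The name port_is_open is hypothetical, and the port 6009 is taken from the example above. A positive result only indicates that the local end of the tunnel accepts connections, not that TensorBoard itself is responding on the server.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open("localhost", 6009):
        print("Port 6009 accepts connections; try http://localhost:6009/")
    else:
        print("Nothing is listening on port 6009; is the ssh -L session running?")
```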