ICNet implemented in PyTorch for real-time semantic segmentation on high-resolution images: mIoU of 71.0% on Cityscapes, single-image inference time of 19 ms (52.6 FPS).

TshellT/ICNet-pytorch

 
 

Description

This repo contains a PyTorch implementation of ICNet, based on the paper by Hengshuang Zhao et al. (ECCV'18). Training and evaluation are done on the Cityscapes dataset by default.

Requirements

Python 3.6 or later, with the following packages (install them with pip3 install -r requirements.txt):

  • torch==1.1.0
  • torchsummary==1.5.1
  • torchvision==0.3.0
  • numpy==1.17.0
  • Pillow==6.0.0
  • PyYAML==5.1.2

Performance

| Method | mIoU (%) | Time (ms) | FPS | Memory (GB) | GPU |
|---|---|---|---|---|---|
| ICNet (paper) | 67.7 | 33 | 30.3 | 1.6 | TitanX |
| ICNet (ours) | 71.0 | 19 | 52.6 | 1.86 | GTX 1080Ti |

  • Results are on the Cityscapes dataset: trained on the training set only and tested on the validation set, using a single GTX 1080Ti card. The input size in the test phase is 2048x1024x3.
  • For the performance figures of the original paper, see Table 2 in the paper.
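
The FPS column follows from the per-image latency (FPS = 1000 / time in ms). A minimal sketch of that arithmetic, using the latencies reported above; the function name is illustrative only:

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-image inference time in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latencies taken from the performance figures above.
for name, ms in [("ICNet (paper)", 33.0), ("ICNet (ours)", 19.0)]:
    print(f"{name}: {fps_from_latency_ms(ms):.1f} FPS")
# ICNet (paper): 30.3 FPS
# ICNet (ours): 52.6 FPS
```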

Demo

[Image table with columns "image" and "predict": seven pairs of source images and predicted segmentation maps]
  • All the input images come from the Cityscapes validation set. See the demo/ directory for more demo results.

Usage

Training

First, modify the configuration in the configs/icnet.yaml file:

### 3.Training
train:
  specific_gpu_num: "1"   # for example: "0", "1" or "0, 1"
  train_batch_size: 7    # adjust according to gpu resources
  cityscapes_root: "/home/datalab/ex_disk1/open_dataset/Cityscapes/"
  ckpt_dir: "./ckpt/"     # ckpt and training log will be saved here

Then, run: python3 train.py
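
How train.py consumes these settings is not shown in this README. As a rough sketch only, assuming specific_gpu_num is exported through the CUDA_VISIBLE_DEVICES environment variable (a common pattern, not confirmed here), the values could be handled as follows. The dict stands in for the parsed YAML, the dataset path is a placeholder, and select_gpus is a hypothetical helper:

```python
import os

# Stand-in for the parsed `train` section of configs/icnet.yaml.
train_cfg = {
    "specific_gpu_num": "0, 1",
    "train_batch_size": 7,
    "cityscapes_root": "/path/to/Cityscapes/",  # placeholder path
    "ckpt_dir": "./ckpt/",
}


def select_gpus(gpu_spec: str) -> list:
    """Parse a spec such as "0", "1" or "0, 1" and expose only those GPUs."""
    gpu_ids = [int(part) for part in gpu_spec.split(",") if part.strip()]
    if not gpu_ids:
        raise ValueError("no GPU id given")
    # Assumption: the GPUs are selected via CUDA_VISIBLE_DEVICES, which must
    # be set before CUDA is initialised to take effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return gpu_ids


gpu_ids = select_gpus(train_cfg["specific_gpu_num"])
print(f"using {len(gpu_ids)} GPU(s): {gpu_ids}")
```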

Evaluation

First, modify the configuration in the configs/icnet.yaml file:

### 4.Test
test:
  ckpt_path: "./ckpt/icnet_resnet50_197_0.710_best_model.pth"  # set the pretrained model path correctly

Then, run: python3 evaluate.py
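
The reported metric is mean intersection-over-union (mIoU). The code in evaluate.py is not reproduced here; the following is a minimal pure-Python sketch of the standard definition, computed from a confusion matrix. The function names and the tiny example labels are made up for illustration, and the ignore label 255 is an assumption:

```python
def confusion_matrix(targets, preds, num_classes, ignore_index=255):
    """Count (true class, predicted class) pairs over flat label sequences."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(targets, preds):
        if t == ignore_index:  # skip unlabeled pixels (assumed label: 255)
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average of TP / (TP + FP + FN) over the classes that occur at all."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 6 pixels, 3 classes, one pixel ignored.
targets = [0, 0, 1, 1, 255, 2]
preds = [0, 1, 1, 1, 0, 2]
print(round(mean_iou(confusion_matrix(targets, preds, num_classes=3)), 3))  # 0.722
```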

Discussion

The structure of ICNet is mainly composed of sub4, sub2, sub1 and head:

  • sub4: basically a PSPNet; the biggest difference is a modified pyramid pooling module.
  • sub2: the first three stages of convolutional layers of sub4; sub2 and sub4 share these layers.
  • sub1: three consecutive strided convolutional layers that quickly downsample the original large-size input images.
  • head: the outputs of the three cascaded branches (sub4, sub2 and sub1) are connected through the CFF module. Finally, a 1x1 convolution and interpolation produce the output.
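
To make the downsampling in sub1 concrete, the sketch below applies the standard convolution output-size formula three times. The README only says the convolutions are strided; the 3x3 kernel, stride 2 and padding 1 used here are assumptions, as is the function name:

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


width, height = 2048, 1024  # Cityscapes test-time input size
for layer in range(1, 4):
    width, height = conv_out_size(width), conv_out_size(height)
    print(f"after conv {layer}: {width}x{height}")
# after conv 1: 1024x512
# after conv 2: 512x256
# after conv 3: 256x128
```

Under these assumptions, sub1 outputs feature maps at 1/8 of the input resolution.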

During training, I found that the pyramid pooling module in sub4 is very important. It can significantly improve the performance of the network, and of lightweight models as well.

The most important thing in the data preprocessing phase is to set crop_size sensibly: it should be as close as possible to the input size used in the prediction phase. Here are my experiments:

  • With base_size set to 520 (the shorter side of the image is resized to between 520x0.5 and 520x2) and crop_size set to 480 (random 480x480 patches are cropped for training), the best final mIoU is 66.7%.
  • With base_size set to 1024 (the shorter side of the image is resized to between 1024x0.5 and 1024x2) and crop_size set to 720 (random 720x720 patches are cropped for training), the best final mIoU is 69.9%.
  • Because the target dataset is Cityscapes, whose image size is 2048x1024, the larger crop_size (720x720) works better. I have not tried a larger crop_size (such as 960x960 or 1024x1024) yet, because it would force a very small batch size and be very time-consuming, and the current mIoU is already high. But I believe that a larger crop_size would bring a higher mIoU.
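
The resize-and-crop step described above can be sketched with size arithmetic alone. This is an illustration under assumptions, not the repo's data loader: the function name is hypothetical, the target length of the shorter side is drawn uniformly, and images smaller than the crop are assumed to be padded up to crop_size before cropping.

```python
import random


def sample_resize_and_crop(width, height, base_size, crop_size, rng=random):
    """Return the resized (w, h) and a crop box (left, top, right, bottom)."""
    # Pick a target length for the shorter side in [0.5, 2.0] * base_size.
    short = rng.randint(int(base_size * 0.5), int(base_size * 2.0))
    if width < height:
        new_w, new_h = short, int(round(height * short / width))
    else:
        new_w, new_h = int(round(width * short / height)), short
    # Assumed: pad to at least crop_size so a full crop always fits.
    pad_w, pad_h = max(new_w, crop_size), max(new_h, crop_size)
    left = rng.randint(0, pad_w - crop_size)
    top = rng.randint(0, pad_h - crop_size)
    return (new_w, new_h), (left, top, left + crop_size, top + crop_size)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
resized, box = sample_resize_and_crop(2048, 1024, base_size=1024, crop_size=720, rng=rng)
print(resized, box)
```

With base_size 520 and crop_size 480, the shorter side can come out as small as 260 pixels, which is why some form of padding is needed before a 480x480 crop.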

In addition, I found that a small training technique can improve the performance of the model:

  • Set the learning rate of sub4 to the original initial learning rate (0.01), because its backbone has pretrained weights.
  • Set the learning rate of sub1 and head to 10 times the initial learning rate (0.1), because there are no pretrained weights for them.

This small training technique is really effective: it can improve the mIoU by 1 to 2 percentage points.
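
One way to express this in PyTorch is with per-parameter-group learning rates, which torch.optim optimizers accept as a list of dicts. The sketch below only builds that list, so it runs without PyTorch. The parameter-name prefixes are hypothetical and not the repo's actual module names, and treating everything outside the pretrained backbone as the 10x group is an assumption:

```python
def build_param_groups(named_params, base_lr=0.01, pretrained_prefixes=("backbone.",)):
    """Split (name, param) pairs into a base-lr group and a 10x-lr group."""
    pretrained, fresh = [], []
    for name, param in named_params:
        if name.startswith(tuple(pretrained_prefixes)):
            pretrained.append(param)
        else:
            fresh.append(param)
    return [
        {"params": pretrained, "lr": base_lr},   # layers with pretrained weights
        {"params": fresh, "lr": base_lr * 10},   # randomly initialised layers
    ]


# Hypothetical names with placeholder values; with a real model,
# pass model.named_parameters() instead.
demo = [
    ("backbone.conv1.weight", "w0"),
    ("sub1.conv.weight", "w1"),
    ("head.cls.weight", "w2"),
]
groups = build_param_groups(demo)
print([(len(g["params"]), g["lr"]) for g in groups])
```

With a real model, the resulting list could be passed as the first argument to an optimizer, for example torch.optim.SGD(groups, lr=0.01).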

Any other questions, or any mistakes of mine, can be raised in the comments section. I will reply as soon as possible.
