Shiyu Tang, Ting Sun, Juncai Peng, Guowei Chen, Yuying Hao, Manhui Lin, Zhihong Xiao, Jiangbin You, Yi Liu. PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model on Mobile Devices. https://arxiv.org/abs/2304.05152
- Overview
- Performance
- Reproduction
With the success of transformers in computer vision, several attempts have been made to adapt transformers to mobile devices. However, their performance is not satisfactory for some real-world applications. Therefore, we propose PP-MobileSeg, a SOTA semantic segmentation model for mobile devices.
It is composed of three newly proposed parts: the StrideFormer backbone, the Aggregated Attention Module (AAM), and the Valid Interpolate Module (VIM):
- With four-stage MobileNetV3 blocks as the feature extractor, we extract rich local features at different receptive fields with little parameter overhead. We then efficiently empower the features from the last two stages with a global view using strided SEA attention.
- To effectively fuse the features, we use the AAM to filter the detail features with ensemble voting and then add the semantic feature to them to enhance the semantic information to the greatest extent.
- At last, we use the VIM to upsample the downsampled feature to the original resolution, which significantly decreases latency at the model inference stage. It only interpolates the class channels present in the final prediction, which typically amounts to only around 10% of the classes in the ADE20K dataset. This is a common scenario for datasets with a large number of classes, so the VIM significantly decreases the latency of the final upsampling step, which takes the greatest part of the model's overall latency (see the sketch below).
Extensive experiments show that PP-MobileSeg achieves a superior params-accuracy-latency tradeoff compared to other SOTA methods.
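To make the VIM idea concrete, here is a minimal sketch in Paddle of what "only interpolate the classes present in the prediction" means. This is our illustrative code, not PaddleSeg's actual implementation; the function name `valid_interpolate` and all variable names are ours.

```python
import paddle
import paddle.nn.functional as F

def valid_interpolate(logits, out_size):
    """Sketch of the VIM idea: upsample only the class channels that
    actually appear in the low-resolution prediction."""
    # logits: [1, C, h, w] low-resolution class logits
    coarse_pred = logits.argmax(axis=1)                        # [1, h, w]
    valid = paddle.unique(coarse_pred)                         # class ids present (~10% of 150 on ADE20K)
    valid_logits = paddle.index_select(logits, valid, axis=1)  # [1, C_valid, h, w]
    up = F.interpolate(valid_logits, size=out_size, mode="bilinear")
    local_pred = up.argmax(axis=1)                             # indices into `valid`, not class ids
    # Map the local indices back to the original class ids.
    return paddle.gather(valid, paddle.flatten(local_pred)).reshape(local_pred.shape)
```

Since the bilinear upsampling and the final argmax now run over roughly a tenth of the class channels, the cost of the final upsample stage drops accordingly.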
The performance of PP-MobileSeg on the ADE20K dataset:

Model | Backbone | Training Iters | Batch Size | Train Resolution | mIoU (%) | Latency (ms)* | Params (M) | Links |
---|---|---|---|---|---|---|---|---|
PP-MobileSeg-Base | StrideFormer-Base | 80000 | 32 | 512x512 | 41.57 | 265.5 | 5.62 | config|model|log|vdl|exported model |
PP-MobileSeg-Tiny | StrideFormer-Tiny | 80000 | 32 | 512x512 | 36.39 | 215.3 | 1.61 | config|model|log|vdl|exported model |
The comparison between PP-MobileSeg and other SOTA methods on the ADE20K dataset:

Model | Backbone | mIoU (%) | Latency (ms)* | Params (M) |
---|---|---|---|---|
LR-ASPP | MobileNetV3_large_x1_0 | 33.10 | 730.9 | 3.20 |
MobileSeg-Base | MobileNetV3_large_x1_0 | 33.26 | 391.5 | 2.85 |
TopFormer-Tiny | TopTransformer-Tiny | 32.46 | 490.3 | 1.41 |
SeaFormer-Tiny | SeaFormer-Tiny | 35.00 | 459.0 | 1.61 |
PP-MobileSeg-Tiny | StrideFormer-Tiny | 36.39 | 215.3 | 1.44 |
TopFormer-Base | TopTransformer-Base | 38.28 | 480.6 | 5.13 |
SeaFormer-Base | SeaFormer-Base | 40.07** | 465.4 | 8.64 |
PP-MobileSeg-Base | StrideFormer-Base | 41.57 | 265.5 | 5.62 |
The ablation study of the proposed modules on the ADE20K dataset:

Model | Backbone | Train Resolution | mIoU (%) | Latency (ms)* | Params (M) | Links |
---|---|---|---|---|---|---|
baseline | Seaformer-Base | 512x512 | 40.00 | 465.6 | 8.27 | model|log|vdl|exported model |
+VIM | Seaformer-Base | 512x512 | 40.07 | 234.6 | 8.17 | model|log|vdl|exported model |
+VIM+StrideFormer | StrideFormer-Base | 512x512 | 40.98 | 235.1 | 5.54 | model|log|vdl|exported model |
+VIM+StrideFormer+AAM | StrideFormer-Base | 512x512 | 41.57 | 265.5 | 5.62 | model|log|vdl|exported model |
* Note that the latency is tested with the final argmax operator using PaddleLite on a Xiaomi 9 (Snapdragon 855 CPU) with a single thread and a 512x512 input shape. The output of the model is therefore the single-channel segmentation result rather than probability logits. Motivated by the observation that this final argmax operator greatly increases the overall latency, we designed the VIM to significantly decrease it.
** The accuracy is reported based on our self-trained reproduction.
- Install PaddlePaddle and the relevant environment based on the installation guide.
- Install PaddleSeg based on the reference.
- Download the ADE20K dataset and link it to PaddleSeg/data, or directly run the training script and the dataset will be downloaded automatically. The directory structure should look as follows:
PaddleSeg/data
├── ADEChallengeData2016
│ ├── ade20k_150_embedding_42.npy
│ ├── annotations
│ ├── annotations_detectron2
│ ├── images
│ ├── objectInfo150.txt
│ └── sceneCategories.txt
You can start training by running tools/train.py with a config file; the config files are under PaddleSeg/configs/pp_mobileseg. Details about training are in the training guide. After training, the best model can be found under the save directory you specify, e.g. output/pp_mobileseg_base/best_model/model.pdparams.
export CUDA_VISIBLE_DEVICES=0,1
python3 -m paddle.distributed.launch tools/train.py \
--config configs/pp_mobileseg/pp_mobileseg_base_ade20k_512x512_80k.yml \
--save_dir output/pp_mobileseg_base \
--save_interval 1000 \
--num_workers 4 \
--log_iters 100 \
--use_ema \
--do_eval \
--use_vdl
With the trained model in hand, you can verify its accuracy through evaluation. Details about evaluation are in the evaluation guide.
python -m paddle.distributed.launch tools/val.py \
--config configs/pp_mobileseg/pp_mobileseg_base_ade20k_512x512_80k.yml \
--model_path output/pp_mobileseg_base/best_model/model.pdparams
We deploy the model on mobile devices for inference. To do that, we need to export the model and use PaddleLite to run inference on the mobile device. You can also refer to the lite deploy guide for details of PaddleLite deployment.
- An Android phone with USB debugging mode turned on and already connected to your PC.
- Install the adb tool.
Run the following command to make sure you are ready:
adb devices
# The following information will be shown if you are good to go:
List of devices attached
017QXM19C1000664 device
The model needs to be converted from a dynamic graph to a static graph for PaddleLite inference. In this step, we can use the VIM to speed the model up: you only need to change model::upsample to vim in the config file. The exported model can be found under the save directory you specify, e.g. output/pp_mobileseg_base.
python tools/export.py \
--config configs/pp_mobileseg/pp_mobileseg_base_ade20k_512x512_80k.yml \
--save_dir output/pp_mobileseg_base \
--input_shape 1 3 512 512 \
--output_op none
# --input_shape: the model is exported to infer one image with this input shape; feel free to adjust it to suit your dataset.
# --output_op: if you do not use VIM, set this to argmax to get the final prediction rather than logits.
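Before pushing the exported model to a phone, you may want to sanity-check it on the host with the Paddle Inference Python API. This is an optional check we add here, not part of the official deployment guide; the file paths assume the export command above.

```python
import numpy as np
import paddle.inference as paddle_infer

# Paths assume --save_dir output/pp_mobileseg_base from the export command above.
config = paddle_infer.Config(
    "output/pp_mobileseg_base/model.pdmodel",
    "output/pp_mobileseg_base/model.pdiparams",
)
predictor = paddle_infer.create_predictor(config)

# Feed a dummy image matching --input_shape 1 3 512 512.
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.reshape([1, 3, 512, 512])
input_handle.copy_from_cpu(np.random.rand(1, 3, 512, 512).astype("float32"))

predictor.run()
output = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(output.shape)  # single-channel prediction with VIM/argmax, class logits with --output_op none
```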
- After the model is exported, download the tool zipfile and arrange it together with the exported files as shown in the following file tree.
Speed_test_dir
├── models_dir
│ ├── pp_mobileseg_base # Files under this directory are generated during export
│ │ ├── model.pdmodel
│ │ ├── model.pdiparams
│ │ ├── model.pdiparams.info
│ │ └── deploy.yaml
│ ├── pp_mobileseg_tiny
│ │ ├── model.pdmodel
│ │ ├── model.pdiparams
│ │ ├── model.pdiparams.info
│ │ └── deploy.yaml
├── benchmark_bin # The compiled test script of PaddleLite, which is included in the tool zipfile.
├── image1.txt # The txt file that stores the values of the resized and normalized image
└── gen_val_txt.py # You can use this script to generate image1.txt for your test image (a sketch is shown below)
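If you need to build image1.txt yourself, the preprocessing that gen_val_txt.py performs presumably looks like the sketch below. The exact resize and normalization must match your deploy.yaml; the ImageNet mean/std and the demo image name used here are assumptions.

```python
import numpy as np
from PIL import Image

def gen_val_txt(image_path, out_path="image1.txt", size=(512, 512)):
    """Resize and normalize an image, then dump it as a flat txt file."""
    img = Image.open(image_path).convert("RGB").resize(size)
    arr = np.asarray(img).astype("float32") / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype="float32")  # assumed ImageNet mean
    std = np.array([0.229, 0.224, 0.225], dtype="float32")   # assumed ImageNet std
    arr = (arr - mean) / std
    arr = arr.transpose(2, 0, 1)  # HWC -> CHW, matching input_shape 1,3,512,512
    np.savetxt(out_path, arr.reshape(-1), fmt="%.6f")

gen_val_txt("demo.jpg")  # hypothetical test image
```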
- Then you can test the speed of the model using the following script. The test result will be saved in test_result.txt.
sh benchmark.sh benchmark_bin models_dir test_result.txt image1.txt
The test result of our PP-MobileSeg-Base is as follows:
-----------------Model=MV3_4stage_AAMSx8_valid_0321 Threads=1-------------------------
Delete previous optimized model: /data/local/tmp/seg_benchmark/models_0321/MV3_4stage_AAMSx8_valid_0321/opt.nb
---------- Opt Info ----------
Load paddle model from /data/local/tmp/seg_benchmark/models_0321/MV3_4stage_AAMSx8_valid_0321/model.pdmodel and /data/local/tmp/seg_benchmark/models_0321/MV3_4stage_AAMSx8_valid_0321/model.pdiparams
Save optimized model to /data/local/tmp/seg_benchmark/models_0321/MV3_4stage_AAMSx8_valid_0321/opt.nb
---------- Device Info ----------
Brand: Xiaomi
Device: cepheus
Model: MI 9
Android Version: 9
Android API Level: 28
---------- Model Info ----------
optimized_model_file: /data/local/tmp/seg_benchmark/models_0321/MV3_4stage_AAMSx8_valid_0321/opt.nb
input_data_path: /data/local/tmp/seg_benchmark/image1_norm.txt
input_shape: 1,3,512,512
output tensor num: 1
--- output tensor 0 ---
output shape(NCHW): 1 512 512
output tensor 0 elem num: 262144
output tensor 0 mean value: 1.18468e-44
output tensor 0 standard deviation: 2.52949e-44
---------- Runtime Info ----------
benchmark_bin version: e79b4b6
threads: 1
power_mode: 0
warmup: 20
repeats: 50
result_path:
---------- Backend Info ----------
backend: arm
cpu precision: fp32
---------- Perf Info ----------
Time(unit: ms):
init = 33.071
first = 314.619
min = 265.450
max = 271.217
avg = 267.246