Fully Convolutional Networks for Semantic Segmentation CVPR 2015
Fully Convolutional Network (FCN) was the most classic semantic segmentation network, its formulation is even before the ResNet era. Our implementation of FCN borrows from TorchVision, which is a modern take of FCN with ResNet backbone. The "replacing stride by dilation" option from Deeplab is integrated to maintain a 8x down-sampled feature map that makes prediction through a simple segmentation head. It can still be seen as the most simple segmentation baseline, but it is not the original FCN anymore.
Training time estimated with single 2080 Ti.
ImageNet pre-training, 3-times average/best.
backbone | resolution | training time | precision | mIoU (avg) | mIoU | |
---|---|---|---|---|---|---|
ResNet-101 | 321 x 321 | 3.3h | mix | 70.72 | 70.83 | model | shell |
ResNet-101 | 321 x 321 | 6.3h | full | 70.91 | 71.55 | model | shell |
backbone | resolution | training time | precision | mIoU (avg) | mIoU | |
---|---|---|---|---|---|---|
ResNet-101 | 321 x 321 | 2.2h | mix | 68.05 | 68.20 | model | shell |
FPS is best trial-avg among 3 trials on a 2080 Ti.
backbone | resolution | FPS | FLOPS(G) | Params(M) |
---|---|---|---|---|
ResNet-101 | 256 x 512 | 43.32 | 216.42 | 51.95 |
ResNet-101 | 512 x 1024 | 12.06 | 865.69 | 51.95 |
ResNet-101 | 1024 x 2048 | 3.06 | 3462.77 | 51.95 |
@inproceedings{long2015fully,
title={Fully convolutional networks for semantic segmentation},
author={Long, Jonathan and Shelhamer, Evan and Darrell, Trevor},
booktitle={Computer Vision and Pattern Recognition},
year={2015}
}
@article{shelhamer2016fully,
title={Fully convolutional networks for semantic segmentation},
author={Shelhamer, Evan and Long, Jonathan and Darrell, Trevor},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume={39},
number={4},
pages={640--651},
year={2016}
}