PyTorch re-implementation of Real-time Scene Text Detection with Differentiable Binarization
Differences between the paper and this implementation:
- Use dice loss instead of BCE (binary cross-entropy) loss.
- Use normal convolution rather than deformable convolution in the backbone network.
- The architecture of the backbone network is a simple FPN.
- OHEM is not implemented.
- The ground truth of the threshold map is a constant 1 rather than 'the distance to the closest segment'.
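As a rough illustration of the dice loss mentioned above, a dice loss over a predicted probability map can be sketched in PyTorch as follows. The function name `dice_loss`, the optional `mask` argument, and the `eps` value are assumptions made for this sketch; they are not taken from this repository's code.

```python
import torch


def dice_loss(pred, target, mask=None, eps=1e-6):
    """Dice loss between a probability map and a binary target map.

    pred, target: tensors of the same shape with values in [0, 1].
    mask: optional tensor of the same shape; pixels with mask == 0 are ignored.
    (Hypothetical helper for illustration only.)
    """
    if mask is None:
        mask = torch.ones_like(pred)
    # Overlap between prediction and target, restricted to valid pixels.
    intersection = (pred * target * mask).sum()
    # Sum of both areas; eps avoids division by zero on empty maps.
    total = (pred * mask).sum() + (target * mask).sum() + eps
    return 1.0 - 2.0 * intersection / total


if __name__ == "__main__":
    target = (torch.rand(2, 1, 32, 32) > 0.5).float()
    pred = torch.rand(2, 1, 32, 32)
    print(dice_loss(pred, target).item())
```

Because the loss depends on the overlap ratio rather than on a per-pixel sum, it tends to be less sensitive than BCE to the imbalance between text and background pixels.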
Thanks to these projects:
The features are summarized below:
- Use resnet18/resnet50/shufflenetV2 as backbone.
- PyTorch 1.1.0
- ShuffleNet_V2 models trained on ICDAR 2013+2015 (training set):
  https://pan.baidu.com/s/1Um0wzbTFjJC0jdJ703GR7Q
  or https://mega.nz/#!WdhxXAxT!oGURvmbQFqTHu5hljUPdbDMzI75_UO2iWLaXX5dJrDw
- Modify genText.py to generate the txt list files for the training/testing data.
- Modify config.json.
- To train, run
  python train.py
- To predict, run
  python predict.py
- To evaluate, run
  python eval.py
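The list-file step above could look roughly like the sketch below. The README does not state the format that genText.py actually writes, so everything here is an assumption: the helper name `write_list_file`, the tab-separated `image_path<TAB>label_path` layout, and the `gt_` label prefix (borrowed from the ICDAR 2015 naming convention, where `img_1.jpg` is annotated by `gt_img_1.txt`). Check genText.py and config.json for the real format before relying on it.

```python
from pathlib import Path

# Image extensions considered by this sketch (an assumption).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def write_list_file(image_dir, label_dir, out_path, label_prefix="gt_"):
    """Write one 'image_path<TAB>label_path' line per annotated image.

    Hypothetical helper: the real list format is defined by genText.py.
    Returns the number of lines written.
    """
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    lines = []
    for img in sorted(image_dir.iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files
        label = label_dir / f"{label_prefix}{img.stem}.txt"
        if label.exists():  # keep only images that have an annotation file
            lines.append(f"{img}\t{label}")
    Path(out_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)
```

A call such as `write_list_file("train_images", "train_gts", "train.txt")` would then produce one list per split; the directory names are placeholders.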
TODO:
- MobileNet backbone
- Deformable convolution
- TensorBoard support
- FPN --> architecture in the paper
- Dice loss --> BCE loss
- Threshold map ground truth: constant 1 --> distance to the closest segment (using 1 speeds up label generation)
- OHEM
- OpenCV_DNN inference API for CPU machines
- Caffe version (for deploying with MNN/NCNN)
- ICDAR13 / ICDAR15 / CTW1500 / MLT2017 / Total-Text