January 2020
tl;dr: Estimate depth by discretizing it uniformly in log space and solving the result as an ordinal regression problem.
Ordinal regression is like classification, but the order of the classes matters. Directly regressing the depth values leads to slow convergence and worse performance. Ordinal regression has also been used to estimate human age.
Ordinal regression converts the conventional one-hot multi-class classification (MCC) into a series of binary classifications, each asking whether the target exceeds a given threshold.
The idea was also corroborated in other studies such as VectorMapNet.
- The backbone is a conventional one that yields dense features.
- ASPP (atrous spatial pyramid pooling) is applied to the dense features to generate multi-scale features at the same resolution.
- The full-image encoder produces a 1x1xC vector and copies it to every location of the feature map. This global context helps resolve local ambiguity in depth estimation.
- 1x1 convs are used to learn cross-channel information.
- Spacing-increasing discretization (SID)
- Bins are uniformly spaced in log space, so the bin edges essentially form a geometric progression.
- This seems to be a simplified version of mu-law (summary in Chinese).
- This is improved by LID in Center3D.
- Main takeaways
- SID depth bins > UD (uniform discretization) bins
- Discretized depth bins > direct regression
- Ordinal loss helps, and is even better than berHu (reverse Huber loss, i.e. inverted smooth L1)
- MSE-SID and MSE on the continuous target get almost the same results, meaning the quantization error is almost negligible in the depth estimation task.
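The SID bullets above can be sketched as follows: bin edges are placed uniformly in log depth between a minimum depth `alpha` and a maximum depth `beta`. The range of 1 m to 80 m and the midpoint decoding are assumptions made for illustration; only the bin count of 80 comes from these notes.

```python
import math

def sid_thresholds(alpha, beta, num_bins):
    """num_bins + 1 bin edges from alpha to beta, uniformly spaced in log depth."""
    step = math.log(beta / alpha) / num_bins
    return [alpha * math.exp(step * i) for i in range(num_bins + 1)]

def depth_to_bin(depth, alpha, beta, num_bins):
    """Index of the SID bin containing `depth`, clamped to [0, num_bins - 1]."""
    ratio = math.log(depth / alpha) / math.log(beta / alpha)
    return min(max(int(ratio * num_bins), 0), num_bins - 1)

def bin_to_depth(index, edges):
    """Representative depth of a bin; the arithmetic midpoint is one simple choice."""
    return 0.5 * (edges[index] + edges[index + 1])

# Assumed depth range of 1 m to 80 m, split into 80 bins.
edges = sid_thresholds(alpha=1.0, beta=80.0, num_bins=80)
print("edge ratio:", edges[1] / edges[0])        # constant for every bin
print("nearest bin width:", edges[1] - edges[0])
print("farthest bin width:", edges[-1] - edges[-2])

# Quantize a depth and map it back to see the round-trip error.
b = depth_to_bin(10.0, 1.0, 80.0, 80)
print("bin:", b, "decoded depth:", bin_to_depth(b, edges))
```

Because consecutive edges differ by a constant ratio, the relative round-trip error is bounded by roughly half of (ratio - 1) at every depth, which is consistent with the observation that quantization costs little accuracy.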
- KITTI used 80 bins.
- The network is trained on crops. At test time, the entire image is split into overlapping crops.
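A minimal sketch of the test-time cropping scheme, reduced to one dimension for brevity. Averaging the predictions where crops overlap is an assumption here, and `predict_crop` is a hypothetical stand-in for the network.

```python
def crop_starts(length, crop, stride):
    """Start offsets of crops of size `crop` that cover [0, length).

    Assumes stride <= crop so that consecutive crops touch or overlap.
    """
    if crop >= length:
        return [0]
    starts = list(range(0, length - crop, stride))
    starts.append(length - crop)  # last crop sits flush with the border
    return starts

def predict_full(row, crop, stride, predict_crop):
    """Run `predict_crop` on overlapping crops and average the overlaps."""
    total = [0.0] * len(row)
    count = [0] * len(row)
    for start in crop_starts(len(row), crop, stride):
        pred = predict_crop(row[start:start + crop])
        for i, value in enumerate(pred):
            total[start + i] += value
            count[start + i] += 1
    return [t / c for t, c in zip(total, count)]

# Identity "network": the stitched output should reproduce the input.
row = [float(v) for v in range(10)]
print(crop_starts(len(row), crop=4, stride=3))   # [0, 3, 6]
print(predict_full(row, crop=4, stride=3, predict_crop=lambda c: list(c)))
```

The same bookkeeping extends to images by keeping a 2-D sum map and a 2-D count map.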
- Ordinal loss implementation in PyTorch