Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark

Paper: arXiv:2212.08914
The 12Hz nuScenes-H dataset

(Video: 12Hz-nuScene-H.mp4)

Abstract

In recent years, vision-centric perception has flourished in various autonomous driving tasks, including 3D detection, semantic map construction, motion forecasting, and depth estimation. Nevertheless, the latency of vision-centric approaches is too high for practical deployment (e.g., most camera-based 3D detectors have a runtime greater than 300ms). To bridge the gap between ideal research and real-world applications, it is necessary to quantify the trade-off between performance and efficiency. Traditionally, autonomous-driving perception benchmarks perform offline evaluation, neglecting the inference time delay. To mitigate this problem, we propose the Autonomous-driving StreAming Perception (ASAP) benchmark, which is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving. Building on the 2Hz annotated nuScenes dataset, we first propose an annotation-extending pipeline to generate high-frame-rate labels for the 12Hz raw images. To reflect practical deployment, we further construct the Streaming Perception Under constRained-computation (SPUR) evaluation protocol, in which the 12Hz inputs are used for streaming evaluation under the constraints of different computational resources. On the ASAP benchmark, comprehensive experimental results reveal that the model ranking changes under different constraints, suggesting that model latency and computation budget should be treated as design choices when optimizing for practical deployment. To facilitate further research, we establish baselines for camera-based streaming 3D detection, which consistently enhance streaming performance across various hardware.
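To make the difference between offline and streaming evaluation concrete, here is a minimal sketch of how a prediction could be paired with each 12Hz ground-truth frame once inference delay is taken into account. It is an illustration only, not the code in this repository: the function names `simulate_streaming` and `match_to_ground_truth` are hypothetical, and the sketch assumes a fixed per-frame latency and a model that, whenever it becomes idle, picks up the newest frame that has already arrived.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def simulate_streaming(frame_times: List[float], latency: float) -> List[Tuple[int, float]]:
    """Return (frame_index, finish_time) for every frame the model processes.

    Assumption (not from the paper): constant latency, and an idle model
    always starts on the newest frame that has arrived.
    """
    finished: List[Tuple[int, float]] = []
    busy_until = frame_times[0]
    last = -1  # index of the last frame that was processed
    while True:
        # Newest frame available at the moment the model becomes idle.
        idx = bisect_right(frame_times, busy_until) - 1
        if idx == last:
            # Nothing new has arrived yet: wait for the next frame.
            idx = last + 1
            if idx >= len(frame_times):
                break
            start = frame_times[idx]
        else:
            start = busy_until
        finish = start + latency
        finished.append((idx, finish))
        last = idx
        busy_until = finish
    return finished


def match_to_ground_truth(
    finished: List[Tuple[int, float]], gt_times: List[float]
) -> List[Optional[int]]:
    """For each ground-truth time, pick the input frame behind the most
    recent prediction that had already finished (None if there is none)."""
    finish_times = [f for _, f in finished]
    matches: List[Optional[int]] = []
    for t in gt_times:
        k = bisect_right(finish_times, t) - 1
        matches.append(finished[k][0] if k >= 0 else None)
    return matches


if __name__ == "__main__":
    times = [i / 12 for i in range(13)]  # one second of 12Hz input
    done = simulate_streaming(times, latency=0.3)  # 300ms model
    print(match_to_ground_truth(done, times))
```

Under these assumptions, a 300ms model on 12Hz input has no prediction at all for the first four ground-truth frames, and every later frame is scored against a prediction computed from an older image. Offline evaluation hides this mismatch by pairing each frame with its own prediction.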

Streaming perception results

Getting Started

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{wang2022asap,
  title={Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark},
  author={Wang, Xiaofeng and Zhu, Zheng and Zhang, Yunpeng and Huang, Guan and Ye, Yun and Xu, Wenbo and Chen, Ziwei and Wang, Xingang},
  journal={arXiv preprint arXiv:2212.08914},
  year={2022}
}

Acknowledgement

Many thanks to these excellent projects:
