Yukai Shi, Jianan Wang, He Cao, Boshi Tang, Xianbiao Qi, Tianyu Yang, Yukun Huang, Shilong Liu, Lei Zhang, Heung-Yeung Shum
Official implementation for TOSS: High-quality Text-guided Novel View Synthesis from a Single Image.
TOSS introduces text as high-level sementic information to constraint the NVS solution space for more controllable and more plausible results.
Project Page | ArXiv | Weights
3d_generation_video.mp4
conda create -n toss python=3.9
conda activate toss
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
git clone https://github.com/openai/CLIP.git
pip install -e CLIP/
Download pretrain weights from this link to sub-directory ./ckpt
We suggest gradio for a visualized inference and test this demo on a single RTX3090.
python app.py
- Release inference code.
- Release pretrained models.
- Upload 3D generation code.
- Upload training code.
@article{shi2023toss,
title={Toss: High-quality text-guided novel view synthesis from a single image},
author={Shi, Yukai and Wang, Jianan and Cao, He and Tang, Boshi and Qi, Xianbiao and Yang, Tianyu and Huang, Yukun and Liu, Shilong and Zhang, Lei and Shum, Heung-Yeung},
journal={arXiv preprint arXiv:2310.10644},
year={2023}
}