# DisCo: Diffusion-based Cross-modal Shape Reconstruction

## ✨ Overview

This repository contains code, models, and demos for DisCo, the cross-modal shape reconstruction model introduced in LASA (CVPR 2024). Key features:

- Uses Triplane Diffusion Transformers (Triplane-DiT) for memory-efficient 3D reconstruction (see the sketch below)
- Robustly processes multi-view images, handling real-world challenges such as occlusion and motion blur
- Seamlessly integrates point cloud and posed image data to achieve metric-scale 3D reconstructions
- Trained on high-quality 3D datasets (LASA, ABO, 3D-FRONT, ShapeNet)
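
Conceptually, a triplane factorizes a 3D feature volume into three axis-aligned 2D feature planes: a query point is projected onto each plane, features are bilinearly sampled, and the three samples are aggregated. The sketch below illustrates the general idea (EG3D-style sampling, assuming `(3, C, H, W)` planes and sum aggregation; this is not DisCo's actual implementation):

```python
# Minimal sketch of triplane feature sampling; shapes and plane ordering
# are assumptions for illustration, not taken from the DisCo codebase.
import torch
import torch.nn.functional as F

def sample_triplane(planes: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
    """planes: (3, C, H, W) features for the XY, XZ, and YZ planes.
    xyz: (N, 3) query points in [-1, 1]. Returns (N, C) features."""
    projections = [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]]
    feats = []
    for plane, uv in zip(planes, projections):
        grid = uv.view(1, 1, -1, 2)  # (1, 1, N, 2), values in [-1, 1]
        f = F.grid_sample(plane.unsqueeze(0), grid,
                          mode="bilinear", align_corners=True)  # (1, C, 1, N)
        feats.append(f.view(plane.shape[0], -1).t())  # (N, C)
    return sum(feats)  # sum aggregation; mean or concat are common variants

planes = torch.randn(3, 32, 64, 64)        # toy feature planes
pts = torch.rand(1000, 3) * 2 - 1          # random query points in [-1, 1]
print(sample_triplane(planes, pts).shape)  # torch.Size([1000, 32])
```

Because the planes grow quadratically rather than cubically with resolution, this representation is far more memory-efficient than a dense voxel grid, which is what makes diffusion over triplanes practical.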

## Contents

- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Data Preparation](#data-preparation)
- [Train & Evaluation](#train--evaluation)

## Environment Setup

**Hardware:** We train our model on 8x A100 GPUs with a batch size of 22 per GPU (an effective batch size of 176).

**Setup environment:** The following steps have been tested on Ubuntu 20.04.

- You must have an NVIDIA graphics card with at least 12 GB of VRAM and have [CUDA](https://developer.nvidia.com/cuda-downloads) installed.
- Install `Python >= 3.8`.
- Install `PyTorch==2.3.0` and `torchvision==0.18.0`:

  ```sh
  pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118
  pip install torch-scatter -f https://data.pyg.org/whl/torch-2.3.0+cu118.html
  ```

- Install dependencies:

  ```sh
  pip install -r requirements.txt
  ```

- Install DisCo:

  ```sh
  pip install -e .
  ```
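
Optionally, verify the installation with a quick sanity check (a minimal sketch, not part of the repo):

```python
# Quick sanity check: confirm the CUDA builds of PyTorch and torch-scatter
# are importable and a GPU is visible before launching training.
import torch
import torch_scatter  # noqa: F401 -- import fails if the wheel mismatches torch

print(torch.__version__)  # expect 2.3.0+cu118
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
else:
    print("CUDA not available -- check your driver and CUDA install")
```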

## Inference

### Prepare Pretrained Weights

- Download the pretrained weights from BaiduYun or SharePoint.
- Put the `ae`, `dm`, and `finetune_diffusion` folders under `DisCo/output`. Only `ae` and `finetune_dm` are needed for the final evaluation (see the layout sketch below):
  - The `ae` folder stores the VAE weights.
  - The `dm` folder stores the diffusion model trained on synthetic data.
  - The `finetune_dm` folder stores the diffusion model finetuned on the LASA dataset.
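
Assuming the folder names above, the expected layout under `DisCo/output` would look like this (a sketch, not an authoritative listing):

```
DisCo/
└── output/
    ├── ae/             # VAE weights
    ├── dm/             # diffusion model trained on synthetic data
    └── finetune_dm/    # diffusion model finetuned on LASA
```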

## Data Preparation

Follow these steps to prepare the data for training:

1. **Obtain Training Data**

   - Follow the instructions in `DATA.md` to obtain the necessary training data.

2. **Download CLIP Model Weights**

   - Download the `open_clip_pytorch_model.bin` file from SharePoint.
   - Place the downloaded file in the `DisCo/data` directory.
   - This weight file is used to extract image features with the CLIP ViT model (see the sketch after this list).
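
For reference, extracting image features with this checkpoint might look like the following (a minimal sketch using the `open_clip` library; the `ViT-L-14` architecture name and image path are assumptions, not necessarily what `launch.py` uses):

```python
# Hypothetical sketch of CLIP ViT feature extraction with open_clip;
# the architecture name and preprocessing are assumptions, not DisCo's code.
import torch
import open_clip
from PIL import Image

# Load the local checkpoint downloaded to DisCo/data.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="data/open_clip_pytorch_model.bin"
)
model.eval()

image = preprocess(Image.open("example.png")).unsqueeze(0)  # (1, 3, H, W)
with torch.no_grad():
    features = model.encode_image(image)  # (1, 768) for ViT-L-14
print(features.shape)
```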

### Additional Notes

- Ensure you have the necessary permissions to access the SharePoint link.
- If you encounter any issues during data preparation, please refer to the project's issue tracker or contact the maintainers.

## Train & Evaluation

1. **Train the Triplane-VAE Model**

   ```sh
   python launch.py --mode train_vae --gpus 0,1,2,3,4,5,6,7 --category chair
   ```

2. **Cache Image and Triplane Features**

   ```sh
   python launch.py --mode cache_image_features --gpus 0,1,2,3,4,5,6,7 --category chair
   python launch.py --mode cache_triplane_features --gpus 0,1,2,3,4,5,6,7 --category chair
   ```

3. **Train the Triplane-Diffusion Model on the Synthetic Dataset**

   ```sh
   python launch.py --mode train_diffusion --gpus 0,1,2,3,4,5,6,7 --category chair
   ```

4. **Finetune the Triplane-Diffusion Model on the LASA Dataset**

   ```sh
   python launch.py --mode finetune_diffusion --gpus 0,1,2,3,4,5,6,7 --category chair
   ```

5. **Evaluate the Triplane-Diffusion Model**

   ```sh
   python launch.py --mode evaluate --gpus 0,1,2,3,4,5,6,7 --category chair
   ```

   Results will be saved under `./results/`.

6. **Put Inference Results into the Scene**

   ```sh
   python launch.py --mode put_resutls_to_scene
   ```
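Taken together, a full run for one category can be chained from the commands above (a sketch; adjust `--gpus` and `--category` for your setup):

```sh
#!/usr/bin/env bash
# End-to-end pipeline for the "chair" category, chaining the steps above.
set -e  # stop at the first failing step

GPUS=0,1,2,3,4,5,6,7

python launch.py --mode train_vae --gpus $GPUS --category chair
python launch.py --mode cache_image_features --gpus $GPUS --category chair
python launch.py --mode cache_triplane_features --gpus $GPUS --category chair
python launch.py --mode train_diffusion --gpus $GPUS --category chair
python launch.py --mode finetune_diffusion --gpus $GPUS --category chair
python launch.py --mode evaluate --gpus $GPUS --category chair  # writes ./results/
```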
