by Yuxiang Ji*, Boyong He*, Chenyuan Qu, Zhuoyue Tan, Chuan Qin, Liaoni Wu
Pre-trained diffusion models have demonstrated remarkable proficiency in synthesizing images across a wide range of scenarios with customizable prompts, indicating their capacity to capture universal features. Motivated by this, our study explores how the implicit knowledge embedded within diffusion models can be used to address challenges in cross-domain semantic segmentation. This paper investigates an approach that leverages sampling and fusion techniques to harness the features of diffusion models efficiently. We propose DIffusion Feature Fusion (DIFF) as a backbone for extracting and integrating effective semantic representations through the diffusion process. Leveraging the text-to-image generation capability, we further introduce a training framework designed to implicitly learn posterior knowledge from the diffusion model.
Relying on the diffusion-based encoder, our approach improves the previous state of the art by 2.7 mIoU on GTA→Cityscapes, 4.98 mIoU on GTA→ACDC, and 11.69 mIoU on GTA→Dark Zurich.
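At its core, DIFF treats a frozen Stable Diffusion UNet as a feature extractor: an image is encoded into the latent space, perturbed at a chosen timestep, and the multi-scale decoder activations produced by a single denoising step are collected for fusion. The snippet below is a minimal sketch of this idea using diffusers; the hook placement, timestep, and fusion details are illustrative assumptions, not the actual implementation (see mmseg/models/backbones/diff for that).

# Illustrative sketch only: extract multi-scale UNet decoder features
# from one denoising step of Stable Diffusion v2-1.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.to("cuda")

features = []
hooks = [
    blk.register_forward_hook(lambda m, i, o: features.append(o))
    for blk in pipe.unet.up_blocks  # tap the UNet decoder blocks
]

@torch.no_grad()
def extract_features(image, prompt="", t=50):
    # image: Bx3xHxW tensor in [-1, 1], on the same device as the pipeline.
    features.clear()
    latents = pipe.vae.encode(image).latent_dist.sample() * pipe.vae.config.scaling_factor
    noise = torch.randn_like(latents)
    timestep = torch.tensor([t], device=latents.device)
    noisy = pipe.scheduler.add_noise(latents, noise, timestep)
    # Text conditioning; an empty prompt suffices for pure feature extraction.
    tokens = pipe.tokenizer(prompt, return_tensors="pt", padding="max_length",
                            max_length=pipe.tokenizer.model_max_length).input_ids.to(latents.device)
    cond = pipe.text_encoder(tokens)[0]
    pipe.unet(noisy, timestep, encoder_hidden_states=cond)
    return list(features)  # multi-scale maps, ready for fusion (e.g., 1x1 convs)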
For this project, we used Python 3.8.18. We recommend setting up a new virtual environment:
python -m venv ~/venv/diff
source ~/venv/diff/bin/activate
In that environment, the requirements can be installed with:
pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.3.7 # requires the other packages to be installed first
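After installation, an optional sanity check verifies that the pinned PyTorch and mmcv-full versions import correctly and that CUDA is visible:

python -c "import torch, mmcv; print(torch.__version__, mmcv.__version__, torch.cuda.is_available())"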
Further, please download the Stable-Diffusion v2-1 weights from HuggingFace. Please refer to the instructions at Stable-Diffusion.
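If you prefer a scripted download, the weights can also be fetched with huggingface_hub; the target directory below is an assumption, so place the weights wherever your configuration expects them:

python -c "from huggingface_hub import snapshot_download; snapshot_download('stabilityai/stable-diffusion-2-1', local_dir='pretrained/stable-diffusion-2-1')"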
All experiments were executed on an NVIDIA RTX A6000.
Cityscapes: Please download leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip from here and extract them to data/cityscapes.
GTA: Please download all image and label packages from here and extract them to data/gta.
More details on dataset preparation can be found at DAFormer; the expected directory layout after extraction is sketched below.
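Assuming a DAFormer-style preparation, the resulting layout should roughly be (subfolder names follow the extracted archives):

data
├── cityscapes
│   ├── leftImg8bit
│   └── gtFine
└── gta
    ├── images
    └── labels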
To run a simple experiment on GTA→Cityscapes:
python run_experiments.py --exp 50
More information about the available configurations and experiments can be found in diff_config.yaml.
If you want to use the DIFF module as a backbone for other tasks, you can simply copy the whole mmseg/models/backbones/diff directory and use DIFFEncoder there, as sketched below.
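As a sketch, the copied backbone can then be referenced by its registered type name in a standard mmsegmentation config; the head choice and any extra DIFFEncoder keyword arguments below are assumptions to verify against the class definition:

# Hypothetical mmseg config snippet; check DIFFEncoder's constructor
# in mmseg/models/backbones/diff for the actual arguments.
model = dict(
    type='EncoderDecoder',
    backbone=dict(type='DIFFEncoder'),
    decode_head=dict(type='DAFormerHead'),  # placeholder head; configure as usual
)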
This project is based on the following open-source projects. We thank their authors for making the source code publicly available.
If you find our work useful in your research, please consider citing:
@misc{ji2024diffusion,
  title={Diffusion Features to Bridge Domain Gap for Semantic Segmentation},
  author={Yuxiang Ji and Boyong He and Chenyuan Qu and Zhuoyue Tan and Chuan Qin and Liaoni Wu},
  year={2024},
  eprint={2406.00777},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}