MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model

Official implement for MimicDiffusion with Pytorch

Kaiyu Song, Hanjiang Lai, Yan Pan, Jian Yin
arXiv technical report (arXiv 2312.04802)

Todo

release the rest code for CIFAR-10 based on the adjoint method
release the code for ImageNet based on the surrogate process
release the code for CIFAR-10 based on the surrogate process using EDM
release the code for CIFAR-10 based on the surrogate process using SDE

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial perturbation, where an imperceptible perturbation is added to the image that can fool the DNNs. Diffusion-based adversarial purification uses the diffusion model to generate a clean image against such adversarial attacks. Unfortunately, the generative process of the diffusion model is also inevitably affected by adversarial perturbation since the diffusion model is also a deep neural network where its input has adversarial perturbation. In this work, we propose MimicDiffusion, a new diffusion-based adversarial purification technique that directly approximates the generative process of the diffusion model with the clean image as input. Concretely, we analyze the differences between the guided terms using the clean image and the adversarial sample. After that, we first implement MimicDiffusion based on Manhattan distance. Then, we propose two guidance to purify the adversarial perturbation and approximate the clean diffusion model. Extensive experiments on three image datasets, including CIFAR-10, CIFAR-100, and ImageNet, with three classifier backbones including WideResNet-70-16, WideResNet-28-10, and ResNet-50 demonstrate that MimicDiffusion significantly performs better than the state-of-the-art baselines. On CIFAR-10, CIFAR-100, and ImageNet, it achieves 92.67%, 61.35%, and 61.53% average robust accuracy, which are 18.49%, 13.23%, and 17.64% higher, respectively. The code is available at https://github.com/psky1111/MimicDiffusion.

Introduction

This repo currently is based on Surrogate Process. We mainly provide the following functionality:

The purification method of MimicDiffusion.

Here are some important concepts related to our project:

The type of gradient method for adversarial attack: The gradient for the diffusion model is hard to calculate. Diffpure proposed the adjoint method to estimate the gradient. Surrogate process proposed by Lee is an improved method to calculate the gradient of the diffusion model. We highly recommend using surrogate process to implement the white-box adversarial attack since it can get more precise gradient.
Objective: Our primary goal is to generate the clean image to improve the accuracy of the classifier given an adversarial sample as the condition.

Regarding the code structure:

Our method mainly is in purification.py

Pre-trained Models

The pre-trained diffusion model:

[256x256_diffusion_uncond.pt] for ImageNet (https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt) from guided-diffusion.
Score SDE for CIFAR-10: (vp/cifar10_ddpmpp_deep_continuous: download link).
EDM for CIFAR-10.
EDM for CIFAR-100. We trained an EDM for CIFAR-100 using the script of the official EDM repo using the EDM checkpoint trained in CIFAR-10.

The pre-trained classifier:

WideResNet-28-10: The pre-trained model is directly from RobustBench
WideResNet-70-16: We trained a WideResNet-70-16 based on the repo
ResNet-50: The pre-trained model is directly from Pytorch

Usage

bash ./script/cifar10.sh

Results

Classifier	Attack Method	Gradient Method	Diffusion Model	Standard Acc	Robust Acc
WideResNet-28-10	PGD	Adjoint method	EDM	92.05 $\pm$ 6.02	91.55 $\pm$ 6.84
WideResNet-28-10	PGD	Surrogate Process	SDE	91.41 $\pm$ 1.12	80.86 $\pm$ 1.48

Acknowledgments

Our work is standing on the shoulders of giants. We want to thank the following contributors that our code is based on:

Surrogate process based adversarial attack method: https://github.com/ml-postech/robust-evaluation-of-diffusion-based-purification

Adjoint method based adversarial attack method: https://github.com/NVlabs/DiffPure#requirements

Citation

If our code or models help your work, please cite our paper:

@article{song2023mimicdiffusion,
  title={MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model},
  author={Kaiyu Song, Hanjiang Lai, Yan Pan, and Jian Yin},
    year={2023},
    eprint={2312.04802},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
__pycache__		__pycache__
attacks		attacks
diffusion_configs		diffusion_configs
guided_diffusion		guided_diffusion
score_sde		score_sde
script		script
README.md		README.md
load_data.py		load_data.py
load_model.py		load_model.py
path.py		path.py
purification.py		purification.py
requirements.txt		requirements.txt
sde_lib.py		sde_lib.py
test.py		test.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model

Todo

Abstract

Introduction

Pre-trained Models

Usage

Results

Acknowledgments

Citation

About

Releases

Packages

Languages

psky1111/MimicDiffusion

Folders and files

Latest commit

History

Repository files navigation

MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model

Todo

Abstract

Introduction

Pre-trained Models

Usage

Results

Acknowledgments

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages