This is the code implementation for the paper *Improving Interpretation Faithfulness for Vision Transformers*, published at ICML 2024 (Spotlight).
Please set up the environment from `env.yaml` using conda. After that, clone https://github.com/openai/guided-diffusion and place it in the same directory as this repo. Note that we leverage the pre-trained 256x256 diffusion model (not class conditional): `256x256_diffusion_uncond.pt`.
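
A minimal setup sketch is shown below. The environment name `fvit` is an assumption (check the `name:` field in `env.yaml`), and the checkpoint URL is the one published in the guided-diffusion README; place the downloaded checkpoint wherever the demo notebook expects it.

```bash
# Create and activate the conda environment
# (environment name is an assumption; see the name: field in env.yaml).
conda env create -f env.yaml
conda activate fvit

# Clone guided-diffusion next to this repo.
git clone https://github.com/openai/guided-diffusion

# Download the unconditional 256x256 diffusion checkpoint
# (URL taken from the guided-diffusion README).
wget https://openaipublicdiffusion.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt
```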
We provide some example images in the `fig` folder. You can run the corresponding experiments in the `fvit-demo.ipynb` notebook to see the different explanation results.
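
For example, launch the notebook from the repo root so the relative paths to `fig` and the cloned guided-diffusion directory resolve (this assumes Jupyter is available in the environment; it may not be listed in `env.yaml`):

```bash
jupyter notebook fvit-demo.ipynb
```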
Our code implementation is based on the following awesome materials:
- https://github.com/jacobgil/vit-explain
- https://jacobgil.github.io/deeplearning/vision-transformer-explainability
- https://arxiv.org/abs/2005.00928
- https://github.com/hila-chefer/Transformer-Explainability
- https://github.com/openai/guided-diffusion
All rights belong to the original authors. If you find this work useful, please cite:

@inproceedings{huimproving,
  title={Improving Interpretation Faithfulness for Vision Transformers},
  author={Hu, Lijie and Liu, Yixin and Liu, Ninghao and Huai, Mengdi and Sun, Lichao and Wang, Di},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024}
}