Official Code for ReRe: Retrieval-Augmented Natural Language Reasoning For Explainable Visual Question Answering
(Another repo).
Accepted at the IEEE ICIP 2024 workshop: Integrating Image Processing with Large-Scale Language/Vision Models for Advanced Visual Understanding
- PyTorch 2.1.2
- CLIP (install with: pip install git+https://github.com/openai/CLIP.git)
- transformers (install with: pip install transformers)
- accelerate==0.26.1
- evaluate==0.4.1
- torchvision==0.16.2
- torchmetrics==1.3.0
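As a quick sanity check before training, the pinned versions in the list above can be compared against what is actually installed. This is a minimal sketch, not part of the repository: `check_versions` and `PINNED` are hypothetical names, and it assumes PyTorch is installed under its usual distribution name `torch`. CLIP and transformers are left out because the list does not pin a version for them.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the requirements list above.
# ASSUMPTION: "PyTorch 2.1.2" refers to the pip distribution named "torch".
PINNED = {
    "torch": "2.1.2",
    "accelerate": "0.26.1",
    "evaluate": "0.4.1",
    "torchvision": "0.16.2",
    "torchmetrics": "1.3.0",
}


def check_versions(pinned=PINNED, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # Local build tags such as "+cu121" are ignored in the comparison.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `check_versions()` with no arguments inspects the current environment; an empty dictionary means every pinned package matches.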
Download the images into /local_datasets/vqax
- Images for VQA-X: COCO train2014 and val2014 images
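A small check like the one below can confirm the images are in place before training starts. It is only a sketch: `missing_image_dirs` is a hypothetical helper, and it assumes each COCO split is unpacked into its own subfolder (train2014, val2014) directly under /local_datasets/vqax and that the images are .jpg files. The layout the training script actually expects may differ.

```python
from pathlib import Path


def missing_image_dirs(root, splits=("train2014", "val2014")):
    """Return the split folders under `root` that are absent or hold no .jpg files."""
    root = Path(root)
    missing = []
    for split in splits:
        # ASSUMPTION: each split lives in its own subfolder of `root`.
        folder = root / split
        if not folder.is_dir() or not any(folder.glob("*.jpg")):
            missing.append(split)
    return missing
```

For example, `missing_image_dirs("/local_datasets/vqax")` returns an empty list when both folders exist and contain images.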
Download the distilled GPT-2 model to '/local_datasets/vqax/'. This model is pretrained on image captioning.
- The model and tokenizer are available in the drive
Train the model with the command below. The model's results are saved to the 'result' folder after every epoch.
python ReRe.py
To evaluate ReRe's results, we use CIDEr, BLEU, METEOR, ROUGE, and BERTScore to measure the quality of the explanations the model outputs. These metrics are widely used in the NLE task. For accuracy, an answer is counted as correct if the output answer is among the ground-truth (GT) answers. To see the fine-tuned model's scores, run the command below.
python evaluation.py
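The accuracy rule described above (an output answer counts as correct when it appears among the GT answers) can be illustrated as follows. This is a sketch under stated assumptions, not the code in evaluation.py: `answer_accuracy` and `normalize` are hypothetical names, and the lowercase-and-strip normalization is an assumption about how answers are compared.

```python
def normalize(text):
    # ASSUMPTION: answers are compared case-insensitively with outer whitespace removed.
    return text.strip().lower()


def answer_accuracy(predictions, gt_answers):
    """Fraction of predicted answers that appear among their ground-truth answers.

    predictions: list of predicted answer strings, one per question.
    gt_answers:  list of lists, the ground-truth answers for each question.
    """
    if len(predictions) != len(gt_answers):
        raise ValueError("predictions and gt_answers must have the same length")
    if not predictions:
        return 0.0
    correct = 0
    for pred, answers in zip(predictions, gt_answers):
        # A prediction is correct if it matches any of the GT answers.
        if normalize(pred) in {normalize(a) for a in answers}:
            correct += 1
    return correct / len(predictions)
```

With predictions `["yes", "Two", "red"]` and GT answers `[["yes", "no"], ["2", "two"], ["blue"]]`, two of the three predictions are counted as correct.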