Expert-Controlled Classifier-Free Guidance is a training-free, expert-in-the-loop framework designed to align medical vision-language models (MedVLMs) with clinical expertise. It integrates token-level uncertainty estimation, BioMedCLIP-based medical multimodal Retrieval-Augmented Generation (RAG), and interactive expert revisions with highlight-based guidance.
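At a glance, the guidance step contrasts next-token logits computed with and without the expert-provided context (revisions and highlighted evidence). The snippet below is a minimal, illustrative sketch only, not the repository's implementation; the function names, the fixed `guidance_scale`, and the entropy-based uncertainty proxy are assumptions.

```python
import torch

def expert_cfg_logits(cond_logits: torch.Tensor,
                      uncond_logits: torch.Tensor,
                      guidance_scale: float = 1.5) -> torch.Tensor:
    """Classifier-free-guidance style blend of next-token logits:
    cond_logits come from the prompt *with* expert revisions/highlights,
    uncond_logits from the prompt without them."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Predictive entropy as a simple token-level uncertainty proxy,
    e.g. to decide which tokens to surface for expert review."""
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)
```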
```bash
conda create -n expert_cfg python=3.10 -y
conda activate expert_cfg
pip install -r requirements.txt
```
Download them to the current directory separately and merge them with `Phi-3-vision-128k-instruct` and `Phi-3.5-vision-instruct`, respectively.
- Phi-3V: Huggingface
- Phi-3.5V: Huggingface
Our fine-tuned Phi3V-Med and Phi3.5V-Med LoRA weights (links removed to comply with double-blind requirements):
- Phi-3V-Med: [Huggingface]
- Phi-3.5V-Med: [Huggingface]
Download them to the `./lora_weights` folder.
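If you want to fold a LoRA adapter into its base model offline (instead of loading it at inference time via `--use_lora` as in the demo command below), a `peft`-based sketch might look like the following; the local paths are assumptions and should match wherever you placed the weights.

```python
# Sketch only: merge a downloaded LoRA adapter into the Phi-3.5V base model.
# Paths are assumptions; adjust them to your local layout.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "./Phi-3.5-vision-instruct", trust_remote_code=True)
lora = PeftModel.from_pretrained(base, "./lora_weights/logs_phi35_pubmed_instruct")
merged = lora.merge_and_unload()          # fold LoRA deltas into the base weights
merged.save_pretrained("./Phi-3.5V-Med-merged")
```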
```bash
torchrun --nproc_per_node=1 demo.py \
    --bf16 \
    --use_lora \
    --input_json 'examples/input_queries.json' \
    --img_root 'examples/images' \
    --save_path 'examples/results.json' \
    --output_dir './lora_weights/logs_phi35_pubmed_instruct'
```
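The exact schema of `input_queries.json` is defined by `demo.py`; purely as an illustration, a query file might pair image filenames (relative to `--img_root`) with questions. The field names below are hypothetical.

```python
# Hypothetical example of building an input_queries.json file.
# The real field names are defined by demo.py and may differ.
import json

queries = [
    {"image": "case_001.jpg",   # resolved against --img_root
     "question": "What abnormality is visible in this image?"}
]
with open("examples/input_queries.json", "w") as f:
    json.dump(queries, f, indent=2)
```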
Download BiomedCLIP and place it in `./src/backbone/BiomedCLIP`.
BiomedCLIP links:
- Huggingface
- [Baiduyun]
Note: Downloading the weights directly from Huggingface may run into network issues. To facilitate modifications, we have converted the original `.bin` file to PyTorch's `.pth` format. We recommend using the Baiduyun version.
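The conversion mentioned above is presumably a plain re-serialization of the checkpoint's state dict; a minimal sketch is shown below (the filenames are assumptions, and the repository's own conversion may differ).

```python
# Sketch: re-save the original BiomedCLIP .bin checkpoint as a .pth file.
import torch

state_dict = torch.load("open_clip_pytorch_model.bin", map_location="cpu")
torch.save(state_dict, "./src/backbone/BiomedCLIP/BiomedCLIP.pth")
```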
Our data mainly comes from the publicly available, free online Pathology Education Informational Resource (PEIR) Digital Library. We test our model on:
Medical Alignment and Instruction Tuning:
Note: We recommend using our pre-extracted BioMedCLIP features. The original images can also be found in the links below:
| Dataset | Pre-extracted Features & Original Images |
| --- | --- |
| PEIR | Removed (double-blind) |
| PathVQA | Removed (double-blind) |
| Slake | Removed (double-blind) |
| RADVQA | Removed (double-blind) |
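If you would rather extract BiomedCLIP features yourself instead of using the pre-extracted ones, the model can be loaded through `open_clip`; the sketch below pulls the public checkpoint from the Hugging Face Hub (the Hub ID and image path are assumptions, and a local copy under `./src/backbone/BiomedCLIP` would follow the same pattern).

```python
# Sketch: encode an image with BiomedCLIP via open_clip (pip install open_clip_torch).
import torch
from PIL import Image
from open_clip import create_model_from_pretrained

model, preprocess = create_model_from_pretrained(
    "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224")
model.eval()

image = preprocess(Image.open("examples/images/case_001.jpg")).unsqueeze(0)
with torch.no_grad():
    feats = model.encode_image(image)
    feats = feats / feats.norm(dim=-1, keepdim=True)   # L2-normalize for retrieval
```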
We also reference the excellent repos of Phi-3CookBook, HuatuoVision, and BioMedCLIP, in addition to other repos specific to the baselines and datasets we examined (see paper).
If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper: