This is a repository for organizing papers related to modality priors of Multimodal Large Language Models (MLLMs).
Modality priors in multimodal large language models (MLLMs), including visual priors, language priors, and so on, are the inherent biases or preconceptions embedded in components such as the visual encoder and the language model backbone. These priors originate in the data used to pre-train each component and shape how the model combines information from other modalities when interpreting and generating language, which can skew its predictions toward preconceived expectations about the relationships between different types of data in a multimodal context.
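A simple way to see a language prior in action is to ask an MLLM a question whose visual answer contradicts common sense and compare its response when the image is informative versus uninformative. The following is a minimal sketch, not code from any of the papers below, assuming a LLaVA-style checkpoint served through Hugging Face transformers; the model id, prompt template, image URL, and the green-banana example are illustrative assumptions.

```python
# Sketch: probing a language prior by swapping the real image for a blank one.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint; any LLaVA-style MLLM works
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

def ask(image, question):
    # LLaVA-1.5 chat template; adjust for other checkpoints.
    prompt = f"USER: <image>\n{question} ASSISTANT:"
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    return processor.decode(out[0], skip_special_tokens=True)

question = "What color is the banana? Answer with one word."
# Hypothetical URL pointing to a photo of a green (unripe) banana.
real = Image.open(requests.get("https://example.com/green_banana.jpg", stream=True).raw)
blank = Image.new("RGB", real.size, (128, 128, 128))  # uninformative gray image

print("real image :", ask(real, question))
print("blank image:", ask(blank, question))  # if both answers are "yellow", the language prior is dominating
```

Many of the benchmarks and contrastive-decoding methods listed below formalize variations of this idea, e.g. by contrasting outputs conditioned on original versus distorted visual inputs.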
(13 Jun 2024) VLind-Bench: Measuring Language Priors in Large Vision-Language Models
(23 May 2023) IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions
(13 Mar 2023) Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
(30 Oct 2023) ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense
(16 Oct 2024) The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
(5 Jul 2023) Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
(24 Jun 2024) Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
(19 Jun 2024) VACoDe: Visual Augmented Contrastive Decoding
(17 Jun 2024) mDPO: Conditional Preference Optimization for Multimodal Large Language Models
(6 Apr 2024) Context versus Prior Knowledge in Language Models
(27 Mar 2024) Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
(8 Mar 2024) Debiasing Multimodal Large Language Models
(28 Nov 2023) Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
(26 Jun 2023) Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
(4 Jun 2024) Eliciting the Priors of Large Language Models using Iterated In-Context Learning
(25 Mar 2024) The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition
(11 Aug 2023) Robust visual question answering via polarity enhancement and contrast
(CVPR 2016) Yin and Yang: Balancing and Answering Binary Visual Questions
(CVPR 2017) Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
(CVPR 2018) Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
(NeurIPS 2018) Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
(SIGIR 2019) Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
(CVPR 2021) Counterfactual VQA: A Cause-Effect Look at Language Bias
(TIP 2021) Loss Re-Scaling VQA: Revisiting the Language Prior Problem From a Class-Imbalance View
(EMNLP 2022) Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA
(COLING 2022) Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances
(JMLR 2023) Overcoming Language Priors for Visual Question Answering via Loss Rebalancing Label and Global Context