---
license: mit
---

# Probing Visual Language Priors in VLMs


## ImageDPO Finetuned Model  

This page provides the **ImageDPO**-finetuned LLaVA-v1.5-7B checkpoint used in [Probing Visual Language Priors in VLMs](https://arxiv.org/abs/2501.00569). ImageDPO is a self-improving approach that enhances the visual reasoning of VLMs by increasing their reliance on visual inputs, as illustrated in the image below. We provide the **merged model weights**, ready for direct use.

![ImageDPO](https://huggingface.co/ViLP/LLaVA-v1.5-13b-ImageDPO/resolve/main/ImageDPO.png)

## Usage  

First, install the [LLaVA-v1.5 codebase](https://github.com/LLaVA-VL/LLaVA-Plus-Codebase).    

Then run the following command to try the model:

```bash
python -m llava.eval.run_llava \
    --model-path ViLP/LLaVA-v1.5-7b-ImageDPO \
    --image-file 'images/llava_logo.png' \
    --query 'Please caption this image.' \
    --conv-mode llava_v1
```
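
To caption several images, the command above can be wrapped in a small script. The following is a minimal sketch, not part of the released code: `run_one` and the `DRY_RUN` switch are hypothetical helpers of this script (not LLaVA options), and it assumes the LLaVA package is installed and that the image paths you pass exist.

```bash
#!/usr/bin/env bash
# Sketch: apply the captioning command from above to every image given as an argument.
# DRY_RUN is a hypothetical switch of this script: 1 prints the commands, 0 runs them.
set -eu

MODEL_PATH="ViLP/LLaVA-v1.5-7b-ImageDPO"
QUERY="Please caption this image."
DRY_RUN="${DRY_RUN:-1}"

run_one() {
    # $1: path to a single image file
    if [ "$DRY_RUN" = "1" ]; then
        # Only show what would be executed.
        echo "python -m llava.eval.run_llava --model-path $MODEL_PATH --image-file '$1' --query '$QUERY' --conv-mode llava_v1"
    else
        python -m llava.eval.run_llava \
            --model-path "$MODEL_PATH" \
            --image-file "$1" \
            --query "$QUERY" \
            --conv-mode llava_v1
    fi
}

# Loop over all image paths passed on the command line.
for image in "$@"; do
    run_one "$image"
done
```

Saved under a name of your choice (for example `caption_all.sh`), it could be invoked as `DRY_RUN=0 bash caption_all.sh images/llava_logo.png`. Each invocation of `llava.eval.run_llava` is a separate process that loads the model again, so this approach is best suited to a handful of images.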


## Citation Information

Please consider citing the ***ViLP*** paper if you find our resources helpful!

```bibtex
@article{luo2024probing,
      title={Probing Visual Language Priors in VLMs},
      author={Luo, Tiange and Cao, Ang and Lee, Gunhee and Johnson, Justin and Lee, Honglak},
      journal={arXiv preprint arXiv:2501.00569},
      year={2024}
}
```