---
license: mit
---
# Probing Visual Language Priors in VLMs
## ImageDPO Finetuned Model
This page provides the **ImageDPO**-finetuned LLaVA-v1.5-7B checkpoint used in [Probing Visual Language Priors in VLMs](https://arxiv.org/abs/2501.00569). ImageDPO is a self-improving approach that enhances a VLM's visual reasoning by increasing its reliance on visual inputs, as illustrated in the image below. We release the **merged model weights** for direct use.

## Usage
First, install the [LLaVA-v1.5 codebase](https://github.com/LLaVA-VL/LLaVA-Plus-Codebase).
Then run the following command to try the model:
```bash
python -m llava.eval.run_llava \
    --model-path ViLP/LLaVA-v1.5-7b-ImageDPO \
    --image-file 'images/llava_logo.png' \
    --query 'Please caption this image.' \
    --conv-mode llava_v1
```
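The same call can also be made from Python. The following is a minimal sketch, not part of the original instructions. It assumes the installed codebase exposes `llava.eval.run_llava.eval_model` with the argument fields used in the upstream LLaVA repository. `build_args` and `run_inference` are hypothetical helper names, and the generation settings (`temperature`, `num_beams`, `max_new_tokens`) are illustrative defaults rather than values from the paper.

```python
from types import SimpleNamespace


def build_args(model_path, image_file, query, conv_mode="llava_v1"):
    """Collect the fields that eval_model is assumed to read (hypothetical helper)."""
    return SimpleNamespace(
        model_path=model_path,
        model_base=None,      # no base model is passed, matching the CLI call above
        image_file=image_file,
        query=query,
        conv_mode=conv_mode,
        sep=",",              # separator used when several image files are given
        temperature=0,        # assumed default: greedy decoding
        top_p=None,
        num_beams=1,          # assumed default
        max_new_tokens=512,   # assumed default
    )


def run_inference(args):
    """Run one query; the import is deferred so this file loads without LLaVA."""
    from llava.eval.run_llava import eval_model  # requires the LLaVA codebase

    eval_model(args)
```

Calling `run_inference(build_args("ViLP/LLaVA-v1.5-7b-ImageDPO", "images/llava_logo.png", "Please caption this image."))` should then mirror the command above, provided the assumptions hold for your installed version.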
## Citation Information
Please consider citing the ***ViLP*** paper if you find our resource helpful!
```bibtex
@article{luo2024probing,
title={Probing Visual Language Priors in VLMs},
author={Luo, Tiange and Cao, Ang and Lee, Gunhee and Johnson, Justin and Lee, Honglak},
journal={arXiv preprint arXiv:2501.00569},
year={2024}
}
```