URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding
Yongxin Shi, Jiapeng Wang, Zeyu Shan, Dezhi Peng, Zening Lin, Lianwen Jin
URaG
URaG (Unified Retrieval and Generation) is a simple yet effective framework that unifies retrieval and generation within a single multimodal large language model (MLLM) for efficient long-document understanding. Equipped with a lightweight cross-modal retrieval module, URaG explicitly leverages the inherent evidence-localization capabilities of MLLMs to perform efficient, integrated retrieval.
URaG-3B is built on Qwen2.5-VL-3B.
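To make the retrieve-then-generate idea concrete, here is a toy sketch of the selection step only: score every page against the query and keep the top-k pages as evidence for generation. It is not URaG's retrieval module, which builds on the MLLM's own evidence-localization ability; the vectors, the cosine scoring, and the names `cosine` and `select_evidence_pages` are hypothetical stand-ins.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_evidence_pages(query_emb, page_embs, k=2):
    """Hypothetical selection step: score each page against the query,
    take the k best, and return their indices in original page order."""
    scores = [cosine(query_emb, p) for p in page_embs]
    ranked = sorted(range(len(page_embs)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy 2-D "embeddings" for a 4-page document.
query = [1.0, 0.0]
pages = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5], [1.0, 0.0]]
print(select_evidence_pages(query, pages, k=2))  # [1, 3]
```

Keeping the selected pages in document order (rather than score order) preserves the reading order that a generator would see.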
Environment
conda create -n urag python=3.10
conda activate urag
Install torch & flash-attn:
# Recommended versions (not mandatory)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -U flash-attn==2.7.3 --no-build-isolation
Install other dependencies:
git clone https://github.com/shi-yx/URaG.git
cd ./URaG
pip install -r requirements.txt
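After installation, a quick way to compare the installed packages against the recommended versions above is the standard-library `importlib.metadata`. This is a convenience sketch, not part of the repository; `check_versions` and `RECOMMENDED` are hypothetical names, and a mismatch is only informational since the versions are not mandatory.

```python
from importlib import metadata

# Versions recommended in the install commands above.
RECOMMENDED = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "flash-attn": "2.7.3",
}


def check_versions(recommended=RECOMMENDED):
    """Return {package: (installed_version_or_None, matches_recommended)}."""
    report = {}
    for name, wanted in recommended.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # PyTorch wheels carry a local tag such as "2.5.1+cu124"; compare the public part only.
        public = installed.split("+")[0] if installed else None
        report[name] = (installed, public == wanted)
    return report


if __name__ == "__main__":
    for pkg, (ver, ok) in check_versions().items():
        print(f"{pkg:12s} {ver or 'not installed':20s} {'OK' if ok else 'differs'}")
```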
Inference
An example inference script is provided in the GitHub repository.
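As a rough illustration of the input side, the sketch below assumes URaG-3B keeps the chat message format of its base model, Qwen2.5-VL, where a user turn lists image entries followed by a text entry. This is an assumption; the repository's inference example is authoritative, and `build_messages` is a hypothetical helper.

```python
def build_messages(image_paths, query):
    """Build one user turn for a multi-page document, assuming the
    Qwen2.5-VL chat format: all page images first, then the text query."""
    content = [{"type": "image", "image": path} for path in image_paths]
    content.append({"type": "text", "text": query})
    return [{"role": "user", "content": content}]


# Placeholder page paths and question.
messages = build_messages(["page_001.png", "page_002.png"], "What is the total revenue?")
print(messages)
```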
Train
Prepare Training Datasets
The dataset format is as follows (please refer to the GitHub repository for more details):
[
{
"id": "unique_id",
"image": ["image_path1", "image_path2", ...],
"conversations": [
{"from": "human", "value": "query"},
{"from": "gpt", "value": "answer"}
],
"retrieval_labels": [0, 1, 0, ...] # 1: evidence, 0: non-evidence
},
...
]
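Before training, it can help to sanity-check records against this format. The validator below is a hypothetical helper, not part of the repository, and it encodes two assumptions not stated above: there is one retrieval label per image (in the same order), and conversation turns alternate human/gpt starting with human. Note that the format above is illustrative; the `...` placeholders and `#` comment must be absent from a real JSON file.

```python
import json


def validate_record(rec):
    """Return a list of problems found in one training record (empty if valid)."""
    problems = [f"missing key: {k}" for k in ("id", "image", "conversations", "retrieval_labels")
                if k not in rec]
    if problems:
        return problems
    images, labels = rec["image"], rec["retrieval_labels"]
    # Assumption: one retrieval label per page image, in the same order.
    if len(labels) != len(images):
        problems.append(f"{len(labels)} labels for {len(images)} images")
    if any(label not in (0, 1) for label in labels):
        problems.append("retrieval_labels must contain only 0 or 1")
    # Assumption: turns alternate human -> gpt, starting with human.
    roles = [turn.get("from") for turn in rec["conversations"]]
    expected = ["human", "gpt"] * (len(roles) // 2 + 1)
    if not roles or roles != expected[: len(roles)]:
        problems.append("conversations must alternate human/gpt, starting with human")
    return problems


def validate_file(path):
    """Load a JSON list of records and map each invalid record's id to its problems."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    report = {}
    for i, rec in enumerate(records):
        issues = validate_record(rec)
        if issues:
            report[rec.get("id", f"#{i}")] = issues
    return report
```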
Pretrain
cd ./URaG/code
sh scripts/pretrain.sh
# Extract the parameters of the projection layer (proj_layer)
sh scripts/extract_projlayer.sh
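Conceptually, the extraction step keeps only the projection-layer entries of the pretraining checkpoint, presumably so they can be reused in the finetuning stage. Assuming the relevant parameter names contain the substring `proj_layer` (a hypothetical key pattern; `scripts/extract_projlayer.sh` defines the real one), the filtering reduces to:

```python
def filter_params_by_name(state_dict, keyword="proj_layer"):
    """Keep only the entries whose parameter name contains `keyword`."""
    return {name: value for name, value in state_dict.items() if keyword in name}


# Hypothetical usage with PyTorch (file names are placeholders):
#   import torch
#   sd = torch.load("pretrain_checkpoint.bin", map_location="cpu")
#   torch.save(filter_params_by_name(sd), "proj_layer.bin")
```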
Finetune
cd ./URaG/code
sh scripts/finetune.sh
# Merge the LoRA weights into the base model
sh scripts/merge_lora.sh
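Assuming the merge script applies the standard LoRA update, merging folds each low-rank adapter into its base weight as W' = W + (alpha / r) · B · A, so the merged model runs without separate adapter modules. The pure-Python sketch below shows that arithmetic on toy matrices; `merge_lora` and `matmul` are illustrative names, not the repository's code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, a, b, alpha, r):
    """Fold a LoRA update into the base weight: W' = W + (alpha / r) * B @ A.
    Shapes: W is (d_out, d_in), B is (d_out, r), A is (r, d_in)."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Rank-1 example: B @ A = [[3, 4], [6, 8]], scaled by alpha / r = 2.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[3.0, 4.0]]      # (r=1, d_in=2)
B = [[1.0], [2.0]]    # (d_out=2, r=1)
print(merge_lora(W, A, B, alpha=2, r=1))  # [[7.0, 8.0], [12.0, 17.0]]
```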
Citation
If you find our work helpful, please consider citing it:
@inproceedings{shi2026urag,
title={URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding},
author={Shi, Yongxin and Wang, Jiapeng and Shan, Zeyu and Peng, Dezhi and Lin, Zening and Jin, Lianwen},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}