ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment
Hao Yang1, Yifan Ji1, Zhipeng Xu1, Zhenghao Liu1, Yukun Yan2, Zulong Chen3, Shuo Wang2, Yu Gu1, Ge Yu1
1Northeastern University, 2Tsinghua University, 3Alibaba Group
Overview
We introduce Reasoning-Guided Alignment (ReAlign), a method that improves visual document retrieval by using the reasoning capability of vision-language models (VLMs) to generate fine-grained descriptions of visual documents, which serve as supervision signals during retriever training. The framework supports multiple multimodal backbones, including Phi3 Vision and Qwen2.5 VL.
Our work has been accepted to SIGIR 2026 🎉🎉🎉!
If you find this model useful, please give us a like ❤.
Collections
We have made the following resources available in the 🤗 ReAlign collection.
| Resource | Description | Link |
|---|---|---|
| ReAlign-Phi3v | The visual document retriever based on Phi-3-vision-128k-instruct | 🤗ReAlign-Phi3v |
| ReAlign-Qwen | The visual document retriever based on Qwen2.5-VL-7B-Instruct | 🤗ReAlign-Qwen |
| Training Data | The data used to train the ReAlign retriever | 🤗ReAlign-Trainset |
Setup
(1) Clone this repository:
```bash
git clone git@github.com:NEUIR/ReAlign.git
cd ReAlign
```
(2) Create and activate a Conda environment (Python 3.10):
```bash
conda create -n realign python=3.10 -y
conda activate realign
```
(3) Install dependencies and the editable package:
```bash
pip install -r requirements.txt
pip install -e .
```
Training
1. Prepare Data and Model Paths
All absolute paths for data and model checkpoints are centralized in config/dir_config.sh. Please download the required assets and set the paths according to the instructions in that file.
```bash
vim config/dir_config.sh
```
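Once the paths are set, a quick sanity check can catch typos before a long training run. The bash sketch below is a hypothetical helper, not part of the ReAlign repository. It assumes that `config/dir_config.sh` is a plain shell file that can be sourced and that it defines each path as a shell variable; the helper reports any listed variable that is unset, empty, not absolute, or pointing to a path that does not exist.

```bash
# Hypothetical helper (not shipped with ReAlign): check that each named
# variable holds an absolute path that exists. Returns non-zero if any fails.
check_paths() {
  local var val failed=0
  for var in "$@"; do
    # Indirect expansion: read the value of the variable whose name is in $var.
    val="${!var:-}"
    if [[ -z "$val" ]]; then
      echo "unset or empty: $var" >&2
      failed=1
    elif [[ "$val" != /* ]]; then
      echo "not an absolute path: $var=$val" >&2
      failed=1
    elif [[ ! -e "$val" ]]; then
      echo "does not exist: $var=$val" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `source config/dir_config.sh && check_paths MODEL_PATH DATA_PATH`, where `MODEL_PATH` and `DATA_PATH` are placeholder names; substitute the variable names actually defined in the file.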
2. Create Log Directory
```bash
mkdir -p log
```
3. Run Training
Phi3 Vision:
```bash
bash sh/train_phi3v.sh > log/realign-phi3v.log 2>&1
```
Qwen2.5 VL:
```bash
bash sh/train_qwen.sh > log/realign-qwen.log 2>&1
```
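Both training commands follow the same pattern: create `log/` and send stdout and stderr of a script to `log/<name>.log`. The bash sketch below wraps that pattern in a small function. It is a hypothetical convenience, not part of the ReAlign repository, and it preserves the wrapped command's exit status so a failed run is still detectable.

```bash
# Hypothetical wrapper (not shipped with ReAlign): run a command with
# stdout and stderr redirected to log/<name>.log, creating log/ if needed.
run_logged() {
  local name="$1"
  shift
  mkdir -p log
  # The function returns the exit status of the wrapped command.
  "$@" > "log/${name}.log" 2>&1
}
```

Usage: `run_logged realign-phi3v bash sh/train_phi3v.sh` is equivalent to the Phi3 Vision command above. Appending `&` runs it in the background, after which `tail -f log/realign-phi3v.log` follows progress.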
Evaluation
The second argument of each evaluation script is a comma-separated list of GPU IDs. The examples below use four GPUs; adjust to match your hardware (e.g., use 0 for a single GPU).
Phi3 Vision:
```bash
bash sh/eval.sh realign-phi3v 0,1,2,3
```
Qwen2.5 VL:
```bash
bash sh/eval_qwen.sh realign-qwen 0,1,2,3
```
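Because the GPU argument is simply a comma-separated list of IDs, it can be generated rather than typed. The bash sketch below is a hypothetical helper (not part of the ReAlign repository) that prints the IDs of the first N GPUs, assuming they are numbered from 0.

```bash
# Hypothetical helper (not shipped with ReAlign): print "0,1,...,N-1"
# for a positive integer N, e.g. "0,1,2,3" for N=4.
gpu_ids() {
  local n="$1"
  if ! [[ "$n" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: gpu_ids N (N >= 1)" >&2
    return 1
  fi
  # One ID per line, then join the lines with commas.
  seq 0 $((n - 1)) | paste -sd, -
}
```

Usage: `bash sh/eval.sh realign-phi3v "$(gpu_ids 4)"` reproduces the four-GPU example above. To use every visible NVIDIA GPU instead, `nvidia-smi --query-gpu=index --format=csv,noheader | paste -sd, -` produces the same kind of list.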
Citation
```bibtex
@article{yang2025realign,
  title={ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment},
  author={Yang, Hao and Ji, Yifan and Xu, Zhipeng and Liu, Zhenghao and Yan, Yukun and Chen, Zulong and Wang, Shuo and Gu, Yu and Yu, Ge},
  year={2026},
  url={https://arxiv.org/abs/2604.xxxxx}
}
```
Contact
If you have questions, suggestions, or bug reports, please email:
yanghao123@mails.neu.edu.cn