ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment


Hao Yang¹, Yifan Ji¹, Zhipeng Xu¹, Zhenghao Liu¹, Yukun Yan², Zulong Chen³, Shuo Wang², Yu Gu¹, Ge Yu¹

¹Northeastern University, ²Tsinghua University, ³Alibaba Group

Overview

We introduce Reasoning-Guided Alignment (ReAlign), a method that improves visual document retrieval by using the reasoning capability of vision-language models (VLMs) to generate fine-grained descriptions of visual documents, which then serve as supervision signals for training the retriever. The framework supports multiple multimodal backbone models, including Phi3 Vision and Qwen2.5 VL.

Our work has been accepted to SIGIR 2026 🎉🎉🎉!

If you find this model useful, please give it a like ❤.

Collections

We have made the following resources available in the 🤗 ReAlign collection.

| Resource | Description | Link |
|---|---|---|
| ReAlign-Phi3v | The visual document retriever based on Phi-3-vision-128k-instruct | 🤗ReAlign-Phi3v |
| ReAlign-Qwen | The visual document retriever based on Qwen2.5-VL-7B-Instruct | 🤗ReAlign-Qwen |
| Training Data | The data used to train the ReAlign retriever | 🤗ReAlign-Trainset |

Setup

(1) Clone this repository:

git clone git@github.com:NEUIR/ReAlign.git
cd ReAlign

(2) Create and activate a Conda environment (Python 3.10):

conda create -n realign python=3.10 -y
conda activate realign

(3) Install dependencies and the editable package:

pip install -r requirements.txt
pip install -e .
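Before launching training or evaluation, it can help to confirm that the shell is actually inside the environment created above. The sketch below is a hypothetical helper (`check_realign_env` is not part of the repository); it assumes that `conda activate` has set the standard `CONDA_DEFAULT_ENV` variable and that `python` on `PATH` is the environment's interpreter.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a repository script): check that the active Conda env
# and Python version match the setup steps (defaults: env "realign", Python 3.10).
check_realign_env() {
  local expected_env="${1:-realign}" expected_py="${2:-3.10}" py_version
  # `conda activate <name>` exports CONDA_DEFAULT_ENV=<name>
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected_env" ]; then
    echo "active Conda env is '${CONDA_DEFAULT_ENV:-none}', expected '$expected_env'" >&2
    return 1
  fi
  # ask the interpreter on PATH for its major.minor version
  py_version=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])') || return 1
  if [ "$py_version" != "$expected_py" ]; then
    echo "python is $py_version, expected $expected_py" >&2
    return 1
  fi
  echo "ok: env=$expected_env, python=$py_version"
}
```

Called with no arguments, `check_realign_env` checks for the `realign` environment and Python 3.10; either default can be overridden through the first and second arguments.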

Training

1. Prepare Data and Model Paths

All absolute paths for data and model checkpoints are centralized in config/dir_config.sh. Please download the required assets and set the paths according to the instructions in that file.

vim config/dir_config.sh
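A wrong path in `config/dir_config.sh` may only surface after training has started, so a quick pre-flight check can save time. The following is a minimal sketch, not a repository script: `check_config_paths` is a hypothetical helper, and it assumes the file assigns paths as simple `NAME=/abs/path` or `export NAME="/abs/path"` lines. Values composed from other variables (such as `$ROOT/data`) or followed by inline comments are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a repository script): list absolute paths assigned in
# a shell config that do not exist yet. Assumes lines like NAME=/abs/path or
# export NAME="/abs/path"; variable references inside values are NOT expanded.
check_config_paths() {
  local cfg="$1" missing=0 line value
  [ -f "$cfg" ] || { echo "config not found: $cfg" >&2; return 2; }
  while IFS= read -r line; do
    # keep the text after the first '=' and strip one pair of surrounding quotes
    value=${line#*=}
    value=${value%\"}; value=${value#\"}
    value=${value%\'}; value=${value#\'}
    case "$value" in
      /*) [ -e "$value" ] || { echo "missing: $value"; missing=1; } ;;
    esac
  done < <(grep -E '^[[:space:]]*(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*=' "$cfg")
  return "$missing"
}
```

Running `check_config_paths config/dir_config.sh` prints one `missing:` line per nonexistent path and returns 1 if any are missing, 0 if all exist, and 2 if the config file itself is absent.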

2. Create Log Directory

mkdir -p log

3. Run Training

Phi3 Vision:

bash sh/train_phi3v.sh > log/realign-phi3v.log 2>&1

Qwen2.5 VL:

bash sh/train_qwen.sh > log/realign-qwen.log 2>&1
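The training commands above send both stdout and stderr to a per-model log file. To train both backbones back to back, one option is a small wrapper that applies the same redirection and reports each run's exit status. This is only a sketch; `run_logged` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a repository script): run one training script with
# stdout and stderr written to <log_dir>/<name>.log (same redirection as above),
# then print the exit status and elapsed seconds and return that status.
run_logged() {
  local script="$1" name="$2" log_dir="${3:-log}"
  local log_file="$log_dir/$name.log" start=$SECONDS status=0
  mkdir -p "$log_dir"
  bash "$script" > "$log_file" 2>&1 || status=$?
  echo "$name finished with status $status after $((SECONDS - start))s (log: $log_file)"
  return "$status"
}
```

For example, `run_logged sh/train_phi3v.sh realign-phi3v && run_logged sh/train_qwen.sh realign-qwen` starts the Qwen2.5 VL run only if the Phi3 Vision run exits with status 0; replace `&&` with `;` to run both regardless.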

Evaluation

The second argument of each evaluation script is a comma-separated list of GPU IDs. The examples below use four GPUs; adjust to match your hardware (e.g., use 0 for a single GPU).

Phi3 Vision:

bash sh/eval.sh realign-phi3v 0,1,2,3

Qwen2.5 VL:

bash sh/eval_qwen.sh realign-qwen 0,1,2,3
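Since the second argument is just a comma-separated list of GPU IDs, it can be generated instead of typed. This sketch assumes the GPUs you want are numbered contiguously from 0 and, when no count is given, that `nvidia-smi` is available to report the number of GPUs. `gpu_id_list` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a repository script): print "0,1,...,N-1" for use as
# the GPU ID argument. With no argument, N is the GPU count from `nvidia-smi -L`.
gpu_id_list() {
  local n="${1:-}" ids="0" i
  if [ -z "$n" ]; then
    # nvidia-smi -L prints one "GPU <k>: ..." line per device
    n=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU' || true)
  fi
  case "$n" in
    ''|*[!0-9]*) echo "invalid GPU count: '$n'" >&2; return 1 ;;
  esac
  n=$((10#$n))  # force base 10 so a value like "08" is not read as octal
  if [ "$n" -lt 1 ]; then
    echo "no GPUs requested or detected" >&2
    return 1
  fi
  # assumes the GPUs to use are numbered contiguously from 0
  for ((i = 1; i < n; i++)); do
    ids+=",$i"
  done
  echo "$ids"
}
```

For example, `bash sh/eval.sh realign-phi3v "$(gpu_id_list 2)"` passes `0,1`, while `gpu_id_list` with no argument lists every GPU reported by `nvidia-smi -L`.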

Citation

@article{yang2025realign,
      title={ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment},
      author={Yang, Hao and Ji, Yifan and Xu, Zhipeng and Liu, Zhenghao and Yan, Yukun and Chen, Zulong and Wang, Shuo and Gu, Yu and Yu, Ge},
      year={2026},
      url={https://arxiv.org/abs/2604.xxxxx},
}

Contact

If you have questions, suggestions, or bug reports, please email:

yanghao123@mails.neu.edu.cn