DeepGlint-AI/UniDoc-RL-7B

Add model card and metadata

#1

by nielsr HF Staff - opened Apr 18

base: refs/heads/main ← from: refs/pr/1

README.md +36 -0

	@@ -0,0 +1,36 @@

+---
+library_name: transformers
+pipeline_tag: image-text-to-text
+---
+# UniDoc-RL-7B
+[UniDoc-RL](https://huggingface.co/papers/2604.14967) is a unified reinforcement learning framework for **visual document Retrieval-Augmented Generation (RAG)**, where a Large Vision-Language Model (LVLM) agent jointly performs retrieval, reranking, active visual perception, and reasoning within a single decision process.
+This repository contains the 7B model checkpoint, based on the **Qwen2.5-VL** architecture.
+## Overview
+UniDoc-RL formulates visual information acquisition as a sequential decision-making problem with a hierarchical action space. Specifically, it progressively refines visual evidence from coarse-grained document retrieval to fine-grained image selection and active region cropping, allowing the model to suppress irrelevant content and attend to information-dense regions (such as charts, tables, and dense text).
+Key features include:
+- **Hierarchical Action Space**: The model uses structured actions such as `<search>`, `<select>`, `<bbox>`, and `<answer>`.
+- **Progressive Evidence Acquisition**: Refines evidence from page-level retrieval to fine-grained region inspection.
+- **RL Alignment**: Trained using Group Relative Policy Optimization (GRPO) with a dense multi-reward scheme to align behavior across multiple objectives.
+## Resources
+- **Paper**: [UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards](https://huggingface.co/papers/2604.14967)
+- **Code**: [deepglint/UniDoc-RL](https://github.com/deepglint/UniDoc-RL)
+- **Dataset**: [DeepGlint-AI/UniDoc-RL](https://huggingface.co/datasets/DeepGlint-AI/UniDoc-RL)
+## Citation
+```bibtex
+@misc{unidocrl2026,
+      title={UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards},
+      author={Jun Wang and Shuo Tan and Zelong Sun and Tiancheng Gu and Yongle Zhao and Ziyong Feng and Kaicheng Yang and Cewu Lu},
+      year={2026},
+      note={Project page and paper link: https://huggingface.co/papers/2604.14967}
+}
+```
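
The model card above names four structured actions (`<search>`, `<select>`, `<bbox>`, `<answer>`) but does not specify how their contents are formatted. The sketch below shows one way a caller might split a generated string into such actions. It rests on assumptions that are not confirmed by the card: that each action is closed by a matching tag such as `</search>`, and that a `<bbox>` payload holds four integer pixel coordinates. The names `Action`, `parse_actions`, and `parse_bbox` are hypothetical helpers and are not part of the UniDoc-RL codebase.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Tag names are taken from the model card. The closing-tag convention
# (e.g. </search>) is an ASSUMPTION made for this sketch.
_ACTION_RE = re.compile(r"<(search|select|bbox|answer)>(.*?)</\1>", re.DOTALL)


@dataclass
class Action:
    kind: str     # one of "search", "select", "bbox", "answer"
    payload: str  # raw text between the opening and closing tag


def parse_actions(text: str) -> List[Action]:
    """Return every tagged action in `text`, in order of appearance."""
    return [
        Action(kind=m.group(1), payload=m.group(2).strip())
        for m in _ACTION_RE.finditer(text)
    ]


def parse_bbox(payload: str) -> Tuple[int, int, int, int]:
    """Read a bbox payload as (x1, y1, x2, y2).

    ASSUMPTION: the payload contains exactly four integers with
    x1 < x2 and y1 < y2. The real format may differ.
    """
    numbers = [int(n) for n in re.findall(r"-?\d+", payload)]
    if len(numbers) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(numbers)}")
    x1, y1, x2, y2 = numbers
    if x2 <= x1 or y2 <= y1:
        raise ValueError("bbox must have positive width and height")
    return x1, y1, x2, y2


if __name__ == "__main__":
    # Made-up model output, used only to exercise the parser.
    sample = (
        "<search>quarterly revenue chart</search>"
        "<select>2</select>"
        "<bbox>[10, 20, 110, 220]</bbox>"
        "<answer>Revenue grew in Q3.</answer>"
    )
    for action in parse_actions(sample):
        print(action.kind, "->", action.payload)
```

Keeping the payload as raw text and validating it in a separate step (as `parse_bbox` does) lets a caller reject a malformed action without discarding the rest of the trajectory. Check the actual output format against the linked code repository before relying on any of these conventions.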