Add model card and metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +32 -0
README.md ADDED
@@ -0,0 +1,32 @@
+ ---
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ ---
+
+ # UniDoc-RL-7B
+
+ **UniDoc-RL** is a unified reinforcement learning framework for **visual document RAG**, where an LVLM agent jointly performs retrieval, reranking, active visual perception, and reasoning within a single decision process.
+
+ This model is the 7B variant of the UniDoc-RL framework, built upon the Qwen2.5-VL architecture.
+
+ - **Paper:** [UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards](https://huggingface.co/papers/2604.14967)
+ - **Repository:** [https://github.com/deepglint/UniDoc-RL](https://github.com/deepglint/UniDoc-RL)
+
+ ## Overview
+
+ UniDoc-RL formulates visual evidence acquisition as a hierarchical sequential decision-making problem. The model interacts with an external environment through structured actions such as `<search>`, `<select>`, `<bbox>`, and `<answer>`.
+
+ This design enables the agent to progressively gather evidence from coarse page-level retrieval to fine-grained region inspection, allowing it to suppress irrelevant content and attend to information-dense regions. This approach is particularly effective for complex reasoning over charts, tables, and multi-page documents.
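The action interface described above can be illustrated with a small parser. This is only a sketch: the card names the tags `<search>`, `<select>`, `<bbox>`, and `<answer>`, but the closing-tag convention, the payload format, and the helper names `Action`, `parse_action`, and `is_terminal` are assumptions made for illustration and are not taken from the UniDoc-RL repository.

```python
import re
from typing import NamedTuple, Optional

# Tag names come from the model card. The <tag>...</tag> closing convention
# is an assumption; check the repository for the actual output format.
_ACTION_RE = re.compile(r"<(search|select|bbox|answer)>(.*?)</\1>", re.DOTALL)


class Action(NamedTuple):
    """Hypothetical container for one structured action."""
    name: str
    payload: str


def parse_action(model_output: str) -> Optional[Action]:
    """Return the first structured action found in the text, or None."""
    match = _ACTION_RE.search(model_output)
    if match is None:
        return None
    return Action(name=match.group(1), payload=match.group(2).strip())


def is_terminal(action: Action) -> bool:
    """Treat <answer> as the action that ends the episode (assumed)."""
    return action.name == "answer"
```

An environment loop would presumably call `parse_action` on each generation, dispatch `search`, `select`, and `bbox` to the retrieval and cropping tools, and stop once `is_terminal` returns true.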
+
+ The model was trained with Group Relative Policy Optimization (GRPO), which requires no separate value network, using a dense multi-reward scheme that aligns agent behavior with multiple objectives.
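To make the "no separate value network" point concrete, the sketch below shows the group-relative advantage that GRPO generally uses: each sampled rollout's reward is normalized against the mean and standard deviation of its own group. The weighted-sum combination of reward components, the component names, and the function names are assumptions for illustration; the card does not specify how UniDoc-RL combines its dense rewards.

```python
from statistics import mean, pstdev
from typing import Dict, List


def combine_rewards(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of reward components (assumed combination rule)."""
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward against its own sampling group.

    The group mean acts as the baseline, so no learned value network is needed.
    Population standard deviation is used here; implementations differ on this.
    """
    baseline = mean(rewards)
    scale = pstdev(rewards) + eps  # eps avoids division by zero for uniform groups
    return [(r - baseline) / scale for r in rewards]
```

When every rollout in a group earns the same reward, all advantages are zero and that group contributes no gradient signal, which is one reason denser reward signals can help.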
+
+ ## Citation
+
+ ```bibtex
+ @misc{unidocrl2026,
+ title={UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards},
+ author={Jun Wang and Shuo Tan and Zelong Sun and Tiancheng Gu and Yongle Zhao and Ziyong Feng and Kaicheng Yang and Cewu Lu},
+ year={2026},
+ url={https://huggingface.co/papers/2604.14967}
+ }
+ ```