# Infinity-Parser2-Pro

<p align="center">
    <img src="assets/logo.png" width="400"/>
</p>

<p align="center">
💻 <a href="https://github.com/infly-ai/INF-MLLM">GitHub</a> |
📊 <a>Dataset (coming soon...)</a> |
📄 <a>Paper (coming soon...)</a> |
🚀 <a>Demo (coming soon...)</a>
</p>

# News
- [2026-04-09] We released our latest flagship document parsing model, Infinity-Parser2-Pro. Note that this is still a preview version. Model weights are being uploaded.

# Introduction

We are excited to release Infinity-Parser2-Pro, our latest flagship document understanding model. It sets a new state of the art on olmOCR-Bench with a score of 86.7%, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL-1.5, and dots.mocr. Building on our previous model, Infinity-Parser-7B, we have significantly enhanced our data engine and our multi-task reinforcement learning approach. These changes let the model consolidate robust multi-modal parsing capabilities into a unified architecture and add new zero-shot capabilities for diverse real-world business scenarios.

## Key Features

- Upgraded Data Engine: We have comprehensively enhanced our synthetic data engine to support both fixed-layout and flexible-layout document formats. By generating over 1 million diverse full-text samples covering a wide range of document layouts, combined with a dynamic adaptive sampling strategy, we ensure highly balanced and robust multi-task learning across various document types.

- Multi-Task Reinforcement Learning: We designed a novel verifiable reward system to support Joint Reinforcement Learning (RL), enabling seamless and simultaneous co-optimization of multiple complex tasks, including doc2json and doc2markdown.

- Breakthrough Parsing Performance: Infinity-Parser2-Pro substantially outperforms our previous 7B model, achieving 86.7% on olmOCR-Bench and surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL-1.5, and dots.mocr.

- Inference Acceleration: Adopting a Mixture-of-Experts (MoE) architecture raises inference throughput by 21% (from 441 to 534 tokens/sec), reducing deployment latency and cost.
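
The throughput figures in the last bullet can be checked with a few lines of Python. This is only an arithmetic sketch: the function names are illustrative, and it assumes both numbers were measured under the same hardware and batch settings, which this README does not specify.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the relative increase of `improved` over `baseline` as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


def per_token_time_reduction(baseline: float, improved: float) -> float:
    """Return the fractional drop in average time per token when throughput rises."""
    if baseline <= 0 or improved <= 0:
        raise ValueError("throughputs must be positive")
    return 1.0 - baseline / improved


# Throughput figures quoted in the feature list (tokens/sec).
baseline_tps = 441.0
improved_tps = 534.0

print(f"throughput gain: {relative_gain(baseline_tps, improved_tps):.1%}")
print(f"time-per-token reduction: {per_token_time_reduction(baseline_tps, improved_tps):.1%}")
```

Under these assumptions the gain comes out at about 21.1%, matching the quoted 21%, and the average time per generated token drops by about 17.4%. The second number covers decoding only; end-to-end latency also depends on image preprocessing and prefill, which are not measured here.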

# Performance

Coming soon...

# Citation

Coming soon...

# License

This model is licensed under the Apache License 2.0.