# Infinity-Parser2-Pro
<p align="center">
<img src="assets/logo.png" width="400"/>
</p>
<p align="center">
<a href="https://github.com/infly-ai/INF-MLLM">GitHub</a> |
<a>Dataset (coming soon...)</a> |
<a>Paper (coming soon...)</a> |
<a>Demo (coming soon...)</a>
</p>
# News
- [2026-04-11] We release Infinity-Parser2-Pro, our flagship document parsing model, now available as a preview. Stay tuned: the official release, the lightweight Infinity-Parser2-Flash, and our multimodal parsing dataset Infinity-Doc2-10M are coming soon.
# Introduction
We are excited to release Infinity-Parser2-Pro, our latest flagship document understanding model. It achieves a new state of the art on olmOCR-Bench with a score of 86.7%, surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr. Building on our previous model, Infinity-Parser-7B, we have significantly enhanced our data engine and multi-task reinforcement learning approach. These improvements let the model consolidate robust multimodal parsing capabilities into a unified architecture and deliver new zero-shot capabilities for diverse real-world business scenarios.
## Key Features
- **Upgraded Data Engine**: We have comprehensively enhanced our synthetic data engine to support both fixed-layout and flexible-layout document formats. By generating over 1 million diverse full-text samples covering a wide range of document layouts, combined with a dynamic adaptive sampling strategy, we ensure highly balanced and robust multi-task learning across various document types.
- **Multi-Task Reinforcement Learning**: We designed a novel verifiable reward system to support Joint Reinforcement Learning (RL), enabling seamless and simultaneous co-optimization of multiple complex tasks, including doc2json and doc2markdown.
- **Breakthrough Parsing Performance**: Infinity-Parser2-Pro substantially outperforms our previous model, Infinity-Parser-7B, achieving 86.7% on olmOCR-Bench and surpassing frontier models such as DeepSeek-OCR-2, PaddleOCR-VL, and dots.mocr.
- **Inference Acceleration**: Adopting a Mixture-of-Experts (MoE) architecture increases inference throughput by 21% (from 441 to 534 tokens/sec), reducing deployment latency and cost.
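
As a rough check on the throughput figures in the feature list, the short Python sketch below recomputes the relative gain from the two quoted rates (441 and 534 tokens/sec). It also estimates how long a fixed-length output would take at each rate. The 2,000-token page length and the helper names `relative_gain` and `seconds_for` are illustrative assumptions, not values or functions from this project, and the estimate assumes generation time scales inversely with throughput.

```python
# Throughput figures quoted in the feature list (tokens per second).
BASELINE_TPS = 441.0
MOE_TPS = 534.0

# Hypothetical output length for one parsed page (an assumption for
# illustration only; this README does not state a page length).
PAGE_TOKENS = 2000


def relative_gain(old: float, new: float) -> float:
    """Return the fractional increase of `new` over `old`."""
    return (new - old) / old


def seconds_for(tokens: int, tps: float) -> float:
    """Return the time needed to emit `tokens` at a steady rate of `tps`."""
    return tokens / tps


gain = relative_gain(BASELINE_TPS, MOE_TPS)
before = seconds_for(PAGE_TOKENS, BASELINE_TPS)
after = seconds_for(PAGE_TOKENS, MOE_TPS)
time_saved = 1.0 - after / before

print(f"throughput gain: {gain:.1%}")
print(f"time per {PAGE_TOKENS}-token page: {before:.2f}s -> {after:.2f}s")
print(f"time saved per page: {time_saved:.1%}")
```

Under these assumptions, a 21% throughput gain corresponds to roughly 17% less generation time for the same output, since time scales with 441/534 rather than with the gain itself.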
# Performance
<p align="left">
<img src="assets/document_parsing_performance_evaluation.png" width="1200"/>
</p>
# Quick Start
Coming soon...
# Citation
Coming soon...
# License
This model is licensed under Apache-2.0.