MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing • Paper • 2509.22186 • Published Sep 26, 2025 • 141
CommonForms: A Large, Diverse Dataset for Form Field Detection • Paper • 2509.16506 • Published Sep 20, 2025 • 22
Automated Structured Radiology Report Generation with Rich Clinical Context • Paper • 2510.00428 • Published Oct 1, 2025 • 8
Extract-0: A Specialized Language Model for Document Information Extraction • Paper • 2509.22906 • Published Sep 26, 2025
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model • Paper • 2510.14528 • Published Oct 16, 2025 • 112
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe • Paper • 2511.16334 • Published Nov 20, 2025 • 93
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI • Paper • 2502.17092 • Published Feb 24, 2025 • 3
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion • Paper • 2503.11576 • Published Mar 14, 2025 • 137