OCR - a shoaibmohd Collection

OCR

updated Mar 19

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 161
CommonForms: A Large, Diverse Dataset for Form Field Detection

Paper • 2509.16506 • Published Sep 20, 2025 • 22
Automated Structured Radiology Report Generation with Rich Clinical Context

Paper • 2510.00428 • Published Oct 1, 2025 • 8
Extract-0: A Specialized Language Model for Document Information Extraction

Paper • 2509.22906 • Published Sep 26, 2025
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16, 2025 • 124
RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published Oct 18, 2025 • 49
NVIDIA Nemotron Parse 1.1

Paper • 2511.20478 • Published Nov 25, 2025 • 23
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 96
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI

Paper • 2502.17092 • Published Feb 24, 2025 • 3
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 157
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Paper • 2601.21639 • Published Jan 29 • 51
DeepSeek-OCR 2: Visual Causal Flow

Paper • 2601.20552 • Published Jan 28 • 69
FireRed-OCR Technical Report

Paper • 2603.01840 • Published Mar 2 • 6
Multimodal OCR: Parse Anything from Documents

Paper • 2603.13032 • Published Mar 13 • 43
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published Mar 11 • 154