SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14, 2025 • 164
PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training Paper • 2606.03264 • Published 27 days ago • 23
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26, 2025 • 174
Sheet Music Transformer Datasets Collection Datasets for the Sheet Music Transformer • 4 items • Updated Sep 23, 2024 • 4