PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training Paper • 2606.03264 • Published 2 days ago • 11
view article Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 903
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding Paper • 2602.01785 • Published Feb 2 • 97
view article Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR lightonai • Oct 23, 2025 • 74
view article Article Building Deep Research: How we Achieved State of the Art Tavily • Nov 24, 2025 • 36
view article Article Supercharge your OCR Pipelines with Open Models +5 merve, ariG23498, davanstrien, hynky, andito, reach-vb, pcuenq • Oct 21, 2025 • 314
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1, 2025 • 108
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14, 2025 • 99
view article Article Vision Language Models (Better, faster, stronger) +3 merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 613
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published Apr 1, 2025 • 37
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14, 2025 • 161
view article Article SigLIP 2: A better multilingual vision language encoder +1 ariG23498, merve, qubvel-hf • Feb 21, 2025 • 216
view article Article SmolVLM2: Bringing Video Understanding to Every Device +5 orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova • Feb 20, 2025 • 340