PaddleOCR-VL-1.5 Collection Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing • 7 items • Updated Mar 6 • 19
PaddleOCR-VL Collection Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model • 5 items • Updated Feb 11 • 31
google/gemma-4-31B-it Image-Text-to-Text • 33B • Updated about 4 hours ago • 8.59M • 2.55k
Gemma 4 Collection Gemma 4 is Google's new model family including E2B, E4B, 26B-A4B, and 31B. • 28 items • Updated 15 days ago • 175
Gemma 3 Collection All versions of Google's new multimodal models including QAT in 1B, 4B, 12B, and 27B sizes. In GGUF, dynamic 4-bit and 16-bit formats. • 54 items • Updated 15 days ago • 115
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published Mar 24 • 62
mistralai/Voxtral-Small-24B-2507 Audio-Text-to-Text • 24B • Updated Dec 20, 2025 • 52.5k • 493