PaddleOCR-VL-1.5 Collection Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing • 7 items • Updated Mar 6 • 19
PaddleOCR-VL Collection Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model • 5 items • Updated Feb 11 • 31
Gemma 4 Collection Gemma 4 is Google's new model family including including E2B, E4B, 26B-A4B, and 31B. • 28 items • Updated 14 days ago • 175
Gemma 3 Collection All versions of Google's new multimodal models including QAT in 1B, 4B, 12B, and 27B sizes. In GGUF, dynamic 4-bit and 16-bit formats. • 54 items • Updated 14 days ago • 114
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published Mar 24 • 62
mistralai/Voxtral-Small-24B-2507 Audio-Text-to-Text • 24B • Updated Dec 20, 2025 • 56.8k • 493