Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family 18 days ago • 76
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR Paper • 2601.14251 • Published 17 days ago • 23
Qwen Image Multiple Angles 3D Camera 🎥 • Running on Zero • Featured • 1.36k • Adjust camera angles in images using 3D controls or sliders
docling-project/SmolDocling-256M-preview Image-Text-to-Text • 0.3B • Updated Sep 17, 2025 • 36.1k • 1.61k
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 191
Article Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H Jun 3, 2025 • 71
Article Tiny Agents in Python: a MCP-powered agent in ~70 lines of code +2 May 23, 2025 • 171
Distilling LLM Agent into Small Models with Retrieval and Code Tools Paper • 2505.17612 • Published May 23, 2025 • 81
Article TinyAgents: A Minimal Experiment with Code Agents and MCP Tools May 16, 2025 • 30
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch +5 May 21, 2025 • 251