ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models? Article • 21 days ago • 18
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs Paper • 2601.17058 • Published Jan 22 • 190
GutenOCR: A Grounded Vision-Language Front-End for Documents Paper • 2601.14490 • Published Jan 20 • 37
LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family Article • Jan 19 • 88
HAI-DEF Concept Apps Collection • A collection of concept apps built around HAI-DEF open models and libraries to inspire the community. Learn more at http://goo.gle/hai-def • 7 items • Updated about 15 hours ago • 49
MedGemma Release Collection • Collection of Gemma 3 variants for performance on medical text and image comprehension, to accelerate building healthcare-based AI applications • 9 items • Updated about 15 hours ago • 452
Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Article • Mar 22, 2024 • 128
The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix Article • Nov 3, 2025 • 63
System Prompt Learning: Teaching LLMs to Learn Problem-Solving Strategies from Experience Article • Jun 2, 2025 • 24
Prompt-MII: Meta-Learning Instruction Induction for LLMs Paper • 2510.16932 • Published Oct 19, 2025 • 7