Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published Apr 1 • 47
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21, 2025 • 116
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 501
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published Dec 18, 2025 • 120
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published Dec 18, 2025 • 222
Open Source Work 2025 | Raymond-dev-546730 Collection Open-source LLMs and datasets originally published in 2025 — 55,259 total downloads across models and datasets as of May 4, 2026. • 6 items • Updated 1 day ago • 5
MiniCPM-o & MiniCPM-V Collection Multimodal models with leading performance. • 30 items • Updated 5 days ago • 80
Finetunes | SLMs and LLMs Collection Various variants of LLMs finetuned using proprietary data. • 27 items • Updated Jan 19 • 4
Article • SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data • Jun 3, 2025 • 346
Article • nanoVLM: The simplest repository to train your VLM in pure PyTorch • May 21, 2025 • 257
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 12 items • Updated Dec 23, 2025 • 150
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025 • 258