Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts Paper • 2602.13367 • Published Feb 13 • 35
Article Building an AI-powered search engine from scratch as-cle-bert • Dec 12, 2024 • 12
Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery Paper • 2601.20088 • Published Jan 27 • 4
Article AutoThink: Adaptive Reasoning for Large Language Models codelion • May 27, 2025 • 8
Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix codelion • Nov 3, 2025 • 65
Article The Optimal Architecture for Small Language Models codelion • Dec 26, 2025 • 120
Article SmolLM3: smol, multilingual, long-context reasoner eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 775
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 41 items • Updated Mar 2 • 152
Ministral 3 Collection A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated Dec 2, 2025 • 167
Article SmolLM - blazingly fast and remarkably powerful loubnabnl, anton-l, eliebak • Jul 16, 2024 • 455
Craw4LLM: Efficient Web Crawling for LLM Pretraining Paper • 2502.13347 • Published Feb 19, 2025 • 30
Article Releasing Common Corpus: the largest public domain dataset for training LLMs Pclanglais • Mar 20, 2024 • 32
Essential-Web v1.0: 24T tokens of organized web data Paper • 2506.14111 • Published Jun 17, 2025 • 46
Article 🥬 TinyLettuce: Efficient Hallucination Detection with 17–68M Encoders adaamko • Aug 31, 2025 • 16