The "Hero" Behind ChatGPT: RLHF Explained in Detail +2 natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022 • 13
SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 25
Tricks from OpenAI gpt-oss that YOU 🫵 can also use in transformers +5 ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 14
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge NormalUhr • Feb 7, 2025 • 293
The Optimal Architecture for Small Language Models codelion • Dec 26, 2025 • 121
nanoVLM: The simplest, most lightweight pure-PyTorch codebase for training vision-language models +5 ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 29
Streaming datasets: 100x more efficient +3 andito, lhoestq, burtenshaw, pcuenq, merve • Oct 27, 2025 • 7