Flipping the Dialogue: Training and Evaluating User Language Models Paper • 2510.06552 • Published Oct 8, 2025 • 2
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards Paper • 2505.24760 • Published May 30, 2025 • 74
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic Paper • 2509.01363 • Published Sep 1, 2025 • 62
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR Paper • 2509.02522 • Published Sep 2, 2025 • 26
SmolLM3 pretraining datasets Collection datasets used in SmolLM3 pretraining • 15 items • Updated Aug 12, 2025 • 53
Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780
Expanding RL with Verifiable Rewards Across Diverse Domains Paper • 2503.23829 • Published Mar 31, 2025 • 24
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning Paper • 2505.13866 • Published May 20, 2025 • 17
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published May 20, 2025 • 24