On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published 17 days ago • 94
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published Feb 4 • 22
Rethinking the Trust Region in LLM Reinforcement Learning Paper • 2602.04879 • Published Feb 4 • 37
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition Paper • 2307.13269 • Published Jul 25, 2023 • 34
Diffusion Language Models are Super Data Learners Paper • 2511.03276 • Published Nov 5, 2025 • 129
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
cwm Collection Collection for Code World Model, an agentic coding model from FAIR. • 3 items • Updated Sep 24, 2025 • 18
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25, 2025 • 47
Reinforcing General Reasoning without Verifiers Paper • 2505.21493 • Published May 27, 2025 • 26
Fostering Video Reasoning via Next-Event Prediction Paper • 2505.22457 • Published May 28, 2025 • 29