Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published 29 days ago • 210
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 59
Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs Paper • 2509.25779 • Published Sep 30, 2025 • 19
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 31
Unfamiliar Finetuning Examples Control How Language Models Hallucinate Paper • 2403.05612 • Published Mar 8, 2024 • 3
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models Paper • 2505.11711 • Published May 16, 2025 • 11
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 237
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4, 2025 • 104
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks Paper • 2510.04374 • Published Oct 5, 2025
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published Feb 17, 2025 • 46
π_{0.5}: a Vision-Language-Action Model with Open-World Generalization Paper • 2504.16054 • Published Apr 22, 2025 • 4