Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation Paper • 2601.14691 • Published Jan 21, 2026
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30, 2026
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8, 2026
Recursive Introspection: Teaching Language Model Agents How to Self-Improve Paper • 2407.18219 • Published Jul 25, 2024
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning Paper • 2310.18247 • Published Oct 27, 2023
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10, 2025
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models Paper • 2310.10639 • Published Oct 16, 2023
Vision-Language Models Provide Promptable Representations for Reinforcement Learning Paper • 2402.02651 • Published Feb 5, 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL Paper • 2402.19446 • Published Feb 29, 2024
Unfamiliar Finetuning Examples Control How Language Models Hallucinate Paper • 2403.05612 • Published Mar 8, 2024
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data Paper • 2404.14367 • Published Apr 22, 2024
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold Paper • 2406.14532 • Published Jun 20, 2024
Generative Verifiers: Reward Modeling as Next-Token Prediction Paper • 2408.15240 • Published Aug 27, 2024
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning Paper • 2410.08146 • Published Oct 10, 2024
Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance Paper • 2410.13816 • Published Oct 17, 2024
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data Paper • 2412.07762 • Published Dec 10, 2024