Self-Hinting Language Models Enhance Reinforcement Learning Paper • 2602.03143 • Published 10 days ago
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System Paper • 2602.02488 • Published 11 days ago
Linear representations in language models can change dramatically over a conversation Paper • 2601.20834 • Published 16 days ago
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability Paper • 2601.18778 • Published 18 days ago
Endless Terminals: Scaling RL Environments for Terminal Agents Paper • 2601.16443 • Published 21 days ago
Behavior Knowledge Merge in Reinforced Agentic Models Paper • 2601.13572 • Published 24 days ago
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Paper • 2601.15165 • Published 23 days ago
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 30 days ago
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale Paper • 2601.08225 • Published Jan 13
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published Jan 10
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published Jan 9
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning Paper • 2512.13874 • Published Dec 15, 2025
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models Paper • 2512.13607 • Published Dec 15, 2025
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published Dec 17, 2025