DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution Paper • 2601.13761 • Published 9 days ago • 15
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 40
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective Paper • 2506.17930 • Published Jun 22, 2025 • 18
ReDit: Reward Dithering for Improved LLM Policy Optimization Paper • 2506.18631 • Published Jun 23, 2025 • 7