TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning Paper • 2606.32017 • Published 3 days ago • 7
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training Paper • 2605.12483 • Published May 12 • 10
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training Paper • 2605.12483 • Published May 12 • 10
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training Paper • 2605.12483 • Published May 12 • 10
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models Paper • 2604.10866 • Published Apr 13 • 69
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments Paper • 2604.14144 • Published Apr 15 • 63
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Paper • 2604.11626 • Published Apr 13 • 103
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published Apr 15 • 168
Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning Paper • 2602.21420 • Published Feb 24 • 6
Not all tokens are needed(NAT): token efficient reinforcement learning Paper • 2603.06619 • Published Feb 20 • 1
Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning Paper • 2602.21420 • Published Feb 24 • 6