Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training Paper • 2605.12483 • Published 1 day ago • 7
IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse Paper • 2603.12201 • Published Mar 12 • 53
Not all tokens are needed(NAT): token efficient reinforcement learning Paper • 2603.06619 • Published Feb 20 • 1
Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning Paper • 2602.21420 • Published Feb 24 • 6
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published Mar 12 • 91