Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning • Paper • 2606.04923 • Published 1 day ago • 33
Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning • Paper • 2602.06600 • Published Feb 6 • 3