Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning • Paper • 2606.04923 • Published 16 days ago • 39
STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability • Paper • 2606.19236 • Published 2 days ago • 8
Meta-CoT: Enhancing Granularity and Generalization in Image Editing • Paper • 2604.24625 • Published Apr 27 • 26