GRPO, Dr. GRPO, and DAPO Are Three Operations on One Number: The Group-Standard-Deviation Identity Paper • 2607.00152 • Published 4 days ago • 3
Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems Paper • 2606.00090 • Published May 23 • 6
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published May 27 • 431
Leveraging Verifier-Based Reinforcement Learning in Image Editing Paper • 2604.27505 • Published Apr 30 • 59
Offline Evaluation Measures of Fairness in Recommender Systems Paper • 2604.25032 • Published Apr 27 • 2
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Paper • 2604.11626 • Published Apr 13 • 103
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 509
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 638
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 344
PRBench: End-to-end Paper Reproduction in Physics Research Paper • 2603.27646 • Published Mar 29 • 29
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 353
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs Paper • 2603.05890 • Published Mar 6 • 93
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published Feb 26 • 203
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published Feb 11 • 221
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published Feb 11 • 245