Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12 • 196
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories Paper • 2605.21468 • Published May 20 • 51
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published May 14 • 147
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published May 9 • 82
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published Feb 12 • 68
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments Paper • 2602.11964 • Published Feb 12 • 13
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR Paper • 2602.05261 • Published Feb 5 • 54
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published Jan 29 • 105
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning Paper • 2505.17508 • Published May 23, 2025 • 8
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance Paper • 2511.13254 • Published Nov 17, 2025 • 140
ChartM^3: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension Paper • 2511.02415 • Published Nov 4, 2025 • 5
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 517
Quantile Advantage Estimation for Entropy-Safe Reasoning Paper • 2509.22611 • Published Sep 26, 2025 • 119
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images Paper • 2509.07966 • Published Sep 9, 2025 • 5