RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published Mar 2 • 64
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published Feb 11 • 194
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 229
BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain Paper • 2512.08560 • Published Dec 9, 2025 • 39
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation Paper • 2512.09363 • Published Dec 10, 2025 • 74
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12, 2025 • 214
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 193
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 237
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain Paper • 2509.26507 • Published Sep 30, 2025 • 550
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing Paper • 2509.08721 • Published Sep 10, 2025 • 664
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 146
GUI-G^2: Gaussian Reward Modeling for GUI Grounding Paper • 2507.15846 • Published Jul 21, 2025 • 135