Sample-Efficient Post-Training for LEGO Spatial-Physics Reasoning Paper • 2606.07602 • Published 29 days ago • 6
Stabilizing Rubric Integration Training via Decoupled Advantage Normalization Paper • 2603.26535 • Published Mar 27 • 3
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published Feb 27 • 99
Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL Paper • 2602.03773 • Published Feb 3 • 14