What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents Paper • 2605.19447 • Published 4 days ago • 2
Beyond Mode Collapse: Distribution Matching for Diverse Reasoning Paper • 2605.19461 • Published 4 days ago • 1
Learning from Language Feedback via Variational Policy Distillation Paper • 2605.15113 • Published 5 days ago • 9
TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization Paper • 2601.16480 • Published Jan 23 • 50
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Paper • 2503.19990 • Published Mar 25, 2025 • 35