DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models Paper • 2503.04472 • Published Jan 12
HiMPO: Hindsight-Informed Memory Policy Optimization for Less-Entangled Credit in Long-Horizon Agents Paper • 2606.16285 • Published 6 days ago • 1
HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation Paper • 2603.10359 • Published Mar 11