AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning Paper • 2511.19304 • Published Nov 24 • 89
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published Oct 27 • 121
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 28
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models Paper • 2508.09138 • Published Aug 12 • 37
Improving LLMs' Generalized Reasoning Abilities by Graph Problems Paper • 2507.17168 • Published Jul 23 • 1
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models Paper • 2508.10751 • Published Aug 14 • 28
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 138