DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 450
Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models Paper • 2605.17672 • Published 4 days ago • 19
Sleep-time Compute: Beyond Inference Scaling at Test-time Paper • 2504.13171 • Published Apr 17, 2025 • 16
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 105
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 36
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published Apr 2 • 151
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding Paper • 2501.13200 • Published Jan 22, 2025 • 70
The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems Paper • 2602.15382 • Published Feb 17 • 5
Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems Paper • 2602.03695 • Published Feb 3 • 3
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger Paper • 2602.08222 • Published Feb 9 • 290
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published Jan 14 • 92