Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey Paper • 2602.06052 • Published Jan 14 • 7
Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL Article • aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra, sergiopaniego • May 27 • 42
The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes Paper • 2605.11182 • Published May 11 • 5
Who Prices Cognitive Labor in the Age of Agents? Compute-Anchored Wages Paper • 2605.05558 • Published May 8 • 3
Agentic AI Systems Should Be Designed as Marginal Token Allocators Paper • 2605.01214 • Published May 2 • 4
SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers Paper • 2602.05115 • Published Feb 4 • 20
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30 • 229
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29, 2025 • 147
OpenTinker: Separating Concerns in Agentic Reinforcement Learning Paper • 2601.07376 • Published Jan 12 • 7
Multi-Agent Evolve: LLM Self-Improve through Co-evolution Paper • 2510.23595 • Published Oct 27, 2025 • 14
Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published Oct 20, 2025 • 124
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs Paper • 2510.11062 • Published Oct 13, 2025 • 29
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare Paper • 2510.08872 • Published Oct 10, 2025 • 4
Group-in-Group Policy Optimization for LLM Agent Training Paper • 2505.10978 • Published May 16, 2025 • 23