Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 7 days ago • 417
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 22 days ago • 195
Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use Paper • 2605.14038 • Published 21 days ago • 15
AcademiClaw: When Students Set Challenges for AI Agents Paper • 2605.02661 • Published about 1 month ago • 17
DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off Paper • 2604.13902 • Published Apr 15 • 62
Training a Student Expert via Semi-Supervised Foundation Model Distillation Paper • 2604.03841 • Published Apr 4 • 10
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326
MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models Paper • 2603.28590 • Published Mar 30 • 22
ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers Paper • 2603.24414 • Published Mar 25 • 183
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 352
Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition Paper • 2603.13904 • Published Mar 14 • 4
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published Feb 9 • 266