Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published 19 days ago • 32
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Paper • 2601.11077 • Published 19 days ago • 64
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 22 days ago • 146
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset Paper • 2601.10305 • Published 20 days ago • 36
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering Paper • 2601.10402 • Published 20 days ago • 36
SWE-RM: Execution-free Feedback For Software Engineering Agents Paper • 2512.21919 • Published Dec 26, 2025 • 10
Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published Dec 18, 2025 • 35
Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale Paper • 2512.10398 • Published Dec 11, 2025 • 13
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation Paper • 2512.05033 • Published Dec 4, 2025 • 17
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published Dec 4, 2025 • 80
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning Paper • 2511.19900 • Published Nov 25, 2025 • 48
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning Paper • 2511.16043 • Published Nov 20, 2025 • 109