InnoGym: Benchmarking the Innovation Potential of AI Agents Paper • 2512.01822 • Published Dec 1, 2025 • 36
AgentFold: Long-Horizon Web Agents with Proactive Context Management Paper • 2510.24699 • Published Oct 28, 2025 • 71
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21, 2025 • 113
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30, 2025 • 55
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses Paper • 2510.00232 • Published Sep 30, 2025 • 16
OceanGym: A Benchmark Environment for Underwater Embodied Agents Paper • 2509.26536 • Published Sep 30, 2025 • 36
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published Sep 28, 2025 • 48
Towards Personalized Deep Research: Benchmarks and Evaluations Paper • 2509.25106 • Published Sep 29, 2025 • 30
Towards General Agentic Intelligence via Environment Scaling Paper • 2509.13311 • Published Sep 16, 2025 • 71
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents Paper • 2509.13309 • Published Sep 16, 2025 • 67
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization Paper • 2509.13313 • Published Sep 16, 2025 • 80
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning Paper • 2509.13305 • Published Sep 16, 2025 • 91
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research Paper • 2509.13312 • Published Sep 16, 2025 • 105
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7, 2025 • 141