ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research • Paper • 2606.07591 • Published May 28 • 99
MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation • Paper • 2606.02470 • Published Jun 1 • 16
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories • Paper • 2605.04036 • Published May 5 • 72
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability • Paper • 2604.06628 • Published Apr 8 • 329
FedDisco: Federated Learning with Discrepancy-Aware Collaboration • Paper • 2305.19229 • Published May 30, 2023
Fake It Till Make It: Federated Learning with Consensus-Oriented Generation • Paper • 2312.05966 • Published Dec 10, 2023
Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation • Paper • 2402.05699 • Published Feb 8, 2024 • 2
MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems • Paper • 2503.03686 • Published Mar 5, 2025 • 1
FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data • Paper • 2503.05143 • Published Mar 7, 2025 • 1
MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems • Paper • 2505.16988 • Published May 22, 2025
SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? • Paper • 2507.05241 • Published Jul 7, 2025 • 5
BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair • Paper • 2508.09129 • Published Aug 12, 2025 • 2
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning • Paper • 2509.13305 • Published Sep 16, 2025 • 93
AgentFold: Long-Horizon Web Agents with Proactive Context Management • Paper • 2510.24699 • Published Oct 28, 2025 • 73
WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking • Paper • 2510.24697 • Published Oct 28, 2025 • 22