Autodata: An agentic data scientist to create high quality synthetic data Paper • 2606.25996 • Published 2 days ago
NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers? Paper • 2606.24530 • Published 3 days ago
Qwen-AgentWorld: Language World Models for General Agents Paper • 2606.24597 • Published 3 days ago
Deep Research in Physical Sciences: A Multi-Agent Framework and Comprehensive Benchmark Paper • 2606.18648 • Published 9 days ago
CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents Paper • 2606.22883 • Published 4 days ago
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World Paper • 2606.19980 • Published 8 days ago
iOSWorld: A Benchmark for Personally Intelligent Phone Agents Paper • 2606.09764 • Published 18 days ago
MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents Paper • 2606.16748 • Published 11 days ago
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 10 days ago
HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry Paper • 2606.14249 • Published 14 days ago
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery Paper • 2606.13662 • Published 15 days ago
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments Paper • 2606.13681 • Published 15 days ago
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning Paper • 2606.13673 • Published 15 days ago
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields Paper • 2606.11042 • Published 17 days ago
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 21 days ago