Qwen-AgentWorld: Language World Models for General Agents Paper • 2606.24597 • Published 3 days ago • 108
Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation Paper • 2606.18844 • Published 9 days ago • 17
EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions Paper • 2606.23654 • Published 4 days ago • 75
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 5 days ago • 90
EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory Paper • 2606.21649 • Published 7 days ago • 27
SkillHarness: Harnessing Safe Skills for Computer-Use Agents Paper • 2606.20636 • Published 24 days ago • 19
Notes2Skills: From Lab Notebooks to Certainty-Aware Scientific Agent Skills Paper • 2606.11897 • Published 16 days ago • 11
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning Paper • 2606.03108 • Published 24 days ago • 11
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement Paper • 2606.11926 • Published 16 days ago • 118
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks Paper • 2606.12344 • Published 16 days ago • 68
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments Paper • 2606.13681 • Published 15 days ago • 140
HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness Paper • 2606.12882 • Published 15 days ago • 13
Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short Paper • 2606.09380 • Published 17 days ago • 9