WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 17 days ago • 102
WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 17 days ago • 102
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published May 22 • 246
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills Paper • 2605.23899 • Published May 22 • 29
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills Paper • 2605.23899 • Published May 22 • 29
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published May 22 • 246 • 4
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published May 22 • 246 • 4
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published May 22 • 246
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark Paper • 2605.12501 • Published May 12 • 16
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark Paper • 2605.12501 • Published May 12 • 16
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 119
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 119
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation Paper • 2604.08540 • Published Apr 9 • 5