cua-bench: A Framework for Benchmarking, Training Data, and RL Environments for Computer-Use Agents Article • 13 days ago
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published Sep 12, 2024