iOSWorld: A Benchmark for Personally Intelligent Phone Agents • Paper • 2606.09764 • Published 17 days ago
MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents • Paper • 2606.16748 • Published 10 days ago
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks • Paper • 2410.19100 • Published Oct 24, 2024