iOSWorld: A Benchmark for Personally Intelligent Phone Agents Paper • 2606.09764 • Published 15 days ago • 3
MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents Paper • 2606.16748 • Published 8 days ago • 6
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks Paper • 2410.19100 • Published Oct 24, 2024 • 6
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 51