WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation Paper • 2605.25874 • Published about 1 month ago • 103
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published May 12 • 194
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis Paper • 2604.15093 • Published Apr 16 • 30
Running Agents 1 Mobile Agent Trajectory Viewer 📱 1 Explore mobile app interaction trajectories with visual UI
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published Feb 24 • 103
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published Feb 5 • 61
TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents Paper • 2602.02196 • Published Feb 2 • 35
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent Paper • 2601.07779 • Published Jan 12 • 28