---
pipeline_tag: robotics
library_name: transformers
license: apache-2.0
tags: [robotics, agent, computer-vision, llm]
---

# STEVE-R1: Towards Long Reasoning Computer-use Agents

[Paper](https://huggingface.co/papers/2503.12532) [Model](https://huggingface.co/Fanbin/STEVE-R1-7B-SFT) [Dataset](https://huggingface.co/datasets/Fanbin/waa_steve_trajectories)

We evaluate the **STEVE-R1 agent** on the in-domain WindowsAgentArena (Windows 11) benchmark and the out-of-domain OSWorld (Ubuntu) benchmark. Each task is attempted 16 times, and the task completion rate is the primary metric.

In the in-domain Windows 11 setting, STEVE-R1 achieves a **14%** higher task completion rate than the previous open-source state-of-the-art model, UI-TARS-7B-DPO. In the out-of-domain Ubuntu environment, on which STEVE-R1 was not explicitly trained, it still achieves a **7%** higher task completion rate than UI-TARS-7B-DPO.
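
The card does not spell out how the 16 attempts per task are aggregated into a single completion rate. The sketch below shows one plausible reading, assuming the metric is the per-task fraction of successful attempts averaged over all tasks. The function name `task_completion_rate`, the task IDs, and the outcomes are hypothetical and are not taken from the benchmarks or from the authors' evaluation code.

```python
from statistics import mean

ATTEMPTS_PER_TASK = 16  # number of attempts per task stated in the evaluation setup


def task_completion_rate(results: dict[str, list[bool]]) -> float:
    """Average the per-task success fraction over all tasks.

    This aggregation is an assumption, not the authors' confirmed formula.
    """
    if not results:
        raise ValueError("no tasks to score")
    per_task = []
    for task_id, attempts in results.items():
        if len(attempts) != ATTEMPTS_PER_TASK:
            raise ValueError(
                f"{task_id}: expected {ATTEMPTS_PER_TASK} attempts, got {len(attempts)}"
            )
        # Fraction of this task's attempts that succeeded.
        per_task.append(sum(attempts) / len(attempts))
    return mean(per_task)


# Made-up outcomes for two hypothetical tasks (not real benchmark data).
example = {
    "task_a": [True] * 8 + [False] * 8,   # 8 of 16 attempts succeeded
    "task_b": [True] * 4 + [False] * 12,  # 4 of 16 attempts succeeded
}
rate = task_completion_rate(example)
print(f"{rate:.3f}")  # 0.375
```

Averaging per task rather than pooling all attempts gives the same number here because every task has exactly 16 attempts. The two would differ only if attempt counts varied between tasks.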