Fanbin
/

STEVE-R1-7B-SFT


STEVE-R1-7B-SFT / README.md


nielsr HF Staff

Add model card and metadata

1981fcf verified 11 months ago


	---
	pipeline_tag: robotics
	library_name: transformers
	license: apache-2.0
	tags: [robotics, agent, computer-vision, llm]
	---

	# <span style="font-size:30px;">STEVE-R1: Towards Long Reasoning Computer-use Agents</span>

	[![Hugging Face Paper](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Paper-orange)](https://huggingface.co/papers/2503.12532)
	[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/Fanbin/STEVE-R1-7B-SFT)
	[![Hugging Face Data](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Data-green)](https://huggingface.co/datasets/Fanbin/waa_steve_trajectories)


	We evaluate the STEVE-R1 agent on the in-domain WindowsAgentArena (Windows 11) benchmark and the out-of-domain OSWorld (Ubuntu) benchmark. Each task is attempted 16 times, and the task completion rate is the primary metric. In the in-domain Windows 11 setting, STEVE-R1 achieves a 14% higher task completion rate than the previous open-source state-of-the-art model, UI-TARS-7B-DPO. In the out-of-domain Ubuntu environment, on which STEVE-R1 was not explicitly trained, it still achieves a 7% higher task completion rate than UI-TARS-7B-DPO.

	<div align=center>
	<img width="98%" src="assets/performance.png"/>
	</div>
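
The metric described above can be illustrated with a small sketch. The README does not say how the 16 attempts per task are aggregated, so this example assumes the completion rate is the per-task success fraction averaged over tasks. The function name `task_completion_rate` and the `results` layout are hypothetical and are not part of the STEVE-R1 codebase.

```python
from statistics import mean

# Number of attempts per task, taken from the evaluation setup in this README.
ATTEMPTS_PER_TASK = 16


def task_completion_rate(results: dict[str, list[bool]]) -> float:
    """Return the mean per-task success fraction.

    `results` maps a task id to one boolean per attempt (True = completed).
    ASSUMPTION: attempts are averaged within each task, then across tasks;
    the README does not state the exact aggregation.
    """
    if not results:
        raise ValueError("results must contain at least one task")
    per_task = []
    for task_id, attempts in results.items():
        if len(attempts) != ATTEMPTS_PER_TASK:
            raise ValueError(
                f"task {task_id!r} has {len(attempts)} attempts, "
                f"expected {ATTEMPTS_PER_TASK}"
            )
        per_task.append(sum(attempts) / len(attempts))
    return mean(per_task)


if __name__ == "__main__":
    # Toy data: one task solved in half of the attempts, one never solved.
    demo = {
        "open_notepad": [True] * 8 + [False] * 8,
        "rename_file": [False] * 16,
    }
    print(task_completion_rate(demo))
```

A pass@k-style aggregation (a task counts as solved if any attempt succeeds) would give a different number, so the aggregation should be checked against the paper before comparing results.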

	## Release
	- Currently, only the SFT STEVE-R1 model trained on step-verified data is released. RL tuning is in progress.
	- 🔥 An improved version, STEVE-R1, is released with long reasoning ability and long image context. We extend the model context length to <b>128K</b>, with at most <b>32 screenshots</b> as input for a single task. Distillation from DeepSeek-R1 substantially increases the model's response length; see the [examples](https://github.com/FanbinLu/STEVE-R1/tree/main/examples). We release the [training data](), [models](https://huggingface.co/Fanbin/STEVE-R1-7B-SFT), and [evaluation trajectories](https://huggingface.co/datasets/Fanbin/waa_steve_trajectories).
	- We release the paper "STEVE: Step Verification Pipeline for Computer-use Agent Training", which proposes a single-frame 7B computer-use agent trained with SFT and step-verified KTO.
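
The 32-screenshot limit mentioned in the release notes implies that a long task has to bound its image history before each model call. The sketch below shows one way to do that, assuming the oldest screenshots are dropped first. The README only states the limit, not the truncation policy, and `keep_recent_screenshots` is a hypothetical helper rather than a function from the STEVE-R1 repository.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

# Upper bound on screenshot inputs per task, as stated in this README.
MAX_SCREENSHOTS = 32


def keep_recent_screenshots(
    history: Sequence[T], limit: int = MAX_SCREENSHOTS
) -> list[T]:
    """Return at most `limit` of the most recent screenshots, oldest first.

    ASSUMPTION: when the history is too long, the oldest screenshots are
    discarded. The actual policy used by STEVE-R1 is not documented here.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return list(history[-limit:])


if __name__ == "__main__":
    # Placeholder file names stand in for real screenshot images.
    frames = [f"step_{i:03d}.png" for i in range(40)]
    kept = keep_recent_screenshots(frames)
    print(len(kept), kept[0], kept[-1])
```

Keeping the most recent frames preserves the current screen state, which is usually what the next action depends on. Other policies, such as always keeping the first screenshot, are equally compatible with the stated limit.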


	## Trajectory Data
	... (rest of the original README content)
	## Citation
	To be added.