---
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- gui-agent
- rlvr
- computer-use
---
# BEPA-7B-S2
This repository contains the weights for **BEPA-7B-S2**, an end-to-end screenshot-to-action policy for GUI agents. The model was introduced in the paper *From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation*.
## Introduction
BEPA (Bi-Level Expert-to-Policy Assimilation) is a framework designed to enhance Vision-Language Models acting as computer-use agents (CUAs). It addresses the challenges of using static expert trajectories in reinforcement learning from verifiable rewards (RLVR) by turning them into policy-aligned guidance.
BEPA operates in two complementary stages:
- **Level 1 (Self-Rolled Execution):** Converts off-policy ("alien") expert traces into policy-compatible trajectories by abstracting them into natural-language plans that the base policy then executes itself.
- **Level 2 (Self-Aligned Assimilation):** Dynamically maintains a per-task cache and injects guided trajectories into training updates whenever on-policy rollouts fail.
On the OSWorld-Verified benchmark, BEPA improves the success rate of UITARS1.5-7B from 22.87% to 32.13%, establishing it as a top-performing open-source end-to-end model.
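Since the model card declares the `image-text-to-text` pipeline tag, inference should follow the standard `transformers` chat-style interface. The sketch below is a minimal, hedged example: the repo id, screenshot path, and instruction are placeholders, and the exact prompt/action format expected by BEPA-7B-S2 may differ (see the GitHub repository for the authoritative agent loop).

```python
# Minimal inference sketch for an image-text-to-text GUI-agent policy.
# Assumptions (not confirmed by this card): the weights load via the
# standard transformers pipeline, and the repo id below is a placeholder.
from transformers import pipeline


def build_messages(screenshot_path: str, instruction: str) -> list:
    """Pair a GUI screenshot with a natural-language task instruction
    in the chat format expected by image-text-to-text pipelines."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": screenshot_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]


if __name__ == "__main__":
    # Hypothetical repo id -- replace with this repository's actual id.
    agent = pipeline("image-text-to-text", model="BEPA-7B-S2")
    messages = build_messages("screenshot.png", "Open the Settings app.")
    out = agent(text=messages, max_new_tokens=128)
    print(out[0]["generated_text"])  # predicted next action for the GUI
```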
## Resources
- Paper: https://huggingface.co/papers/2601.05787
- Project Page: https://leon-gittech.github.io/Verl_GUI/
- GitHub Repository: https://github.com/LEON-gittech/Verl_GUI
## Main Results

Success rates on the OSWorld-Verified benchmark:
| Method | Overall Success (%) |
|---|---|
| UITARS1.5-7B | 22.87 |
| GRPO | 23.60 |
| BEPA (ours) | 32.13 |
## Citation

```bibtex
@misc{wang2026offpolicyonpolicyenhancinggui,
  title={From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation},
  author={Zezhou Wang and Ziyun Zhang and Xiaoyi Zhang and Zhuzhong Qian and Yan Lu},
  year={2026},
  eprint={2601.05787},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2601.05787},
}
```