---
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
tags:
  - gui-agent
  - rlvr
  - computer-use
---

# BEPA-7B-S2

This repository contains the weights for BEPA-7B-S2, an end-to-end screenshot-to-action policy for GUI agents. The model was introduced in the paper [From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation](https://arxiv.org/abs/2601.05787).
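
The metadata declares a `transformers` checkpoint with the `image-text-to-text` pipeline tag, so loading it might look like the minimal sketch below. The hub id, prompt format, and action decoding here are assumptions rather than the verified interface; consult the official repository for the exact agent protocol.

```python
# Minimal sketch: one screenshot-to-action step via the generic
# image-text-to-text pipeline. The repo id and instruction format are
# assumptions; the paper defines the real agent protocol.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="BEPA-7B-S2",  # hypothetical hub id; replace with the actual path
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "screenshot.png"},  # current GUI state
            {"type": "text", "text": "Open the Settings application."},
        ],
    }
]

result = pipe(text=messages, max_new_tokens=128)
print(result[0]["generated_text"])  # model emits its next GUI action
```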

## Introduction

BEPA (Bi-Level Expert-to-Policy Assimilation) is a framework designed to enhance vision-language models acting as computer-use agents (CUAs). It addresses the challenges of using static expert trajectories in reinforcement learning with verifiable rewards (RLVR) by turning them into policy-aligned guidance.

BEPA operates in two complementary stages (a toy sketch follows the list):

- **LEVEL-1 (Self-Rolled Execution):** Transforms alien expert traces into policy-compatible trajectories by abstracting them into natural-language plans and letting the base policy execute them.
- **LEVEL-2 (Self-Aligned Assimilation):** Dynamically maintains a per-task cache that injects guided trajectories into training updates when on-policy failures occur.
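
As a rough illustration of the two levels, here is a toy sketch under stated assumptions: `abstract_to_plan`, `rollout`, and the zero-reward failure test are hypothetical placeholders standing in for the paper's actual components, not the released implementation.

```python
# Toy sketch of BEPA's two levels. All names and the failure test are
# hypothetical placeholders; the paper defines the real components.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trajectory:
    steps: list          # (screenshot, action) pairs
    reward: float = 0.0  # verifiable task reward (1.0 = success)


class BEPAGuidance:
    def __init__(self, policy, planner):
        self.policy = policy            # base VLM policy being trained
        self.planner = planner          # abstracts traces into NL plans
        self.cache = defaultdict(list)  # LEVEL-2: task_id -> guided trajs

    def level1_self_rolled(self, task_id, expert_trace, env):
        """LEVEL-1: abstract an alien expert trace into a natural-language
        plan, then let the base policy execute it in its own action style."""
        plan = self.planner.abstract_to_plan(expert_trace)
        traj = self.policy.rollout(env, guidance=plan)
        if traj.reward > 0:  # keep only verified successes
            self.cache[task_id].append(traj)
        return traj

    def level2_assimilate(self, task_id, on_policy_trajs):
        """LEVEL-2: when every on-policy rollout for a task fails, inject a
        cached policy-compatible guided trajectory into the RLVR update."""
        if all(t.reward == 0.0 for t in on_policy_trajs) and self.cache[task_id]:
            return on_policy_trajs + [self.cache[task_id][-1]]
        return on_policy_trajs
```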

On the OSWorld-Verified benchmark, BEPA improves the success rate of UITARS1.5-7B from 22.87% to 32.13%, establishing it as a top-performing open-source end-to-end model.

## Resources

- Paper: [From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation](https://arxiv.org/abs/2601.05787)

## Main Results

Results on the OSWorld-Verified benchmark:

| Method       | Overall Success (%) |
|--------------|---------------------|
| UITARS1.5-7B | 22.87               |
| GRPO         | 23.60               |
| BEPA (ours)  | 32.13               |

## Citation

```bibtex
@misc{wang2026offpolicyonpolicyenhancinggui,
      title={From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation},
      author={Zezhou Wang and Ziyun Zhang and Xiaoyi Zhang and Zhuzhong Qian and Yan Lu},
      year={2026},
      eprint={2601.05787},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.05787},
}
```