---
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
tags:
  - gui-agent
  - rlvr
  - computer-use
---

# BEPA-7B-S2

This repository contains the weights for BEPA-7B-S2, an end-to-end screenshot-to-action policy for GUI agents. The model was introduced in the paper [From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation](https://arxiv.org/abs/2601.05787).
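
The metadata declares a `transformers` checkpoint with the `image-text-to-text` pipeline tag, so loading it might look like the minimal sketch below. The hub id, prompt format, and action decoding here are assumptions rather than the verified interface; consult the official repository for the exact agent protocol.

```python
# Minimal sketch: one screenshot-to-action step via the generic
# image-text-to-text pipeline. The repo id and instruction format are
# assumptions; the paper defines the real agent protocol.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="BEPA-7B-S2",  # hypothetical hub id; replace with the actual path
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "screenshot.png"},  # current GUI state
            {"type": "text", "text": "Open the Settings application."},
        ],
    }
]

result = pipe(text=messages, max_new_tokens=128)
print(result[0]["generated_text"])  # model emits its next GUI action
```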

## Introduction

BEPA (Bi-Level Expert-to-Policy Assimilation) is a framework designed to enhance vision-language models acting as computer-use agents (CUAs). It addresses the challenges of using static expert trajectories in reinforcement learning with verifiable rewards (RLVR) by turning them into policy-aligned guidance.

BEPA operates in two complementary stages (a toy sketch follows the list):

- **LEVEL-1 (Self-Rolled Execution):** Transforms alien expert traces into policy-compatible trajectories by abstracting them into natural-language plans and letting the base policy execute them.
- **LEVEL-2 (Self-Aligned Assimilation):** Dynamically maintains a per-task cache that injects guided trajectories into training updates when on-policy failures occur.
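
As a rough illustration of the two levels, here is a toy sketch under stated assumptions: `abstract_to_plan`, `rollout`, and the zero-reward failure test are hypothetical placeholders standing in for the paper's actual components, not the released implementation.

```python
# Toy sketch of BEPA's two levels. All names and the failure test are
# hypothetical placeholders; the paper defines the real components.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trajectory:
    steps: list          # (screenshot, action) pairs
    reward: float = 0.0  # verifiable task reward (1.0 = success)


class BEPAGuidance:
    def __init__(self, policy, planner):
        self.policy = policy            # base VLM policy being trained
        self.planner = planner          # abstracts traces into NL plans
        self.cache = defaultdict(list)  # LEVEL-2: task_id -> guided trajs

    def level1_self_rolled(self, task_id, expert_trace, env):
        """LEVEL-1: abstract an alien expert trace into a natural-language
        plan, then let the base policy execute it in its own action style."""
        plan = self.planner.abstract_to_plan(expert_trace)
        traj = self.policy.rollout(env, guidance=plan)
        if traj.reward > 0:  # keep only verified successes
            self.cache[task_id].append(traj)
        return traj

    def level2_assimilate(self, task_id, on_policy_trajs):
        """LEVEL-2: when every on-policy rollout for a task fails, inject a
        cached policy-compatible guided trajectory into the RLVR update."""
        if all(t.reward == 0.0 for t in on_policy_trajs) and self.cache[task_id]:
            return on_policy_trajs + [self.cache[task_id][-1]]
        return on_policy_trajs
```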

On the OSWorld-Verified benchmark, BEPA improves the success rate of UITARS1.5-7B from 22.87% to 32.13%, establishing it as a top-performing open-source end-to-end model.

## Resources

- Paper: [From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation](https://arxiv.org/abs/2601.05787)

## Main Results

Results on the OSWorld-Verified benchmark:

| Method       | Overall Success (%) |
|--------------|---------------------|
| UITARS1.5-7B | 22.87               |
| GRPO         | 23.60               |
| BEPA (ours)  | 32.13               |

## Citation

```bibtex
@misc{wang2026offpolicyonpolicyenhancinggui,
      title={From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation},
      author={Zezhou Wang and Ziyun Zhang and Xiaoyi Zhang and Zhuzhong Qian and Yan Lu},
      year={2026},
      eprint={2601.05787},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.05787},
}
```