LEONW24/BEPA-7B-S2

Image-Text-to-Text

vision-language

reinforcement-learning

text-generation-inference

Add model card and metadata

#1

by nielsr HF Staff - opened Jan 12

base: refs/heads/main ← from: refs/pr/1

README.md +50 -0

	@@ -0,0 +1,50 @@

+---
+license: mit
+library_name: transformers
+pipeline_tag: image-text-to-text
+tags:
+- gui-agent
+- rlvr
+- computer-use
+---
+# BEPA-7B-S2
+This repository contains the weights for **BEPA-7B-S2**, an end-to-end screenshot-to-action policy for GUI agents. The model was introduced in the paper [From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation](https://huggingface.co/papers/2601.05787).
+## Introduction
+**BEPA** (Bi-Level Expert-to-Policy Assimilation) is a framework for improving Vision-Language Models that act as computer-use agents (CUAs). It addresses the difficulty of using static, off-policy expert trajectories in reinforcement learning from verifiable rewards (RLVR) by converting them into policy-aligned guidance.
+BEPA operates in two complementary stages:
+- **LEVEL-1 (Self-Rolled Execution):** Transforms alien expert traces into policy-compatible trajectories by abstracting them into natural-language plans and letting the base policy execute them.
+- **LEVEL-2 (Self-Aligned Assimilation):** Maintains a dynamically updated per-task cache of guided trajectories and injects them into training updates when on-policy rollouts fail.
+On the OSWorld-Verified benchmark, BEPA improves the success rate of UITARS1.5-7B from 22.87% to **32.13%** (+9.26 percentage points), placing it among the top-performing open-source end-to-end models.
+## Resources
+- **Paper:** [https://huggingface.co/papers/2601.05787](https://huggingface.co/papers/2601.05787)
+- **Project Page:** [https://leon-gittech.github.io/Verl_GUI/](https://leon-gittech.github.io/Verl_GUI/)
+- **GitHub Repository:** [https://github.com/LEON-gittech/Verl_GUI](https://github.com/LEON-gittech/Verl_GUI)
+## Main Results
+| Method | Overall Success (%) |
+|--------|-------------|
+| UITARS1.5-7B | 22.87 |
+| GRPO | 23.60 |
+| **BEPA (ours)** | **32.13** |
+## Citation
+```bibtex
+@misc{wang2026offpolicyonpolicyenhancinggui,
+      title={From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation},
+      author={Zezhou Wang and Ziyun Zhang and Xiaoyi Zhang and Zhuzhong Qian and Yan Lu},
+      year={2026},
+      eprint={2601.05787},
+      archivePrefix={arXiv},
+      primaryClass={cs.AI},
+      url={https://arxiv.org/abs/2601.05787},
+}
+```
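
The LEVEL-2 bullet in the README above describes a per-task cache whose guided trajectories are injected into training updates when on-policy rollouts fail. A minimal Python sketch of that idea follows, for readers of this PR rather than as part of the diff. Every name here (`Trajectory`, `GuidedCache`, `build_update_group`) is hypothetical, and the specific policy of injecting only when all rollouts fail and swapping out one failed rollout is an assumption, not the paper's confirmed procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Trajectory:
    task_id: str
    actions: List[str]
    reward: float          # verifiable reward; > 0 means the task was solved
    guided: bool = False   # True if produced by plan-guided execution


@dataclass
class GuidedCache:
    """Hypothetical per-task store of successful guided trajectories."""
    store: Dict[str, Trajectory] = field(default_factory=dict)

    def update(self, traj: Trajectory) -> None:
        # Assumption: keep only successful guided trajectories, newest wins.
        if traj.guided and traj.reward > 0:
            self.store[traj.task_id] = traj

    def get(self, task_id: str) -> Optional[Trajectory]:
        return self.store.get(task_id)


def build_update_group(
    task_id: str, rollouts: List[Trajectory], cache: GuidedCache
) -> List[Trajectory]:
    """Return the trajectories used for one training update on `task_id`."""
    # If any on-policy rollout succeeded, train on the rollouts unchanged.
    if any(t.reward > 0 for t in rollouts):
        return list(rollouts)
    cached = cache.get(task_id)
    if cached is None:
        # Nothing cached for this task yet, so there is nothing to inject.
        return list(rollouts)
    # Assumption: replace one failed rollout with the cached guided
    # trajectory so the group size stays the same.
    return list(rollouts[:-1]) + [cached]
```

In this sketch the cache only changes the training group for tasks where the policy currently gets no reward, so tasks the policy already solves are still trained purely on-policy.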