BEPA-7B-S2

From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation

🌐 Project Page | 📑 arXiv Paper | 💻 GitHub

🏆 #1 Open-Source End-to-End Model on OSWorld (15 steps): achieves a 32.13% success rate
📊 Extreme Data Efficiency: Matches GUI-OWL-7B performance using only 128 training tasks

Model Description

BEPA-7B-S2 is a GUI agent model fine-tuned from UI-TARS-1.5-7B using the BEPA (Bi-Level Expert-to-Policy Assimilation) framework. This model achieves state-of-the-art performance among open-source end-to-end models on the OSWorld benchmark.

Key Results

Success rate (%) on OSWorld, per task subset and overall:

Method           D_expert_only   D_train   D_held_out   Overall
UI-TARS-1.5-7B   18.52           55.12      5.74        22.87
GRPO             11.11           58.02      5.32        23.60
BEPA (ours)      35.19           73.23     10.30        32.13

BEPA improves UI-TARS-1.5-7B from 22.87% to 32.13% on OSWorld-Verified (+9.26 points, +40.5% relative improvement).
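As a quick check of the arithmetic in the sentence above, the snippet below recomputes the absolute (percentage-point) and relative gains over the UI-TARS-1.5-7B baseline from the Key Results table. The dictionary keys are shorthand chosen for this sketch; the numbers are copied from the table.

```python
# Success rates (%) copied from the Key Results table.
RESULTS = {
    "UI-TARS-1.5-7B": {"expert_only": 18.52, "train": 55.12, "held_out": 5.74, "overall": 22.87},
    "GRPO": {"expert_only": 11.11, "train": 58.02, "held_out": 5.32, "overall": 23.60},
    "BEPA": {"expert_only": 35.19, "train": 73.23, "held_out": 10.30, "overall": 32.13},
}


def gains(method: str, base: str = "UI-TARS-1.5-7B") -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in %) on the overall score."""
    b = RESULTS[base]["overall"]
    m = RESULTS[method]["overall"]
    return round(m - b, 2), round(100 * (m - b) / b, 1)


for method in ("GRPO", "BEPA"):
    abs_gain, rel_gain = gains(method)
    print(f"{method}: {abs_gain:+.2f} points ({rel_gain:+.1f}% relative)")
# GRPO: +0.73 points (+3.2% relative)
# BEPA: +9.26 points (+40.5% relative)
```

The same computation makes the contrast with plain GRPO explicit: GRPO barely moves the overall score (+0.73 points), while BEPA gains +9.26.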

BEPA Framework

[Figure: BEPA overview]

BEPA addresses two key challenges when using expert trajectories for training end-to-end GUI policies:

  1. Structural Mismatch: Expert traces produced by agent frameworks interleave multiple roles (planning, execution, grounding) that an end-to-end policy cannot directly imitate.
  2. Distribution Gap: Even after format conversion, trajectories remain far from the base-policy manifold.

LEVEL-1: Self-Rolled Execution

Converts off-distribution ("alien") expert traces into policy-compatible trajectories: each expert trajectory is abstracted into a compact natural-language plan, and the base policy then executes the task in the environment conditioned on that plan, so every recorded action is the policy's own.
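The following is a minimal sketch of the Level-1 idea, not the authors' implementation. It assumes a hypothetical trace schema (`ExpertStep` with a `role` field), reduces plan abstraction to keeping the planner's subgoals (a real pipeline would more plausibly summarize the trace with a language model), and replaces the GUI environment and base policy with the hypothetical stand-ins `ToyEnv` and `toy_policy`. The point it illustrates is that the grounding and execution details of the expert are discarded, and the resulting trajectory contains only actions emitted by the policy itself.

```python
from dataclasses import dataclass


@dataclass
class ExpertStep:
    role: str     # hypothetical schema: "planner", "grounder", "executor", ...
    content: str


def abstract_to_plan(expert_trace: list[ExpertStep]) -> str:
    """Keep only the planner's subgoals and render them as a compact numbered plan."""
    subgoals = [step.content for step in expert_trace if step.role == "planner"]
    return "\n".join(f"{i}. {goal}" for i, goal in enumerate(subgoals, start=1))


class ToyEnv:
    """Hypothetical environment: succeeds once the target actions occur in order."""

    def __init__(self, targets: list[str]):
        self.targets = targets
        self.progress = 0

    def reset(self) -> int:
        self.progress = 0
        return self.progress

    def step(self, action: str) -> tuple[int, bool]:
        if self.progress < len(self.targets) and action == self.targets[self.progress]:
            self.progress += 1
        return self.progress, self.success()

    def success(self) -> bool:
        return self.progress == len(self.targets)


def toy_policy(prompt: str, obs: int) -> str:
    """Hypothetical base policy: follows the plan step indexed by the observation."""
    plan = [line.split(". ", 1)[1] for line in prompt.splitlines() if line[:1].isdigit()]
    return plan[obs] if obs < len(plan) else "noop"


def self_rolled_execution(instruction, plan, policy, env, max_steps=15):
    """Roll out the base policy conditioned on the plan; every action is its own."""
    prompt = f"Task: {instruction}\nPlan:\n{plan}"
    obs = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(prompt, obs)
        trajectory.append((obs, action))
        obs, done = env.step(action)
        if done:
            break
    return trajectory, env.success()


trace = [
    ExpertStep("planner", "open_settings"),
    ExpertStep("grounder", "click(120, 44)"),  # dropped by the abstraction
    ExpertStep("planner", "toggle_wifi"),
]
plan = abstract_to_plan(trace)
traj, ok = self_rolled_execution(
    "Turn off Wi-Fi", plan, toy_policy, ToyEnv(["open_settings", "toggle_wifi"])
)
print(ok, [action for _, action in traj])  # True ['open_settings', 'toggle_wifi']
```

The `max_steps=15` default mirrors the 15-step OSWorld budget quoted above. With an empty plan the toy policy only emits `"noop"` and fails, which is the gap that plan conditioning is meant to close.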

LEVEL-2: Self-Aligned Assimilation

Maintains a per-task cache of guided trajectories and injects a cached trajectory into the GRPO update only when every on-policy rollout for that task fails. The cache is continuously refreshed with the policy's own successful executions.
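A minimal sketch of the Level-2 cache logic follows, under stated assumptions rather than as the authors' code. It assumes a binary task reward, one cached trajectory per task, and that the injected trajectory replaces one failed rollout so the group size stays fixed; `Rollout`, `GuidanceCache`, and `build_group` are hypothetical names. The advantage function uses one common form of the GRPO group-relative advantage, (r - mean) / (std + eps).

```python
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rollout:
    actions: list[str]
    reward: float          # assumed binary: 1.0 on task success, 0.0 otherwise
    guided: bool = False   # True for plan-guided (Level-1) trajectories


class GuidanceCache:
    """Hypothetical per-task cache holding one successful trajectory per task."""

    def __init__(self) -> None:
        self._entries: dict[str, Rollout] = {}

    def seed(self, task_id: str, rollout: Rollout) -> None:
        self._entries[task_id] = rollout

    def refresh(self, task_id: str, rollouts: list[Rollout]) -> None:
        # Replace the cached entry with the policy's own success, if there is one.
        for r in rollouts:
            if r.reward > 0 and not r.guided:
                self._entries[task_id] = r
                return

    def get(self, task_id: str) -> Optional[Rollout]:
        return self._entries.get(task_id)


def build_group(task_id: str, on_policy: list[Rollout], cache: GuidanceCache) -> list[Rollout]:
    """Assemble the GRPO group, injecting the cached trajectory only on total failure."""
    cache.refresh(task_id, on_policy)
    group = list(on_policy)
    cached = cache.get(task_id)
    if on_policy and cached is not None and all(r.reward == 0 for r in on_policy):
        group[-1] = cached  # assumption: swap out one failed rollout
    return group


def group_advantages(group: list[Rollout], eps: float = 1e-6) -> list[float]:
    """Group-relative advantage: (r - mean) / (std + eps) over the group's rewards."""
    rewards = [r.reward for r in group]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(x - mean) / (std + eps) for x in rewards]


cache = GuidanceCache()
cache.seed("t1", Rollout(["plan-guided actions"], 1.0, guided=True))
failures = [Rollout(["a"], 0.0) for _ in range(4)]
group = build_group("t1", failures, cache)
print([round(a, 2) for a in group_advantages(group)])  # [-0.58, -0.58, -0.58, 1.73]
```

This shows why injection is gated on total failure: if all four rollouts fail and nothing is injected, every reward is 0, so every advantage is 0 and the update carries no learning signal. Once the policy succeeds on its own, `refresh` swaps its success into the cache and no guided trajectory enters the group.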

Citation

@misc{wang2026offpolicyonpolicyenhancinggui,
      title={From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation},
      author={Zezhou Wang and Ziyun Zhang and Xiaoyi Zhang and Zhuzhong Qian and Yan Lu},
      year={2026},
      eprint={2601.05787},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.05787},
}

License

This model is released under the MIT License.

Acknowledgements

Model size: 8B params, tensor type BF16, Safetensors format.