# ARPO_UITARS1.5_7B

**Trained with ARPO (Agentic Replay Policy Optimization) on the OSWorld benchmark**

[[Paper]](https://github.com/dvlab-research/ARPO/blob/main/paper.pdf) • [[Code]](https://github.com/dvlab-research/ARPO) • [[Logs]](https://wandb.ai/fanbinlu/arpo)

## Model Summary

`ARPO_UITARS1.5_7B` is fine-tuned from UI-Tars-1.5-7B using **Agentic Replay Policy Optimization (ARPO)** on the **OSWorld** benchmark for GUI agents.
## Performance
| Model                         | OSWorld (128 Tasks) | OSWorld Overall |
|-------------------------------|---------------------|-----------------|
| UI-Tars-1.5                   | 68.7%               | 23.5%           |
| UI-Tars-1.5 + GRPO            | 72.9%               | 26.0%           |
| **UI-Tars-1.5 + ARPO (Ours)** | **83.9%**           | **29.9%**       |

Evaluation setting: a maximum of 15 steps per trajectory.
## Citation

If you use this model in your work, please cite:

```bibtex
@article{lu2025arpo,
  title   = {ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author  = {Fanbin Lu and Zhisheng Zhong and Shu Liu and Chi-Wing Fu and Jiaya Jia},
  journal = {arXiv preprint},
  year    = {2025}
}
```
---

## Related Resources

- [OSWorld Benchmark](https://github.com/FanbinLu/OSWorld)
- [EasyR1 Framework](https://github.com/hiyouga/EasyR1)
- [Training Logs on W&B](https://wandb.ai/fanbinlu/arpo)