# ARPO_UITARS1.5_7B

**Trained with ARPO (Agentic Replay Policy Optimization) on the OSWorld benchmark**

[[Paper]](https://github.com/dvlab-research/ARPO/blob/main/paper.pdf) • [[Code]](https://github.com/dvlab-research/ARPO) • [[Logs]](https://wandb.ai/fanbinlu/arpo)

## Model Summary

`ARPO_UITARS1.5_7B` is fine-tuned from UI-Tars-1.5-7B using **Agentic Replay Policy Optimization (ARPO)** on the **OSWorld** benchmark for GUI agents.

## 📊 Performance

| Model                         | OSWorld (128-task subset) | OSWorld (overall) |
|-------------------------------|---------------------------|-------------------|
| UI-Tars-1.5                   | 68.7%                     | 23.5%             |
| UI-Tars-1.5 + GRPO            | 72.9%                     | 26.0%             |
| **UI-Tars-1.5 + ARPO (Ours)** | **83.9%**                 | **29.9%**         |

Evaluation setting: a maximum of 15 steps per trajectory.

## 📝 Citation

If you use this model in your work, please cite:

```bibtex
@article{lu2025arpo,
  title={ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author={Fanbin Lu and Zhisheng Zhong and Shu Liu and Chi-Wing Fu and Jiaya Jia},
  journal={arXiv preprint},
  year={2025}
}
```

---

## 🔗 Related Resources

- [OSWorld Benchmark](https://github.com/FanbinLu/OSWorld)
- [EasyR1 Framework](https://github.com/hiyouga/EasyR1)
- [Training Logs on W&B](https://wandb.ai/fanbinlu/arpo)
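
## 🚀 Usage

A minimal inference sketch for a single GUI-agent step (screenshot in, predicted action out). The repo id, chat template, and generation settings below are illustrative assumptions, not confirmed details of this release; adapt them to the published checkpoint.

```python
# Minimal single-step inference sketch. The repo id below is hypothetical;
# replace it with the actual Hub id of this checkpoint.
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "dvlab-research/ARPO_UITARS1.5_7B"  # hypothetical repo id

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, device_map="auto")

# One agent step: the current screenshot plus the task instruction go in,
# and the model generates the next GUI action as text.
screenshot = Image.open("screenshot.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Open the Settings application."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=screenshot, text=prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

In an OSWorld-style loop, this call would run once per step (up to the 15-step limit used in evaluation above), with a fresh screenshot fed back in after each executed action.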