# ARPO_UITARS1.5_7B
**Trained with ARPO (Agentic Replay Policy Optimization) on OSWorld benchmark**
[[Paper]](https://github.com/dvlab-research/ARPO/blob/main/paper.pdf) • [[Code]](https://github.com/dvlab-research/ARPO) • [[Logs]](https://wandb.ai/fanbinlu/arpo)
## Model Summary
`ARPO_UITARS1.5_7B` is fine-tuned from UI-Tars-1.5-7B using **Agentic Replay Policy Optimization (ARPO)** on the **OSWorld** benchmark for GUI agents.
## 📊 Performance
| Model | OSWorld (128 Tasks) | OSWorld Overall |
|-----------------------------|---------------------|-----------------|
| UI-Tars-1.5 | 68.7% | 23.5% |
| UI-Tars-1.5 + GRPO | 72.9% | 26.0% |
| **UI-Tars-1.5 + ARPO (Ours)** | **83.9%** | **29.9%** |
All results above use the same evaluation setting: a maximum of 15 steps per trajectory.
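At each of those steps, a UI-Tars-style GUI agent receives the current screenshot plus the task instruction and must emit the next action. As an illustration only, here is a minimal sketch of assembling one such multimodal chat turn; the message schema mimics the common Hugging Face vision-chat convention, and the function name and field layout are assumptions for this sketch, not the model's documented API:

```python
def build_gui_agent_turn(
    task: str,
    screenshot_path: str,
    step: int,
    max_steps: int = 15,  # matches the evaluation cap above
) -> list[dict]:
    """Assemble one multimodal chat turn for a GUI agent.

    Illustrative only: the message schema follows the common
    Hugging Face vision-chat format, not a documented ARPO API.
    """
    if step >= max_steps:
        raise ValueError(f"trajectory capped at {max_steps} steps")
    return [
        {
            "role": "user",
            "content": [
                # Screenshot of the current desktop state.
                {"type": "image", "image": screenshot_path},
                # Task instruction plus step budget context.
                {
                    "type": "text",
                    "text": (
                        f"Task: {task}\n"
                        f"Step {step + 1}/{max_steps}: "
                        "decide the next GUI action."
                    ),
                },
            ],
        }
    ]

turn = build_gui_agent_turn("Open the terminal", "screen_000.png", step=0)
```

A list like `turn` would then be passed through the model's chat template and processor; the exact template for this checkpoint is defined by its tokenizer configuration, not by this sketch.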
## 📝 Citation
If you use this model in your work, please cite:
```bibtex
@article{lu2025arpo,
  title={ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author={Lu, Fanbin and Zhong, Zhisheng and Liu, Shu and Fu, Chi-Wing and Jia, Jiaya},
  journal={arXiv preprint},
  year={2025}
}
```
---
## 🔗 Related Resources
- [OSWorld Benchmark](https://github.com/FanbinLu/OSWorld)
- [EasyR1 Framework](https://github.com/hiyouga/EasyR1)
- [Training Logs on W&B](https://wandb.ai/fanbinlu/arpo)