# ARPO_UITARS1.5_7B
**Trained with ARPO (Agentic Replay Policy Optimization) on OSWorld benchmark**
[[Paper]](https://github.com/dvlab-research/ARPO/blob/main/paper.pdf) • [[Code]](https://github.com/dvlab-research/ARPO) • [[Logs]](https://wandb.ai/fanbinlu/arpo)
## Model Summary
`ARPO_UITARS1.5_7B` is a GUI-agent model fine-tuned from UI-Tars-1.5-7B with **Agentic Replay Policy Optimization (ARPO)**, trained and evaluated on the **OSWorld** benchmark.
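The card does not include an inference snippet, so here is a minimal sketch of single-step usage. Everything in it is an illustrative assumption rather than documented behavior: it assumes the checkpoint keeps the standard Hugging Face vision-language chat interface of its UI-Tars-1.5 base, and the prompt, helper names, and generation settings are hypothetical.

```python
# Hypothetical single-step inference sketch (not an official snippet from this
# card). Assumes the checkpoint exposes the standard transformers
# vision-language interface: a processor with a chat template plus a
# generate()-capable model.

def build_messages(instruction: str, screenshot_path: str) -> list:
    """One user turn pairing a GUI screenshot with a natural-language task."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": screenshot_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]

def predict_action(model, processor, instruction, screenshot_path,
                   max_new_tokens=256):
    """Render the chat template, generate once, and return the decoded
    continuation (the agent's next GUI action in the model's action format)."""
    from PIL import Image  # local import so the sketch stays importable without PIL

    messages = build_messages(instruction, screenshot_path)
    prompt = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False
    )
    inputs = processor(
        text=[prompt], images=[Image.open(screenshot_path)], return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and keep only the newly generated action text.
    return processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
```

Loading would follow the usual transformers pattern, e.g. `AutoModelForVision2Seq.from_pretrained(...)` and `AutoProcessor.from_pretrained(...)` with this repo's hub id; the exact model class depends on the base model's architecture.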
## πŸ“Š Performance
| Model | OSWorld (128 Tasks) | OSWorld Overall |
|-----------------------------|---------------------|-----------------|
| UI-Tars-1.5 | 68.7% | 23.5% |
| UI-Tars-1.5 + GRPO | 72.9% | 26.0% |
| **UI-Tars-1.5 + ARPO (Ours)** | **83.9%** | **29.9%** |
All results were evaluated with a maximum of 15 steps per trajectory.
## πŸ“ Citation
If you use this model in your work, please cite:
```bibtex
@article{lu2025arpo,
  title={ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author={Fanbin Lu and Zhisheng Zhong and Shu Liu and Chi-Wing Fu and Jiaya Jia},
  journal={arXiv preprint},
  year={2025}
}
```
---
## πŸ”— Related Resources
- [OSWorld Benchmark](https://github.com/FanbinLu/OSWorld)
- [EasyR1 Framework](https://github.com/hiyouga/EasyR1)
- [Training Logs on W&B](https://wandb.ai/fanbinlu/arpo)