# ARPO_UITARS1.5_7B

**Trained with ARPO (Agentic Replay Policy Optimization) on the OSWorld benchmark**

[[Paper]](https://github.com/dvlab-research/ARPO/blob/main/paper.pdf) • [[Code]](https://github.com/dvlab-research/ARPO) • [[Logs]](https://wandb.ai/fanbinlu/arpo)

## Model Summary

`ARPO_UITARS1.5_7B` is fine-tuned from UI-Tars-1.5-7B using **Agentic Replay Policy Optimization (ARPO)** on the **OSWorld** benchmark for GUI agents.

## 📊 Performance

| Model                         | OSWorld (128-task subset) | OSWorld (overall) |
|-------------------------------|---------------------------|-------------------|
| UI-Tars-1.5                   | 68.7%                     | 23.5%             |
| UI-Tars-1.5 + GRPO            | 72.9%                     | 26.0%             |
| **UI-Tars-1.5 + ARPO (Ours)** | **83.9%**                 | **29.9%**         |

Evaluation setting: a maximum of 15 steps per trajectory.

## 📝 Citation

If you use this model in your work, please cite:

```bibtex
@article{lu2025arpo,
  title={ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author={Fanbin Lu and Zhisheng Zhong and Shu Liu and Chi-Wing Fu and Jiaya Jia},
  journal={arXiv preprint},
  year={2025}
}
```

---

## 🔗 Related Resources

- [OSWorld Benchmark](https://github.com/FanbinLu/OSWorld)
- [EasyR1 Framework](https://github.com/hiyouga/EasyR1)
- [Training Logs on W&B](https://wandb.ai/fanbinlu/arpo)
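
## 🚀 Usage

A minimal inference sketch for a single GUI-agent step (screenshot in, predicted action out). The repo id, chat template, and generation settings below are illustrative assumptions, not confirmed details of this release; adapt them to the published checkpoint.

```python
# Minimal single-step inference sketch. The repo id below is hypothetical;
# replace it with the actual Hub id of this checkpoint.
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "dvlab-research/ARPO_UITARS1.5_7B"  # hypothetical repo id

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, device_map="auto")

# One agent step: the current screenshot plus the task instruction go in,
# and the model generates the next GUI action as text.
screenshot = Image.open("screenshot.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Open the Settings application."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=screenshot, text=prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

In an OSWorld-style loop, this call would run once per step (up to the 15-step limit used in evaluation above), with a fresh screenshot fed back in after each executed action.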