# ARPO_UITARS1.5_7B

**Trained with ARPO (Agentic Replay Policy Optimization) on the OSWorld benchmark**

[[Paper]](https://github.com/dvlab-research/ARPO/blob/main/paper.pdf) • [[Code]](https://github.com/dvlab-research/ARPO) • [[Logs]](https://wandb.ai/fanbinlu/arpo)

## Model Summary

`ARPO_UITARS1.5_7B` is fine-tuned from UI-Tars-1.5-7B using **Agentic Replay Policy Optimization (ARPO)** on the **OSWorld** benchmark for GUI agents.
## Performance
| Model                         | OSWorld (128 Tasks) | OSWorld Overall |
|-------------------------------|---------------------|-----------------|
| UI-Tars-1.5                   | 68.7%               | 23.5%           |
| UI-Tars-1.5 + GRPO            | 72.9%               | 26.0%           |
| **UI-Tars-1.5 + ARPO (Ours)** | **83.9%**           | **29.9%**       |

Evaluation setting: a maximum of 15 steps per trajectory.
## Citation

If you use this model in your work, please cite:

```bibtex
@article{lu2025arpo,
  title   = {ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author  = {Fanbin Lu and Zhisheng Zhong and Shu Liu and Chi-Wing Fu and Jiaya Jia},
  journal = {arXiv preprint},
  year    = {2025}
}
```
---

## Related Resources

- [OSWorld Benchmark](https://github.com/FanbinLu/OSWorld)
- [EasyR1 Framework](https://github.com/hiyouga/EasyR1)
- [Training Logs on W&B](https://wandb.ai/fanbinlu/arpo)