---
license: mit
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
- Qwen/Qwen2.5-VL-7B-Instruct
library_name: peft
---

# ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World


We introduce ScreenExplorer, a VLM trained via Group Relative Policy Optimization (GRPO) in real, dynamic, and open-ended GUI environments for diverse exploration. ScreenExplorer learns to explore and interact effectively with the screen environment, based on screenshots and a fixed instruction that encourages exploration.

This repo contains the LoRA checkpoints saved during the training of `ScreenExplorer-3B-E1` and `ScreenExplorer-7B-E1`, as well as the LoRA checkpoints of `ScreenExplorer-3B-Distill`.

## Citation

```bibtex
@misc{niu2025screenexplorertrainingvisionlanguagemodel,
      title={ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World},
      author={Runliang Niu and Jinglong Ji and Yi Chang and Qi Wang},
      year={2025},
      eprint={2505.19095},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.19095},
}
```
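
## Loading a checkpoint

Since the checkpoints in this repo are LoRA adapters for the Qwen2.5-VL base models, one way to use them is to attach an adapter to its base model with `peft`. The following is a minimal sketch, not the authors' official loading code. The helper names `base_model_for` and `load_screenexplorer` are hypothetical, the pairing of each checkpoint family with a base model is inferred from the checkpoint names, and `adapter_path` is a placeholder for wherever a given checkpoint directory lives (its layout inside this repo is not described here).

```python
# Assumed pairing of checkpoint family to base model, inferred from the names.
BASE_MODELS = {
    "ScreenExplorer-3B-E1": "Qwen/Qwen2.5-VL-3B-Instruct",
    "ScreenExplorer-7B-E1": "Qwen/Qwen2.5-VL-7B-Instruct",
    "ScreenExplorer-3B-Distill": "Qwen/Qwen2.5-VL-3B-Instruct",
}


def base_model_for(checkpoint_name: str) -> str:
    """Return the base model ID assumed for a checkpoint family."""
    return BASE_MODELS[checkpoint_name]


def load_screenexplorer(checkpoint_name: str, adapter_path: str):
    """Load the base model and attach the LoRA adapter found at adapter_path.

    adapter_path is a placeholder: a local directory (or Hub ID) that
    contains the adapter config and weights for one checkpoint.
    """
    # Imported lazily so base_model_for works without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    base_id = base_model_for(checkpoint_name)
    base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        base_id, torch_dtype="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_path)
    processor = AutoProcessor.from_pretrained(base_id)
    return model, processor
```

Calling `load_screenexplorer("ScreenExplorer-3B-E1", "path/to/adapter")` with a real adapter directory would download the base model weights, so it needs network access and enough memory for the chosen model size.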