niurl/ScreenExplorer

Model card Files Files and versions

niurl committed on Jun 17, 2025

Commit cba70d1 · verified · 1 parent: 7da86e2

Update README.md

Files changed (1)

README.md +37 -3

README.md CHANGED

@@ -1,3 +1,37 @@
----
-license: mit
----

+---
+license: mit
+base_model:
+- Qwen/Qwen2.5-VL-3B-Instruct
+- Qwen/Qwen2.5-VL-7B-Instruct
+---
+<p align="center">
+<h1 align="center"> ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World </h1>
+</p>
+<p align="center">
+  <a href="https://arxiv.org/abs/2505.19095">
+    <img src="https://img.shields.io/badge/arXiv-2505.19095-b31b1b.svg" alt="arXiv">
+  </a>
+  <a href="https://github.com/niuzaisheng/ScreenExplorer">
+    <img src="https://img.shields.io/badge/GitHub-ScreenExplorer-blue?logo=github&link=https://github.com/niuzaisheng/ScreenExplorer" alt="GitHub">
+  </a>
+</p>
+We introduce ScreenExplorer, a VLM trained via Group Relative Policy Optimization (GRPO) in real, dynamic, and open-ended GUI environments for diverse exploration. ScreenExplorer is trained to explore and interact with the screen environment effectively, using screenshots as observations and a fixed instruction that encourages exploration.
+This repo contains the LoRA checkpoints saved during the training of `ScreenExplorer-3B-E1` and `ScreenExplorer-7B-E1`, as well as the LoRA checkpoints of `ScreenExplorer-3B-Distill`.
+## Citation
+```bibtex
+@misc{niu2025screenexplorertrainingvisionlanguagemodel,
+      title={ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World},
+      author={Runliang Niu and Jinglong Ji and Yi Chang and Qi Wang},
+      year={2025},
+      eprint={2505.19095},
+      archivePrefix={arXiv},
+      primaryClass={cs.AI},
+      url={https://arxiv.org/abs/2505.19095},
+}
+```
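The README states that ScreenExplorer is trained with Group Relative Policy Optimization (GRPO) but does not show how the training signal is formed. The sketch below illustrates the group-relative advantage commonly used in GRPO: several responses are sampled for the same input, and each response's reward is normalized by the mean and standard deviation of its group, so no separate value (critic) model is needed. It is not taken from the ScreenExplorer code. The function name `group_relative_advantages`, the use of the population standard deviation, the `eps` term, and the reward values are all illustrative assumptions.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: normalize each reward against its own group.

    Assumption: one group = several actions sampled for the same screenshot
    and the same fixed exploration instruction. Each advantage is
    (r_i - mean) / (std + eps); eps avoids division by zero when every
    reward in the group is identical.
    """
    if not rewards:
        raise ValueError("a group must contain at least one reward")
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation over the group (assumed choice)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical exploration rewards for G = 4 actions sampled from one screenshot.
    rewards = [1.0, 0.0, 0.5, 0.5]
    print([round(a, 3) for a in group_relative_advantages(rewards)])
    # prints [1.414, -1.414, 0.0, 0.0]
```

Actions that earn more reward than their siblings get a positive advantage and are reinforced, while below-average actions are pushed down. A group in which every action earns the same reward contributes no gradient signal.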