niurl committed on
Commit cba70d1 · verified · 1 Parent(s): 7da86e2

Update README.md

Files changed (1)
  1. README.md +37 -3
README.md CHANGED
@@ -1,3 +1,37 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ base_model:
+ - Qwen/Qwen2.5-VL-3B-Instruct
+ - Qwen/Qwen2.5-VL-7B-Instruct
+ ---
+
+ <p align="center">
+ <h1 align="center"> ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World </h1>
+ </p>
+
+ <p align="center">
+ <a href="https://arxiv.org/abs/2505.19095">
+ <img src="https://img.shields.io/badge/arXiv-2505.19095-b31b1b.svg" alt="arXiv">
+ </a>
+ <a href="https://github.com/niuzaisheng/ScreenExplorer">
+ <img src="https://img.shields.io/badge/GitHub-ScreenExplorer-blue?logo=github&link=https://github.com/niuzaisheng/ScreenExplorer" alt="GitHub">
+ </a>
+ </p>
+
+ We introduce ScreenExplorer, a VLM trained via Group Relative Policy Optimization (GRPO) in real, dynamic, and open-ended GUI environments for diverse exploration. ScreenExplorer learns to explore and interact effectively with the screen environment based on screenshots and a fixed instruction that encourages exploration.
+
+ This repo contains the LoRA checkpoints saved during the training of `ScreenExplorer-3B-E1` and `ScreenExplorer-7B-E1`, as well as the LoRA checkpoints of `ScreenExplorer-3B-Distill`.
+
+ ## Citation
+
+ ```bibtex
+ @misc{niu2025screenexplorertrainingvisionlanguagemodel,
+ title={ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World},
+ author={Runliang Niu and Jinglong Ji and Yi Chang and Qi Wang},
+ year={2025},
+ eprint={2505.19095},
+ archivePrefix={arXiv},
+ primaryClass={cs.AI},
+ url={https://arxiv.org/abs/2505.19095},
+ }
+ ```
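
Editor's note on the GRPO training mentioned in the README text: the sketch below illustrates the group-relative advantage step that gives GRPO its name, where each sampled rollout is scored against the mean and standard deviation of its own group rather than against a learned value function. It is not taken from the ScreenExplorer code. The function name `group_relative_advantages`, the `eps` constant, the choice of population standard deviation, and the reward values are all assumptions made for illustration; the README does not describe the reward design.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the statistics of its own group.

    Hypothetical helper: `rewards` holds one scalar reward per rollout
    sampled for the same input. `eps` (an assumed constant) avoids
    division by zero when every rollout receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four rollouts of one group (illustrative only).
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Rollouts that score above their group's mean receive a positive advantage and are reinforced, while those below the mean are discouraged; a group with identical rewards yields all-zero advantages.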