Improve pipeline tag and add library name; incorporate relevant information from Github README

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +94 -6
README.md CHANGED
@@ -1,13 +1,101 @@
  ---
- license: mit
- language:
- - en
  base_model:
  - Qwen/Qwen2.5-VL-3B-Instruct
- pipeline_tag: visual-question-answering
  ---

-
  This repository contains the model presented in [UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning](https://huggingface.co/papers/2503.21620).

- Project page: https://github.com/lll6gg/UI-R1
  ---
  base_model:
  - Qwen/Qwen2.5-VL-3B-Instruct
+ language:
+ - en
+ license: mit
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

  This repository contains the model presented in [UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning](https://huggingface.co/papers/2503.21620).

+ Project page: https://github.com/lll6gg/UI-R1
+
+ ## Setup
+
+ ```shell
+ conda create -n ui-r1 python=3.10
+ conda activate ui-r1
+ bash setup.sh
+ ```
+
+ ## Data
+
+ Our mobile training data is a subset of the AndroidControl and ScreenSpot datasets.
+
+ You can also prepare your own training or inference data in the following layout:
+
+ ```
+ images/:
+   image1.png
+   image2.png
+ ```
+
+ ```
+ test.json:
+ [
+   {
+     "img_filename": "image1.png",
+     "bbox": [825, 72, 1673, 149],
+     "instruction": "search bar"
+   },
+   {
+     "img_filename": "image2.png",
+     "bbox": [123, 732, 334, 812],
+     "instruction": "check weather"
+   }
+ ]
+ ```
+
+ where bbox: [x1, y1, x2, y2] gives the coordinates of the top-left and bottom-right corners of the ground-truth bounding box.
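A minimal sketch of loading and sanity-checking annotations in this layout, assuming the `test.json` format above; the helper names are ours, not from the repository:

```python
import json

def load_annotations(path):
    """Load test.json-style annotations and sanity-check each bbox."""
    with open(path) as f:
        samples = json.load(f)
    for s in samples:
        x1, y1, x2, y2 = s["bbox"]
        # [x1, y1, x2, y2] must describe a non-degenerate box
        assert x1 < x2 and y1 < y2, f"malformed bbox in {s['img_filename']}"
    return samples

def bbox_center(bbox):
    """Center point of an [x1, y1, x2, y2] box (a natural click target)."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)
```

For example, `bbox_center([825, 72, 1673, 149])` returns `(1249.0, 110.5)`, the middle of the "search bar" box from the sample above.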
+
+ ## Inference
+
+ We provide an example here:
+
+ ```shell
+ cd evaluation/
+ bash test.sh
+ ```
+
+ Please fill in MODEL_PATH, IMG_PATH, and TEST_JSON with your actual checkpoint path and data paths.
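For GUI grounding of this kind, a predicted click is commonly scored as correct when it falls inside the ground-truth bbox. A minimal sketch of that check, under that assumption (the function names are illustrative, not the repository's API):

```python
def click_in_bbox(point, bbox):
    """True if a predicted (x, y) click lands inside [x1, y1, x2, y2]."""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2

def grounding_accuracy(predictions, samples):
    """Fraction of samples whose predicted click falls in the gt bbox."""
    hits = sum(click_in_bbox(p, s["bbox"]) for p, s in zip(predictions, samples))
    return hits / len(samples)
```

With the two sample boxes above, a click at (1249, 110) hits the "search bar" box while (10, 10) misses it, so one hit out of two predictions gives an accuracy of 0.5.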
+
+ ## Training
+
+ ```shell
+ cd src/script/
+ bash train.sh
+ ```
+
+ ## 🗞️ News
+
+ - **`2025-04-02`**: We release the [datasets](https://huggingface.co/datasets/LZXzju/UI-R1-3B-Train) of the UI-R1-3B model.
+ - **`2025-03-30`**: We release the [checkpoints](https://huggingface.co/LZXzju/Qwen2.5-VL-3B-UI-R1) of the UI-R1-3B model.
+ - **`2025-03-30`**: We release the UI-R1 repository.
+ - **`2025-03-27`**: We release our [paper](https://arxiv.org/abs/2503.21620).
+
+ ## ⭐️ Citation
+
+ If you find this project useful, please consider citing us:
+
+ ```bibtex
+ @article{lu2025ui,
+   title={UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning},
+   author={Lu, Zhengxi and Chai, Yuxiang and Guo, Yaxuan and Yin, Xi and Liu, Liang and Wang, Hao and Xiong, Guanjing and Li, Hongsheng},
+   journal={arXiv preprint arXiv:2503.21620},
+   year={2025}
+ }
+ ```
+
+ ## 🤝 Acknowledgements
+
+ We sincerely thank the projects [R1-V](https://github.com/Deep-Agent/R1-V), [Open-R1](https://github.com/huggingface/open-r1), [Open-r1-multimodal](https://github.com/EvolvingLMMs-Lab/open-r1-multimodal), and [VLM-R1](https://github.com/om-ai-lab/VLM-R1) for their open-source resources.