nielsr HF Staff committed on
Commit 564ed6b · verified · 1 Parent(s): 2b1dd70

Improve pipeline tag and add library name; incorporate relevant information from Github README

This PR corrects the `pipeline_tag` to `image-text-to-text`, which accurately reflects the model's functionality as described in the paper [UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning](https://huggingface.co/papers/2503.21620). It also adds the `library_name` `transformers`, given the model's compatibility with the Transformers library. This PR also copies the "Setup", "Data", "Inference", "Training", "News", "Citation", and "Acknowledgements" sections of the GitHub README into the model card.

Files changed (1)
  1. README.md +94 -6
README.md CHANGED
@@ -1,13 +1,101 @@
  ---
- license: mit
- language:
- - en
  base_model:
  - Qwen/Qwen2.5-VL-3B-Instruct
- pipeline_tag: visual-question-answering
  ---

-
  This repository contains the model presented in [UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning](https://huggingface.co/papers/2503.21620).

- Project page: https://github.com/lll6gg/UI-R1
  ---
  base_model:
  - Qwen/Qwen2.5-VL-3B-Instruct
+ language:
+ - en
+ license: mit
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

  This repository contains the model presented in [UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning](https://huggingface.co/papers/2503.21620).

+ Project page: https://github.com/lll6gg/UI-R1
+
+ ## Setup
+
+ ```shell
+ conda create -n ui-r1 python=3.10
+ conda activate ui-r1
+ bash setup.sh
+ ```
+
+ ## Data
+
+ Our mobile training data is a subset of AndroidControl and ScreenSpot.
+
+ You can also prepare your own training or inference data like:
+
+ ```
+ images/:
+   image1.png
+   image2.png
+ ```
+
+ ```
+ test.json:
+ [
+   {
+     "img_filename": "image1.png",
+     "bbox": [825, 72, 1673, 149],
+     "instruction": "search bar"
+   },
+   {
+     "img_filename": "image2.png",
+     "bbox": [123, 732, 334, 812],
+     "instruction": "check weather"
+   }
+ ]
+ ```
+
+ where `bbox: [x1, y1, x2, y2]` gives the coordinates of the top-left and bottom-right corners of the ground-truth bounding box.
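As an illustration (a minimal sketch, not code from the repository), records in this format can be validated and converted to click targets in Python:

```python
import json

def validate_examples(examples):
    """Sanity-check records shaped like the test.json entries above."""
    for ex in examples:
        x1, y1, x2, y2 = ex["bbox"]
        # The top-left corner must lie above and to the left of the bottom-right corner.
        assert x1 < x2 and y1 < y2, f"malformed bbox in {ex['img_filename']}"

def bbox_center(bbox):
    """Center of an [x1, y1, x2, y2] box, e.g. as a ground-truth click point."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)

# Same shape as the test.json example above; json.load(open("test.json")) works identically.
examples = json.loads(
    '[{"img_filename": "image1.png", "bbox": [825, 72, 1673, 149], "instruction": "search bar"}]'
)
validate_examples(examples)
print(bbox_center(examples[0]["bbox"]))  # (1249.0, 110.5)
```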
+
+ ## Inference
+
+ We provide an example here:
+
+ ```shell
+ cd evaluation/
+ bash test.sh
+ ```
+
+ Please replace `MODEL_PATH`, `IMG_PATH`, and `TEST_JSON` with your actual checkpoint and data paths.
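For context, GUI-grounding benchmarks of this kind typically count a prediction as correct when the predicted click point falls inside the ground-truth bbox; a minimal sketch of that check (our assumption, not the repository's exact evaluation code):

```python
def click_in_bbox(point, bbox):
    """True if a predicted (x, y) click lands inside the [x1, y1, x2, y2] box."""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2

# A click on the center of the "search bar" box counts as a hit.
print(click_in_bbox((1249, 110), [825, 72, 1673, 149]))  # True
```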
+
+ ## Training
+
+ ```shell
+ cd src/script/
+ bash train.sh
+ ```
+
+ ## 🗞️ News
+ - **`2025-04-02`**: We release the [datasets](https://huggingface.co/datasets/LZXzju/UI-R1-3B-Train) of the UI-R1-3B model.
+ - **`2025-03-30`**: We release the [checkpoints](https://huggingface.co/LZXzju/Qwen2.5-VL-3B-UI-R1) of the UI-R1-3B model.
+ - **`2025-03-30`**: We release the UI-R1 repository.
+ - **`2025-03-27`**: We release our [paper](https://arxiv.org/abs/2503.21620).
+
+ ## ⭐️ Citation
+
+ If you find this project useful, please cite us.
+
+ ```bibtex
+ @article{lu2025ui,
+   title={UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning},
+   author={Lu, Zhengxi and Chai, Yuxiang and Guo, Yaxuan and Yin, Xi and Liu, Liang and Wang, Hao and Xiong, Guanjing and Li, Hongsheng},
+   journal={arXiv preprint arXiv:2503.21620},
+   year={2025}
+ }
+ ```
+
+ ## 🤝 Acknowledgements
+
+ We sincerely thank the projects [R1-V](https://github.com/Deep-Agent/R1-V), [Open-R1](https://github.com/huggingface/open-r1), [Open-r1-multimodal](https://github.com/EvolvingLMMs-Lab/open-r1-multimodal), and [VLM-R1](https://github.com/om-ai-lab/VLM-R1) for their open-source resources.