Improve pipeline tag and add library name; incorporate relevant information from GitHub README
This PR corrects the `pipeline_tag` to `image-text-to-text`, which accurately reflects the model's functionality as described in the paper [UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning](https://huggingface.co/papers/2503.21620). It also sets `library_name` to `transformers`, given the model's compatibility with the Transformers library, and copies the "Setup", "Data", "Inference", "Training", "News", "Citation", and "Acknowledgements" sections of the GitHub README into the model card.
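For reviewers who want to check the metadata change mechanically, the following is a minimal sketch of reading the card's front matter and confirming the two fields this PR sets. `read_front_matter` and the `CARD` string are hypothetical, written only for illustration. The parser handles just the flat `key: value` and `- item` layout seen in this card and is not a general YAML reader.

```python
# Illustrative only: a tiny reader for the flat front matter used in this
# model card. It understands "key: value" pairs and "- item" lists, nothing more.

CARD = """\
---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
language:
- en
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
---

This repository contains the model ...
"""


def read_front_matter(text):
    """Return the front-matter fields of a model card as a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter present
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of the front-matter block
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if isinstance(meta.get(current_key), list):
                meta[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        current_key = key.strip()
        value = value.strip()
        # A key with no inline value opens a list; otherwise store the scalar.
        meta[current_key] = value if value else []
    return meta


meta = read_front_matter(CARD)
print(meta["pipeline_tag"], meta["library_name"])
```

For real cards with nested or quoted YAML, a proper YAML library would be the safer choice; the hand-rolled reader above only keeps the sketch dependency-free.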
README.md
CHANGED
@@ -1,13 +1,101 @@
 ---
-license: mit
-language:
-- en
 base_model:
 - Qwen/Qwen2.5-VL-3B-Instruct
-
+language:
+- en
+license: mit
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
-
 This repository contains the model presented in [UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning](https://huggingface.co/papers/2503.21620).
 
-Project page: https://github.com/lll6gg/UI-R1
+Project page: https://github.com/lll6gg/UI-R1
+
+## Setup
+
+```shell
+conda create -n ui-r1 python=3.10
+conda activate ui-r1
+bash setup.sh
+```
+
+## Data
+
+Our training mobile data is a subset from AndroidControl and ScreenSpot.
+
+You can also prepare your training or inference data like:
+
+```
+images/:
+image1.png
+image2.png
+```
+
+```
+test.json:
+[
+{
+"img_filename": "image1.png",
+"bbox": [
+825,
+72,
+1673,
+149
+],
+"instruction": "search bar"
+},
+{
+"img_filename": "image2.png",
+"bbox": [
+123,
+732,
+334,
+812
+],
+"instruction": "check weather"
+}
+]
+```
+
+where bbox : [x1,y1,x2,y2] is the coordinate of the left top and the right bottom of the ground truth bbox
+
+## Inference
+
+We provide an example here
+
+```shell
+cd evaluation/
+bash test.sh
+```
+
+Please fill the MODEL_PATH, IMG_PATH, TEST_JSON with your real checkpoint path and data path.
+## Training
+
+```shell
+cd src/script/
+bash train.sh
+```
+
+## 🗞️ News
+- **`2025-04-02`**: We release the [datasets](https://huggingface.co/datasets/LZXzju/UI-R1-3B-Train) of the UI-R1-3B model.
+- **`2025-03-30`**: We release the [checkpoints](https://huggingface.co/LZXzju/Qwen2.5-VL-3B-UI-R1) of the UI-R1-3B model.
+- **`2025-03-30`**: We release the UI-R1 repository.
+- **`2025-03-27`**: We release our [paper](https://arxiv.org/abs/2503.21620).
+
+## ⭐️ Citation
+
+If you find this project useful, welcome to cite us.
+
+```bit
+@article{lu2025ui,
+title={UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning},
+author={Lu, Zhengxi and Chai, Yuxiang and Guo, Yaxuan and Yin, Xi and Liu, Liang and Wang, Hao and Xiong, Guanjing and Li, Hongsheng},
+journal={arXiv preprint arXiv:2503.21620},
+year={2025}
+}
+```
+
+## 🤝 Acknowledgements
+
+We sincerely thank projects [R1-V](https://github.com/Deep-Agent/R1-V), [Open-R1](https://github.com/huggingface/open-r1), and [Open-r1-multimodal](https://github.com/EvolvingLMMs-Lab/open-r1-multimodal), [VLM-R1](https://github.com/om-ai-lab/VLM-R1) for providing their open-source resources.
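The `test.json` layout added in the Data section pairs each screenshot with a ground-truth box `[x1, y1, x2, y2]` (top-left and bottom-right corners). As a rough sketch of how such entries might be sanity-checked before running `test.sh`, the snippet below validates the two sample entries from the diff and checks whether a predicted click point falls inside a box. `validate_entry`, `bbox_center`, and `point_in_bbox` are hypothetical helpers, not functions from the UI-R1 repository. Treating a point inside the box as a hit is an assumption about a common grounding criterion, not something this PR states.

```python
import json

# The two sample entries from the Data section, inlined so the sketch is
# self-contained. In practice this would be read from your own test.json.
SAMPLE = """
[
  {"img_filename": "image1.png", "bbox": [825, 72, 1673, 149], "instruction": "search bar"},
  {"img_filename": "image2.png", "bbox": [123, 732, 334, 812], "instruction": "check weather"}
]
"""


def validate_entry(entry):
    """Return a list of problems found in one entry (empty if it looks fine)."""
    problems = []
    for key in ("img_filename", "bbox", "instruction"):
        if key not in entry:
            problems.append(f"missing key: {key}")
    bbox = entry.get("bbox")
    if bbox is not None:
        is_four_numbers = (
            isinstance(bbox, list)
            and len(bbox) == 4
            and all(isinstance(v, (int, float)) for v in bbox)
        )
        if not is_four_numbers:
            problems.append("bbox must be a list of four numbers")
        else:
            x1, y1, x2, y2 = bbox
            # Top-left must lie strictly above and to the left of bottom-right.
            if not (x1 < x2 and y1 < y2):
                problems.append("bbox must satisfy x1 < x2 and y1 < y2")
    return problems


def bbox_center(bbox):
    """Center point of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def point_in_bbox(x, y, bbox):
    """True if (x, y) lies inside the box, edges included (assumed hit rule)."""
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2


entries = json.loads(SAMPLE)
for entry in entries:
    print(entry["img_filename"], validate_entry(entry), bbox_center(entry["bbox"]))
```

The same checks apply to any file that follows the layout shown in the diff; only the `SAMPLE` string would be swapped for the contents of the real file.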