
Add pipeline_tag, library_name and paper reference

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +32 -26
README.md CHANGED
@@ -1,34 +1,35 @@
  ---
- license: apache-2.0
+ base_model:
+ - Qwen/Qwen2.5-VL-3B-Instruct
  datasets:
  - GUI-Libra/GUI-Libra-81K-RL
  - GUI-Libra/GUI-Libra-81K-SFT
  language:
  - en
- base_model:
- - Qwen/Qwen2.5-VL-3B-Instruct
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: image-text-to-text
  tags:
  - VLM
  - GUI
  - agent
  ---
 
- # Introduction
- 
- The models from paper "GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL".
- 
- 
- **GitHub:** https://github.com/GUI-Libra/GUI-Libra
- **Website:** https://GUI-Libra.github.io
- 
+ # GUI-Libra-3B
+ 
+ [**Project Page**](https://gui-libra.github.io) | [**Paper**](https://huggingface.co/papers/2602.22190) | [**GitHub**](https://github.com/GUI-Libra/GUI-Libra)
+ 
+ GUI-Libra is a post-training framework that turns open-source VLMs into strong native GUI agents: models that see a screenshot, think step-by-step, and output an executable action, all within a single forward pass.
+ 
+ This model is fine-tuned from [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) using action-aware SFT and conservative reinforcement learning (GRPO). It addresses challenges such as action-grounding alignment and partial verifiability in GUI navigation tasks.
 
  # Usage
  ## 1) Start an OpenAI-compatible vLLM server
 
  ```bash
  pip install -U vllm
- vllm serve GUI-Libra/GUI-Libra-4B --port 8000 --api-key token-abc123
- ````
+ vllm serve GUI-Libra/GUI-Libra-3B --port 8000 --api-key token-abc123
+ ```
 
  * Endpoint: `http://localhost:8000/v1`
  * The `api_key` here must match `--api-key`.
@@ -79,11 +80,16 @@ action_type: Scroll, action_target: None, value: "up" | "down" | "left" | "right
 
  task_desc = 'Go to Amazon.com and buy a math book'
  prev_txt = ''
- question_description = '''Please generate the next move according to the UI screenshot {}, instruction and previous actions.\n\nInstruction: {}\n\nInteraction History: {}\n'''
+ question_description = '''Please generate the next move according to the UI screenshot {}, instruction and previous actions.
+ 
+ Instruction: {}
+ 
+ Interaction History: {}
+ '''
  img_size_string = '(original image size {}x{})'.format(img_size[0], img_size[1])
  query = question_description.format(img_size_string, task_desc, prev_txt)
 
  query = query + '\n' + '''The response should be structured in the following format:
  <think>Your step-by-step thought process here...</think>
  <answer>
  {
@@ -111,19 +118,18 @@ resp = client.chat.completions.create(
  print(resp.choices[0].message.content)
  ```
 
- Run:
- 
- ```bash
- python minimal_infer.py
- ```
- 
- ---
- 
- ## Notes
- 
- * Replace `screen.png` with your own screenshot file.
- * If you hit OOM or slowdowns, reduce image size or run fewer concurrent requests.
- * The example assumes your vLLM server is running locally on port `8000`.
- 
- 
+ ## Citation
+ 
+ If you find GUI-Libra useful for your research, please cite:
+ 
+ ```bibtex
+ @misc{yang2026guilibratrainingnativegui,
+   title={GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL},
+   author={Rui Yang and Qianhui Wu and Zhaoyang Wang and Hanyang Chen and Ke Yang and Hao Cheng and Huaxiu Yao and Baoling Peng and Huan Zhang and Jianfeng Gao and Tong Zhang},
+   year={2026},
+   eprint={2602.22190},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG},
+   url={https://arxiv.org/abs/2602.22190},
+ }
+ ```
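For review, the prompt assembly in the updated README can be sanity-checked standalone. A minimal sketch: the `img_size` value is an assumed example (not from the diff), the rest mirrors the README code, and the format-specification string is truncated where the hunk ends:

```python
# Rebuild the query prompt as in the README's example (post-change version).
task_desc = 'Go to Amazon.com and buy a math book'
prev_txt = ''
question_description = '''Please generate the next move according to the UI screenshot {}, instruction and previous actions.

Instruction: {}

Interaction History: {}
'''

img_size = (1920, 1080)  # assumed (width, height) of the screenshot; not from the diff
img_size_string = '(original image size {}x{})'.format(img_size[0], img_size[1])
query = question_description.format(img_size_string, task_desc, prev_txt)

# The diff truncates the format specification after '<answer>' / '{'.
query = query + '\n' + '''The response should be structured in the following format:
<think>Your step-by-step thought process here...</think>
<answer>'''
print(query)
```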
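Likewise, a minimal sketch of consuming a reply in the `<think>...</think><answer>...</answer>` format the README specifies. The JSON field names follow the `action_type` / `action_target` / `value` names visible in the second hunk's header, but the full action schema is not shown in this diff, so treat them as assumptions:

```python
import json
import re

def parse_reply(text):
    """Split a model reply into its <think> rationale and parsed <answer> action."""
    think = re.search(r'<think>(.*?)</think>', text, re.DOTALL)
    answer = re.search(r'<answer>(.*?)</answer>', text, re.DOTALL)
    rationale = think.group(1).strip() if think else ''
    # Assumes the <answer> body is a JSON object; the diff only shows its opening brace.
    action = json.loads(answer.group(1)) if answer else None
    return rationale, action

# Hypothetical reply in the documented format.
reply = ('<think>The page must be scrolled to reveal results.</think>\n'
         '<answer>\n{"action_type": "Scroll", "action_target": null, "value": "down"}\n</answer>')
rationale, action = parse_reply(reply)
```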