Safetensors
English
qwen3_vl
VLM
GUI
agent

Add pipeline tag, library name, and paper link

#1
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +47 -38
README.md CHANGED
@@ -1,43 +1,50 @@
1
  ---
2
- license: apache-2.0
 
3
  datasets:
4
  - GUI-Libra/GUI-Libra-81K-RL
5
  - GUI-Libra/GUI-Libra-81K-SFT
6
  language:
7
  - en
8
- base_model:
9
- - Qwen/Qwen3-VL-8B-Instruct
 
10
  tags:
11
  - VLM
12
  - GUI
13
  - agent
14
  ---
15
 
16
- # Introduction
 
 
17
 
18
- The models from paper "GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL".
19
 
 
20
 
21
- **GitHub:** https://github.com/GUI-Libra/GUI-Libra
22
- **Website:** https://GUI-Libra.github.io
23
 
 
 
 
 
24
 
25
- # Usage
26
- ## 1) Start an OpenAI-compatible vLLM server
 
27
 
28
  ```bash
29
  pip install -U vllm
30
  vllm serve GUI-Libra/GUI-Libra-8B --port 8000 --api-key token-abc123
31
- ````
32
-
33
- * Endpoint: `http://localhost:8000/v1`
34
- * The `api_key` here must match `--api-key`.
35
 
 
 
36
 
37
- ## 2) Minimal Python example (prompt + image → request)
38
 
39
  Install dependencies:
40
-
41
  ```bash
42
  pip install -U openai
43
  ```
@@ -76,14 +83,19 @@ action_type: Scroll, action_target: None, value: "up" | "down" | "left" | "right
76
  """
77
 
78
  # 2) Your prompt (instruction + desired output format)
79
-
80
  task_desc = 'Go to Amazon.com and buy a math book'
81
  prev_txt = ''
82
- question_description = '''Please generate the next move according to the UI screenshot {}, instruction and previous actions.\n\nInstruction: {}\n\nInteraction History: {}\n'''
83
- img_size_string = '(original image size {}x{})'.format(img_size[0], img_size[1])
84
- query = question_description.format(img_size_string, task_desc, prev_txt)
 
85
 
86
- query = query + '\n' + '''The response should be structured in the following format:
 
 
 
 
 
87
  <thinking>Your step-by-step thought process here...</thinking>
88
  <answer>
89
  {
@@ -97,11 +109,11 @@ query = query + '\n' + '''The response should be structured in the following for
97
  resp = client.chat.completions.create(
98
  model=MODEL,
99
  messages=[
100
- {"role": "system", "content": "You are a helpful GUI agent."},
101
  {"role": "user", "content": [
102
  {"type": "image_url",
103
  "image_url": {"url": f"data:image/png;base64,{img_b64}", "detail": "high"}},
104
- {"type": "text", "text": prompt},
105
  ]},
106
  ],
107
  temperature=0.0,
@@ -111,19 +123,16 @@ resp = client.chat.completions.create(
111
  print(resp.choices[0].message.content)
112
  ```
113
 
114
- Run:
115
-
116
- ```bash
117
- python minimal_infer.py
118
- ```
119
-
120
- ---
121
-
122
- ## Notes
123
-
124
- * Replace `screen.png` with your own screenshot file.
125
- * If you hit OOM or slowdowns, reduce image size or run fewer concurrent requests.
126
- * The example assumes your vLLM server is running locally on port `8000`.
127
-
128
-
129
-
 
1
  ---
2
+ base_model:
3
+ - Qwen/Qwen3-VL-8B-Instruct
4
  datasets:
5
  - GUI-Libra/GUI-Libra-81K-RL
6
  - GUI-Libra/GUI-Libra-81K-SFT
7
  language:
8
  - en
9
+ license: apache-2.0
10
+ library_name: transformers
11
+ pipeline_tag: image-text-to-text
12
  tags:
13
  - VLM
14
  - GUI
15
  - agent
16
  ---
17
 
18
+ # GUI-Libra-8B
19
+
20
+ [**Project Page**](https://GUI-Libra.github.io) | [**Paper**](https://huggingface.co/papers/2602.22190) | [**GitHub**](https://github.com/GUI-Libra/GUI-Libra)
21
 
22
+ GUI-Libra-8B is a native GUI agent model fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct). It is designed to perceive screenshots, reason step-by-step, and output executable actions in a single forward pass.
23
 
24
+ The model is introduced in the paper [GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL](https://huggingface.co/papers/2602.22190).
25
 
26
+ ## Introduction
 
27
 
28
+ GUI-Libra addresses key limitations in open-source GUI agents through three main contributions:
29
+ 1. **GUI-Libra-81K**: A curated reasoning dataset with 81,000 steps.
30
+ 2. **Action-Aware SFT**: A training strategy that balances chain-of-thought reasoning with visual grounding accuracy.
31
+ 3. **Conservative RL**: A KL-regularized GRPO approach tailored for GUI environments where rewards are only partially verifiable.
32
 
33
+ ## Usage
34
+
35
+ ### 1) Start an OpenAI-compatible vLLM server
36
 
37
  ```bash
38
  pip install -U vllm
39
  vllm serve GUI-Libra/GUI-Libra-8B --port 8000 --api-key token-abc123
40
+ ```
 
 
 
41
 
42
+ * Endpoint: `http://localhost:8000/v1`
43
+ * The `api_key` here must match `--api-key`.
44
 
45
+ ### 2) Minimal Python example
46
 
47
  Install dependencies:
 
48
  ```bash
49
  pip install -U openai
50
  ```
 
83
  """
84
 
85
  # 2) Your prompt (instruction + desired output format)
 
86
  task_desc = 'Go to Amazon.com and buy a math book'
87
  prev_txt = ''
88
+ # Note: Ensure img_size is defined or use default
89
+ question_description = '''Please generate the next move according to the UI screenshot, instruction and previous actions.
90
+
91
+ Instruction: {}
92
 
93
+ Interaction History: {}
94
+ '''
95
+ query = question_description.format(task_desc, prev_txt)
96
+
97
+ query = query + '
98
+ ' + '''The response should be structured in the following format:
99
  <thinking>Your step-by-step thought process here...</thinking>
100
  <answer>
101
  {
 
109
  resp = client.chat.completions.create(
110
  model=MODEL,
111
  messages=[
112
+ {"role": "system", "content": system_prompt},
113
  {"role": "user", "content": [
114
  {"type": "image_url",
115
  "image_url": {"url": f"data:image/png;base64,{img_b64}", "detail": "high"}},
116
+ {"type": "text", "text": query},
117
  ]},
118
  ],
119
  temperature=0.0,
 
123
  print(resp.choices[0].message.content)
124
  ```
125
 
126
+ ## Citation
127
+
128
+ ```bibtex
129
+ @misc{yang2026guilibratrainingnativegui,
130
+ title={GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL},
131
+ author={Rui Yang and Qianhui Wu and Zhaoyang Wang and Hanyang Chen and Ke Yang and Hao Cheng and Huaxiu Yao and Baoling Peng and Huan Zhang and Jianfeng Gao and Tong Zhang},
132
+ year={2026},
133
+ eprint={2602.22190},
134
+ archivePrefix={arXiv},
135
+ primaryClass={cs.LG},
136
+ url={https://arxiv.org/abs/2602.22190},
137
+ }
138
+ ```