Add pipeline tag, library name, and paper link
#1
by
nielsr HF Staff - opened
README.md
CHANGED
|
@@ -1,43 +1,50 @@
|
|
| 1 |
---
|
| 2 |
-
|
|
|
|
| 3 |
datasets:
|
| 4 |
- GUI-Libra/GUI-Libra-81K-RL
|
| 5 |
- GUI-Libra/GUI-Libra-81K-SFT
|
| 6 |
language:
|
| 7 |
- en
|
| 8 |
-
|
| 9 |
-
|
|
|
|
| 10 |
tags:
|
| 11 |
- VLM
|
| 12 |
- GUI
|
| 13 |
- agent
|
| 14 |
---
|
| 15 |
|
| 16 |
-
#
|
|
|
|
|
|
|
| 17 |
|
| 18 |
-
|
| 19 |
|
|
|
|
| 20 |
|
| 21 |
-
|
| 22 |
-
**Website:** https://GUI-Libra.github.io
|
| 23 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 24 |
|
| 25 |
-
# Usage
|
| 26 |
-
|
|
|
|
| 27 |
|
| 28 |
```bash
|
| 29 |
pip install -U vllm
|
| 30 |
vllm serve GUI-Libra/GUI-Libra-8B --port 8000 --api-key token-abc123
|
| 31 |
-
```
|
| 32 |
-
|
| 33 |
-
* Endpoint: `http://localhost:8000/v1`
|
| 34 |
-
* The `api_key` here must match `--api-key`.
|
| 35 |
|
|
|
|
|
|
|
| 36 |
|
| 37 |
-
## 2) Minimal Python example
|
| 38 |
|
| 39 |
Install dependencies:
|
| 40 |
-
|
| 41 |
```bash
|
| 42 |
pip install -U openai
|
| 43 |
```
|
|
@@ -76,14 +83,19 @@ action_type: Scroll, action_target: None, value: "up" | "down" | "left" | "right
|
|
| 76 |
"""
|
| 77 |
|
| 78 |
# 2) Your prompt (instruction + desired output format)
|
| 79 |
-
|
| 80 |
task_desc = 'Go to Amazon.com and buy a math book'
|
| 81 |
prev_txt = ''
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
|
|
|
| 85 |
|
| 86 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 87 |
<thinking>Your step-by-step thought process here...</thinking>
|
| 88 |
<answer>
|
| 89 |
{
|
|
@@ -97,11 +109,11 @@ query = query + '\n' + '''The response should be structured in the following for
|
|
| 97 |
resp = client.chat.completions.create(
|
| 98 |
model=MODEL,
|
| 99 |
messages=[
|
| 100 |
-
{"role": "system", "content":
|
| 101 |
{"role": "user", "content": [
|
| 102 |
{"type": "image_url",
|
| 103 |
"image_url": {"url": f"data:image/png;base64,{img_b64}", "detail": "high"}},
|
| 104 |
-
{"type": "text", "text":
|
| 105 |
]},
|
| 106 |
],
|
| 107 |
temperature=0.0,
|
|
@@ -111,19 +123,16 @@ resp = client.chat.completions.create(
|
|
| 111 |
print(resp.choices[0].message.content)
|
| 112 |
```
|
| 113 |
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
```
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
|
|
|
| 1 |
---
|
| 2 |
+
base_model:
|
| 3 |
+
- Qwen/Qwen3-VL-8B-Instruct
|
| 4 |
datasets:
|
| 5 |
- GUI-Libra/GUI-Libra-81K-RL
|
| 6 |
- GUI-Libra/GUI-Libra-81K-SFT
|
| 7 |
language:
|
| 8 |
- en
|
| 9 |
+
license: apache-2.0
|
| 10 |
+
library_name: transformers
|
| 11 |
+
pipeline_tag: image-text-to-text
|
| 12 |
tags:
|
| 13 |
- VLM
|
| 14 |
- GUI
|
| 15 |
- agent
|
| 16 |
---
|
| 17 |
|
| 18 |
+
# GUI-Libra-8B
|
| 19 |
+
|
| 20 |
+
[**Project Page**](https://GUI-Libra.github.io) | [**Paper**](https://huggingface.co/papers/2602.22190) | [**GitHub**](https://github.com/GUI-Libra/GUI-Libra)
|
| 21 |
|
| 22 |
+
GUI-Libra-8B is a native GUI agent model fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct). It is designed to perceive screenshots, reason step-by-step, and output executable actions in a single forward pass.
|
| 23 |
|
| 24 |
+
The model is introduced in the paper [GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL](https://huggingface.co/papers/2602.22190).
|
| 25 |
|
| 26 |
+
## Introduction
|
|
|
|
| 27 |
|
| 28 |
+
GUI-Libra addresses key limitations in open-source GUI agents through three main contributions:
|
| 29 |
+
1. **GUI-Libra-81K**: A curated reasoning dataset with 81,000 steps.
|
| 30 |
+
2. **Action-Aware SFT**: A training strategy that balances chain-of-thought reasoning with visual grounding accuracy.
|
| 31 |
+
3. **Conservative RL**: A KL-regularized GRPO approach tailored for GUI environments where rewards are only partially verifiable.
|
| 32 |
|
| 33 |
+
## Usage
|
| 34 |
+
|
| 35 |
+
### 1) Start an OpenAI-compatible vLLM server
|
| 36 |
|
| 37 |
```bash
|
| 38 |
pip install -U vllm
|
| 39 |
vllm serve GUI-Libra/GUI-Libra-8B --port 8000 --api-key token-abc123
|
| 40 |
+
```
|
|
|
|
|
|
|
|
|
|
| 41 |
|
| 42 |
+
* Endpoint: `http://localhost:8000/v1`
|
| 43 |
+
* The `api_key` here must match `--api-key`.
|
| 44 |
|
| 45 |
+
### 2) Minimal Python example
|
| 46 |
|
| 47 |
Install dependencies:
|
|
|
|
| 48 |
```bash
|
| 49 |
pip install -U openai
|
| 50 |
```
|
|
|
|
| 83 |
"""
|
| 84 |
|
| 85 |
# 2) Your prompt (instruction + desired output format)
|
|
|
|
| 86 |
task_desc = 'Go to Amazon.com and buy a math book'
|
| 87 |
prev_txt = ''
|
| 88 |
+
# Note: Ensure img_size is defined or use default
|
| 89 |
+
question_description = '''Please generate the next move according to the UI screenshot, instruction and previous actions.
|
| 90 |
+
|
| 91 |
+
Instruction: {}
|
| 92 |
|
| 93 |
+
Interaction History: {}
|
| 94 |
+
'''
|
| 95 |
+
query = question_description.format(task_desc, prev_txt)
|
| 96 |
+
|
| 97 |
+
query = query + '
|
| 98 |
+
' + '''The response should be structured in the following format:
|
| 99 |
<thinking>Your step-by-step thought process here...</thinking>
|
| 100 |
<answer>
|
| 101 |
{
|
|
|
|
| 109 |
resp = client.chat.completions.create(
|
| 110 |
model=MODEL,
|
| 111 |
messages=[
|
| 112 |
+
{"role": "system", "content": system_prompt},
|
| 113 |
{"role": "user", "content": [
|
| 114 |
{"type": "image_url",
|
| 115 |
"image_url": {"url": f"data:image/png;base64,{img_b64}", "detail": "high"}},
|
| 116 |
+
{"type": "text", "text": query},
|
| 117 |
]},
|
| 118 |
],
|
| 119 |
temperature=0.0,
|
|
|
|
| 123 |
print(resp.choices[0].message.content)
|
| 124 |
```
|
| 125 |
|
| 126 |
+
## Citation
|
| 127 |
+
|
| 128 |
+
```bibtex
|
| 129 |
+
@misc{yang2026guilibratrainingnativegui,
|
| 130 |
+
title={GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL},
|
| 131 |
+
author={Rui Yang and Qianhui Wu and Zhaoyang Wang and Hanyang Chen and Ke Yang and Hao Cheng and Huaxiu Yao and Baoling Peng and Huan Zhang and Jianfeng Gao and Tong Zhang},
|
| 132 |
+
year={2026},
|
| 133 |
+
eprint={2602.22190},
|
| 134 |
+
archivePrefix={arXiv},
|
| 135 |
+
primaryClass={cs.LG},
|
| 136 |
+
url={https://arxiv.org/abs/2602.22190},
|
| 137 |
+
}
|
| 138 |
+
```
|
|
|
|
|
|
|
|
|