Improve model card: Add project page link and evaluation section, update citation
#2
by nielsr (HF Staff) - opened
README.md
CHANGED
@@ -18,6 +18,7 @@ tags:
 This repository contains the InfiGUI-G1-3B model from the paper **[InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization](https://arxiv.org/abs/2508.05731)**.
 
 [](https://github.com/InfiXAI/InfiGUI-G1)
+[](https://osatlas.github.io/)
 
 ## Paper Abstract

@@ -217,7 +218,92 @@ On the widely-used ScreenSpot-V2 benchmark, which provides comprehensive coverage
 <img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_screenspot-v2.png" width="90%" alt="ScreenSpot-V2 Results">
 </div>
 
-##
+## ⚙️ Evaluation
+…
 
 If you find this work useful, we would be grateful if you consider citing the following papers:

@@ -245,8 +331,12 @@ If you find this work useful, we would be grateful if you consider citing the following papers:
 ```bibtex
 @article{liu2025infiguiagent,
   title={InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection},
-  author={Liu, Yuhang and Li, Pengxiang and Wei, Zishu and Xie, Congkai and Hu, Xueyu and
+  author={Liu, Yuhang and Li, Pengxiang and Wei, Zishu and Xie, Congkai and Hu, Xueyu and Zhang, Shengyu and Han, Xiaotian and Yang, Hongxia and Wu, Fei},
   journal={arXiv preprint arXiv:2501.04575},
   year={2025}
 }
 ```
+
+## 🙏 Acknowledgements
+
+We would like to express our gratitude for the following open-source projects: [VERL](https://github.com/volcengine/verl), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) and [vLLM](https://github.com/vllm-project/vllm).
This repository contains the InfiGUI-G1-3B model from the paper **[InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization](https://arxiv.org/abs/2508.05731)**.

[](https://github.com/InfiXAI/InfiGUI-G1)
[](https://osatlas.github.io/)

## Paper Abstract
<img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_screenspot-v2.png" width="90%" alt="ScreenSpot-V2 Results">
</div>

## ⚙️ Evaluation

This section provides instructions for reproducing the evaluation results reported in our paper.
### 1. Getting Started

Clone the repository and navigate to the project directory:
```bash
git clone https://github.com/InfiXAI/InfiGUI-G1.git
cd InfiGUI-G1
```
### 2. Environment Setup

The evaluation pipeline is built upon the [vLLM](https://github.com/vllm-project/vllm) library for efficient inference. For detailed installation guidance, please refer to the official vLLM repository. The specific versions used to obtain the results reported in our paper are as follows:

- **Python**: `3.10.12`
- **PyTorch**: `2.6.0`
- **Transformers**: `4.50.1`
- **vLLM**: `0.8.2`
- **CUDA**: `12.6`

The reported results were obtained on a server equipped with 4 x NVIDIA H800 GPUs.
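Assuming a pip-based setup, the version pins above translate into a requirements file along these lines (a sketch; the repository may ship its own dependency specification, and CUDA comes from the system rather than pip):

```text
vllm==0.8.2
transformers==4.50.1
torch==2.6.0
```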
### 3. Model Download

Download the InfiGUI-G1 models from the Hugging Face Hub into the `./models` directory.

```bash
# Create a directory for models
mkdir -p ./models

# Download InfiGUI-G1-3B
huggingface-cli download --resume-download InfiX-ai/InfiGUI-G1-3B --local-dir ./models/InfiGUI-G1-3B

# Download InfiGUI-G1-7B
huggingface-cli download --resume-download InfiX-ai/InfiGUI-G1-7B --local-dir ./models/InfiGUI-G1-7B
```
### 4. Dataset Download and Preparation

Download the required evaluation benchmarks into the `./data` directory.

```bash
# Create a directory for datasets
mkdir -p ./data

# Download benchmarks
huggingface-cli download --repo-type dataset --resume-download likaixin/ScreenSpot-Pro --local-dir ./data/ScreenSpot-Pro
huggingface-cli download --repo-type dataset --resume-download ServiceNow/ui-vision --local-dir ./data/ui-vision
huggingface-cli download --repo-type dataset --resume-download OS-Copilot/ScreenSpot-v2 --local-dir ./data/ScreenSpot-v2
huggingface-cli download --repo-type dataset --resume-download OpenGVLab/MMBench-GUI --local-dir ./data/MMBench-GUI
huggingface-cli download --repo-type dataset --resume-download vaundys/I2E-Bench --local-dir ./data/I2E-Bench
```

After downloading, some datasets require unzipping compressed image files.

```bash
# Unzip images for ScreenSpot-v2
unzip ./data/ScreenSpot-v2/screenspotv2_image.zip -d ./data/ScreenSpot-v2/

# Unzip images for MMBench-GUI
unzip ./data/MMBench-GUI/MMBench-GUI-OfflineImages.zip -d ./data/MMBench-GUI/
```
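After the steps above, a quick sanity check that each benchmark folder landed where expected can help before launching a long run (a sketch; directory names are the `--local-dir` targets used in the download commands, so adjust if you chose different paths):

```bash
# Report which of the expected benchmark folders are present under ./data
for d in ScreenSpot-Pro ui-vision ScreenSpot-v2 MMBench-GUI I2E-Bench; do
  if [ -d "./data/$d" ]; then echo "found: $d"; else echo "missing: $d"; fi
done
```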
### 5. Running the Evaluation

To run the evaluation, use the `eval/eval.py` script. You must specify the path to the model, the benchmark name, and the tensor parallel size.

Here is an example command to evaluate the `InfiGUI-G1-3B` model on the `screenspot-pro` benchmark using 4 GPUs:

```bash
python eval/eval.py \
    ./models/InfiGUI-G1-3B \
    --benchmark screenspot-pro \
    --tensor-parallel 4
```

- **`model_path`**: The first positional argument specifies the path to the downloaded model directory (e.g., `./models/InfiGUI-G1-3B`).
- **`--benchmark`**: Specifies the benchmark to evaluate. Available options include `screenspot-pro`, `screenspot-v2`, `ui-vision`, `mmbench-gui`, and `i2e-bench`.
- **`--tensor-parallel`**: Sets the tensor parallelism size, which should typically match the number of available GPUs.

Evaluation results, including detailed logs and performance metrics, will be saved to the `./output/{model_name}/{benchmark}/` directory.
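To sweep every benchmark listed above in one go, a small wrapper loop is enough (a sketch built from the example command; the `echo` prefix makes it a dry run that only prints each command, so drop it to actually launch the jobs):

```bash
# Dry-run sweep over all supported benchmarks; remove "echo" to execute
for bench in screenspot-pro screenspot-v2 ui-vision mmbench-gui i2e-bench; do
  echo python eval/eval.py ./models/InfiGUI-G1-3B --benchmark "$bench" --tensor-parallel 4
done
```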
## 📚 Citation Information

If you find this work useful, we would be grateful if you consider citing the following papers:

…

```bibtex
@article{liu2025infiguiagent,
  title={InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection},
  author={Liu, Yuhang and Li, Pengxiang and Wei, Zishu and Xie, Congkai and Hu, Xueyu and Zhang, Shengyu and Han, Xiaotian and Yang, Hongxia and Wu, Fei},
  journal={arXiv preprint arXiv:2501.04575},
  year={2025}
}
```

## 🙏 Acknowledgements

We would like to express our gratitude for the following open-source projects: [VERL](https://github.com/volcengine/verl), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) and [vLLM](https://github.com/vllm-project/vllm).