Improve model card: Add project page link and evaluation section, update citation

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +93 -3
README.md CHANGED
@@ -18,6 +18,7 @@ tags:
  This repository contains the InfiGUI-G1-3B model from the paper **[InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization](https://arxiv.org/abs/2508.05731)**.

  [![GitHub Repo](https://img.shields.io/badge/GitHub-Repo-181717?style=flat&logo=github&logoColor=white)](https://github.com/InfiXAI/InfiGUI-G1)

  ## Paper Abstract

@@ -217,7 +218,92 @@ On the widely-used ScreenSpot-V2 benchmark, which provides comprehensive coverag
  <img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_screenspot-v2.png" width="90%" alt="ScreenSpot-V2 Results">
  </div>

- ## Citation Information

  If you find this work useful, we would be grateful if you consider citing the following papers:

@@ -245,8 +331,12 @@ If you find this work useful, we would be grateful if you consider citing the fo
  ```bibtex
  @article{liu2025infiguiagent,
  title={InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection},
- author={Liu, Yuhang and Li, Pengxiang and Wei, Zishu and Xie, Congkai and Hu, Xueyu and Xu, Xinchen and Zhang, Shengyu and Han, Xiaotian and Yang, Hongxia and Wu, Fei},
  journal={arXiv preprint arXiv:2501.04575},
  year={2025}
  }
- ```

  This repository contains the InfiGUI-G1-3B model from the paper **[InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization](https://arxiv.org/abs/2508.05731)**.

  [![GitHub Repo](https://img.shields.io/badge/GitHub-Repo-181717?style=flat&logo=github&logoColor=white)](https://github.com/InfiXAI/InfiGUI-G1)
+ [![Paper Page](https://img.shields.io/badge/Paper%20Page-Hugging%20Face-yellow?style=flat)](https://huggingface.co/papers/2508.05731)

  ## Paper Abstract

  <img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_screenspot-v2.png" width="90%" alt="ScreenSpot-V2 Results">
  </div>

+ ## ⚙️ Evaluation
+
+ This section provides instructions for reproducing the evaluation results reported in our paper.
+
+ ### 1. Getting Started
+
+ Clone the repository and navigate to the project directory:
+
+ ```bash
+ git clone https://github.com/InfiXAI/InfiGUI-G1.git
+ cd InfiGUI-G1
+ ```
+
+ ### 2. Environment Setup
+
+ The evaluation pipeline is built upon the [vLLM](https://github.com/vllm-project/vllm) library for efficient inference. For detailed installation guidance, please refer to the official vLLM repository. The specific versions used to obtain the results reported in our paper are as follows:
+
+ - **Python**: `3.10.12`
+ - **PyTorch**: `2.6.0`
+ - **Transformers**: `4.50.1`
+ - **vLLM**: `0.8.2`
+ - **CUDA**: `12.6`
+
+ The reported results were obtained on a server equipped with 4 x NVIDIA H800 GPUs.
+
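Assuming a CUDA 12.6 machine with pip available, the pinned versions listed above could be installed with something like the following; the mutual compatibility of these wheels is an assumption, so prefer the official vLLM installation guide if anything conflicts:

```bash
# Assumption: these pins mirror the versions listed above and are mutually
# compatible on CUDA 12.6; fall back to the vLLM install docs if this fails.
pip install torch==2.6.0 transformers==4.50.1 vllm==0.8.2
```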
+ ### 3. Model Download
+
+ Download the InfiGUI-G1 models from the Hugging Face Hub into the `./models` directory.
+
+ ```bash
+ # Create a directory for models
+ mkdir -p ./models
+
+ # Download InfiGUI-G1-3B
+ huggingface-cli download --resume-download InfiX-ai/InfiGUI-G1-3B --local-dir ./models/InfiGUI-G1-3B
+
+ # Download InfiGUI-G1-7B
+ huggingface-cli download --resume-download InfiX-ai/InfiGUI-G1-7B --local-dir ./models/InfiGUI-G1-7B
+ ```
+
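A quick way to confirm the downloads completed is to check each local model directory for `config.json`, a file present in standard Hugging Face model repos. This helper is a hypothetical sketch, not part of the official repository:

```bash
# Hypothetical helper: report whether a downloaded model directory
# contains config.json (present in standard Hugging Face model repos).
model_ready() {
  if [ -f "$1/config.json" ]; then
    echo "ready: $1"
  else
    echo "incomplete: $1"
  fi
}

model_ready ./models/InfiGUI-G1-3B
model_ready ./models/InfiGUI-G1-7B
```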
+ ### 4. Dataset Download and Preparation
+
+ Download the required evaluation benchmarks into the `./data` directory.
+
+ ```bash
+ # Create a directory for datasets
+ mkdir -p ./data
+
+ # Download benchmarks
+ huggingface-cli download --repo-type dataset --resume-download likaixin/ScreenSpot-Pro --local-dir ./data/ScreenSpot-Pro
+ huggingface-cli download --repo-type dataset --resume-download ServiceNow/ui-vision --local-dir ./data/ui-vision
+ huggingface-cli download --repo-type dataset --resume-download OS-Copilot/ScreenSpot-v2 --local-dir ./data/ScreenSpot-v2
+ huggingface-cli download --repo-type dataset --resume-download OpenGVLab/MMBench-GUI --local-dir ./data/MMBench-GUI
+ huggingface-cli download --repo-type dataset --resume-download vaundys/I2E-Bench --local-dir ./data/I2E-Bench
+ ```
+
+ After downloading, some datasets require unzipping compressed image files.
+
+ ```bash
+ # Unzip images for ScreenSpot-v2
+ unzip ./data/ScreenSpot-v2/screenspotv2_image.zip -d ./data/ScreenSpot-v2/
+
+ # Unzip images for MMBench-GUI
+ unzip ./data/MMBench-GUI/MMBench-GUI-OfflineImages.zip -d ./data/MMBench-GUI/
+ ```
+
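Before launching an evaluation, it can help to verify that all five benchmark directories landed where expected. The function below is an illustrative sketch (not an official script); the directory names are the ones used in the download commands above:

```bash
# Illustrative sanity check: list any benchmark directories missing
# under a given data root (no output means everything is in place).
check_benchmarks() {
  root="$1"
  for d in ScreenSpot-Pro ui-vision ScreenSpot-v2 MMBench-GUI I2E-Bench; do
    [ -d "$root/$d" ] || echo "missing: $root/$d"
  done
}

check_benchmarks ./data
```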
+ ### 5. Running the Evaluation
+
+ To run the evaluation, use the `eval/eval.py` script. You must specify the path to the model, the benchmark name, and the tensor parallel size.
+
+ Here is an example command to evaluate the `InfiGUI-G1-3B` model on the `screenspot-pro` benchmark using 4 GPUs:
+
+ ```bash
+ python eval/eval.py \
+     ./models/InfiGUI-G1-3B \
+     --benchmark screenspot-pro \
+     --tensor-parallel 4
+ ```
+
+ - **`model_path`**: The first positional argument specifies the path to the downloaded model directory (e.g., `./models/InfiGUI-G1-3B`).
+ - **`--benchmark`**: Specifies the benchmark to evaluate. Available options include `screenspot-pro`, `screenspot-v2`, `ui-vision`, `mmbench-gui`, and `i2e-bench`.
+ - **`--tensor-parallel`**: Sets the tensor parallelism size, which should typically match the number of available GPUs.
+
+ Evaluation results, including detailed logs and performance metrics, will be saved to the `./output/{model_name}/{benchmark}/` directory.
+
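To sweep every benchmark for one model, the single-benchmark command above can be wrapped in a small loop. The sketch below is a hypothetical dry run that only prints each command rather than executing it; the model path and GPU count are the example values used above:

```bash
# Dry-run sketch (hypothetical wrapper, not an official script):
# print the eval command for each supported benchmark.
MODEL=./models/InfiGUI-G1-3B
TP=4
BENCHMARKS="screenspot-pro screenspot-v2 ui-vision mmbench-gui i2e-bench"

for b in $BENCHMARKS; do
  echo "python eval/eval.py $MODEL --benchmark $b --tensor-parallel $TP"
done
```

Dropping the `echo` turns the dry run into a real sequential sweep.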
+ ## 📚 Citation Information

  If you find this work useful, we would be grateful if you consider citing the following papers:

  ```bibtex
  @article{liu2025infiguiagent,
  title={InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection},
+ author={Liu, Yuhang and Li, Pengxiang and Wei, Zishu and Xie, Congkai and Hu, Xueyu and Zhang, Shengyu and Han, Xiaotian and Yang, Hongxia and Wu, Fei},
  journal={arXiv preprint arXiv:2501.04575},
  year={2025}
  }
+ ```
+
+ ## 🙏 Acknowledgements
+
+ We would like to express our gratitude for the following open-source projects: [VERL](https://github.com/volcengine/verl), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) and [vLLM](https://github.com/vllm-project/vllm).