Instructions for using InfiX-ai/InfiGUI-G1-3B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use InfiX-ai/InfiGUI-G1-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="InfiX-ai/InfiGUI-G1-3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Print the result so the output is visible when run as a script
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("InfiX-ai/InfiGUI-G1-3B")
model = AutoModelForMultimodalLM.from_pretrained("InfiX-ai/InfiGUI-G1-3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use InfiX-ai/InfiGUI-G1-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "InfiX-ai/InfiGUI-G1-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "InfiX-ai/InfiGUI-G1-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
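
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is reachable at `http://localhost:8000`. The helper names `build_chat_payload` and `chat` are hypothetical and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "InfiX-ai/InfiGUI-G1-3B"


def build_chat_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the OpenAI-compatible endpoint and return the parsed reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the payload needs no running server; only chat() touches the network.
payload = build_chat_payload(
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
print(json.dumps(payload, indent=2))
```

With the server running, `chat(payload)["choices"][0]["message"]["content"]` should hold the generated text, following the OpenAI chat-completions response layout. Passing `base_url="http://localhost:30000"` targets a server on port 30000 instead.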

Use Docker

docker model run hf.co/InfiX-ai/InfiGUI-G1-3B

SGLang

How to use InfiX-ai/InfiGUI-G1-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "InfiX-ai/InfiGUI-G1-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "InfiX-ai/InfiGUI-G1-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "InfiX-ai/InfiGUI-G1-3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "InfiX-ai/InfiGUI-G1-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use InfiX-ai/InfiGUI-G1-3B with Docker Model Runner:
```
docker model run hf.co/InfiX-ai/InfiGUI-G1-3B
```

Improve model card: Add project page link and evaluation section, update citation

by nielsr HF Staff - opened Aug 12, 2025

base: refs/heads/main ← from: refs/pr/2


README.md +93 -3


@@ -18,6 +18,7 @@ tags:
 This repository contains the InfiGUI-G1-3B model from the paper **[InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization](https://arxiv.org/abs/2508.05731)**.
 [![GitHub Repo](https://img.shields.io/badge/GitHub-Repo-181717?style=flat&logo=github&logoColor=white)](https://github.com/InfiXAI/InfiGUI-G1)
 ## Paper Abstract
@@ -217,7 +218,92 @@ On the widely-used ScreenSpot-V2 benchmark, which provides comprehensive coverag
   <img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_screenspot-v2.png" width="90%" alt="ScreenSpot-V2 Results">
 </div>
-## Citation Information
 If you find this work useful, we would be grateful if you consider citing the following papers:
@@ -245,8 +331,12 @@ If you find this work useful, we would be grateful if you consider citing the fo
 ```bibtex
 @article{liu2025infiguiagent,
   title={InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection},
-  author={Liu, Yuhang and Li, Pengxiang and Wei, Zishu and Xie, Congkai and Hu, Xueyu and Xu, Xinchen and Zhang, Shengyu and Han, Xiaotian and Yang, Hongxia and Wu, Fei},
   journal={arXiv preprint arXiv:2501.04575},
   year={2025}
 }
-```

 This repository contains the InfiGUI-G1-3B model from the paper **[InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization](https://arxiv.org/abs/2508.05731)**.
 [![GitHub Repo](https://img.shields.io/badge/GitHub-Repo-181717?style=flat&logo=github&logoColor=white)](https://github.com/InfiXAI/InfiGUI-G1)
+[![Project Page](https://img.shields.io/badge/Project%20Page-Website-blue?style=flat)](https://osatlas.github.io/)
 ## Paper Abstract
   <img src="https://raw.githubusercontent.com/InfiXAI/InfiGUI-G1/main/assets/results_screenspot-v2.png" width="90%" alt="ScreenSpot-V2 Results">
 </div>
+## ⚙️ Evaluation
+This section provides instructions for reproducing the evaluation results reported in our paper.
+### 1. Getting Started
+Clone the repository and navigate to the project directory:
+```bash
+git clone https://github.com/InfiXAI/InfiGUI-G1.git
+cd InfiGUI-G1
+```
+### 2. Environment Setup
+The evaluation pipeline is built upon the [vLLM](https://github.com/vllm-project/vllm) library for efficient inference. For detailed installation guidance, please refer to the official vLLM repository. The specific versions used to obtain the results reported in our paper are as follows:
+-   **Python**: `3.10.12`
+-   **PyTorch**: `2.6.0`
+-   **Transformers**: `4.50.1`
+-   **vLLM**: `0.8.2`
+-   **CUDA**: `12.6`
+The reported results were obtained on a server equipped with 4 x NVIDIA H800 GPUs.
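
Before running the evaluation, it may help to compare the local environment with the versions listed above. The sketch below is an illustration rather than part of the repository: `REPORTED_VERSIONS`, `installed_version`, and `compare_versions` are hypothetical names, and the package names are assumed to be the usual PyPI distribution names.

```python
from importlib import metadata

# Versions reported in the README; keys are assumed PyPI distribution names.
REPORTED_VERSIONS = {
    "torch": "2.6.0",
    "transformers": "4.50.1",
    "vllm": "0.8.2",
}


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(reported: dict, lookup=installed_version) -> dict:
    """Map each package to a (reported, installed, matches) tuple."""
    report = {}
    for package, wanted in reported.items():
        found = lookup(package)
        # Local build suffixes such as "+cu126" are ignored for the comparison.
        matches = found is not None and found.split("+")[0] == wanted
        report[package] = (wanted, found, matches)
    return report


for package, (wanted, found, ok) in compare_versions(REPORTED_VERSIONS).items():
    status = "ok" if ok else "differs"
    print(f"{package}: reported {wanted}, installed {found} ({status})")
```

A mismatch does not necessarily mean the evaluation will fail; it only signals that results may differ from those reported.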
+### 3. Model Download
+Download the InfiGUI-G1 models from the Hugging Face Hub into the `./models` directory.
+```bash
+# Create a directory for models
+mkdir -p ./models
+# Download InfiGUI-G1-3B
+huggingface-cli download --resume-download InfiX-ai/InfiGUI-G1-3B --local-dir ./models/InfiGUI-G1-3B
+# Download InfiGUI-G1-7B
+huggingface-cli download --resume-download InfiX-ai/InfiGUI-G1-7B --local-dir ./models/InfiGUI-G1-7B
+```
+### 4. Dataset Download and Preparation
+Download the required evaluation benchmarks into the `./data` directory.
+```bash
+# Create a directory for datasets
+mkdir -p ./data
+# Download benchmarks
+huggingface-cli download --repo-type dataset --resume-download likaixin/ScreenSpot-Pro --local-dir ./data/ScreenSpot-Pro
+huggingface-cli download --repo-type dataset --resume-download ServiceNow/ui-vision --local-dir ./data/ui-vision
+huggingface-cli download --repo-type dataset --resume-download OS-Copilot/ScreenSpot-v2 --local-dir ./data/ScreenSpot-v2
+huggingface-cli download --repo-type dataset --resume-download OpenGVLab/MMBench-GUI --local-dir ./data/MMBench-GUI
+huggingface-cli download --repo-type dataset --resume-download vaundys/I2E-Bench --local-dir ./data/I2E-Bench
+```
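
The benchmark downloads above can also be scripted with the `huggingface_hub` Python package. The following is a sketch under that assumption; `BENCHMARK_REPOS`, `download_plan`, and `download_all` are hypothetical names, and the repository IDs are copied from the commands above.

```python
from pathlib import Path

# Dataset repositories from the commands above, keyed by local folder name.
BENCHMARK_REPOS = {
    "ScreenSpot-Pro": "likaixin/ScreenSpot-Pro",
    "ui-vision": "ServiceNow/ui-vision",
    "ScreenSpot-v2": "OS-Copilot/ScreenSpot-v2",
    "MMBench-GUI": "OpenGVLab/MMBench-GUI",
    "I2E-Bench": "vaundys/I2E-Bench",
}


def download_plan(data_dir="./data"):
    """Return (repo_id, local_dir) pairs without touching the network."""
    return [
        (repo_id, Path(data_dir) / folder)
        for folder, repo_id in BENCHMARK_REPOS.items()
    ]


def download_all(data_dir="./data"):
    """Download every benchmark into its folder under data_dir."""
    # Imported lazily so the plan can be inspected without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in download_plan(data_dir):
        local_dir.mkdir(parents=True, exist_ok=True)
        snapshot_download(
            repo_id=repo_id,
            repo_type="dataset",
            local_dir=str(local_dir),
        )


for repo_id, local_dir in download_plan():
    print(f"{repo_id} -> {local_dir}")
```

Calling `download_all()` performs the actual downloads; the loop at the end only prints what would be fetched.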
+After downloading, some datasets require unzipping compressed image files.
+```bash
+# Unzip images for ScreenSpot-v2
+unzip ./data/ScreenSpot-v2/screenspotv2_image.zip -d ./data/ScreenSpot-v2/
+# Unzip images for MMBench-GUI
+unzip ./data/MMBench-GUI/MMBench-GUI-OfflineImages.zip -d ./data/MMBench-GUI/
+```
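
Where the `unzip` command is unavailable, the same extraction can be done with Python's standard `zipfile` module. This is a sketch only; `extract_archive` is a hypothetical helper, and the archive paths in the usage note are the ones from the commands above.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir) -> list:
    """Extract every member of a zip archive into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

For example, `extract_archive("./data/ScreenSpot-v2/screenspotv2_image.zip", "./data/ScreenSpot-v2/")` mirrors the first `unzip` command.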
+### 5. Running the Evaluation
+To run the evaluation, use the `eval/eval.py` script. You must specify the path to the model, the benchmark name, and the tensor parallel size.
+Here is an example command to evaluate the `InfiGUI-G1-3B` model on the `screenspot-pro` benchmark using 4 GPUs:
+```bash
+python eval/eval.py \
+    ./models/InfiGUI-G1-3B \
+    --benchmark screenspot-pro \
+    --tensor-parallel 4
+```
+-   **`model_path`**: The first positional argument specifies the path to the downloaded model directory (e.g., `./models/InfiGUI-G1-3B`).
+-   **`--benchmark`**: Specifies the benchmark to evaluate. Available options include `screenspot-pro`, `screenspot-v2`, `ui-vision`, `mmbench-gui`, and `i2e-bench`.
+-   **`--tensor-parallel`**: Sets the tensor parallelism size, which should typically match the number of available GPUs.
+Evaluation results, including detailed logs and performance metrics, will be saved to the `./output/{model_name}/{benchmark}/` directory.
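
To run several benchmarks in a row, the command above can be assembled programmatically. The sketch below is illustrative: `build_eval_command` and `expected_output_dir` are hypothetical helpers, and it assumes that `{model_name}` in the output path is the final component of the model directory, which the text above does not state explicitly.

```python
import shlex
from pathlib import Path

# Benchmark names listed for the --benchmark option.
BENCHMARKS = ("screenspot-pro", "screenspot-v2", "ui-vision", "mmbench-gui", "i2e-bench")


def build_eval_command(model_path, benchmark, tensor_parallel):
    """Return the argument list for one eval/eval.py run."""
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark!r}")
    if tensor_parallel < 1:
        raise ValueError("tensor_parallel must be at least 1")
    return [
        "python", "eval/eval.py", str(model_path),
        "--benchmark", benchmark,
        "--tensor-parallel", str(tensor_parallel),
    ]


def expected_output_dir(model_path, benchmark):
    """Guess where results are written."""
    # Assumption: {model_name} is the last component of the model path.
    return Path("output") / Path(model_path).name / benchmark


for name in BENCHMARKS:
    command = build_eval_command("./models/InfiGUI-G1-3B", name, 4)
    print(shlex.join(command))
    print("  results expected under:", expected_output_dir("./models/InfiGUI-G1-3B", name))
```

Each printed command can be executed from the repository root, for example with `subprocess.run(command, check=True)`.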
+## 📚 Citation Information
 If you find this work useful, we would be grateful if you consider citing the following papers:
 ```bibtex
 @article{liu2025infiguiagent,
   title={InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection},
+  author={Liu, Yuhang and Li, Pengxiang and Wei, Zishu and Xie, Congkai and Hu, Xueyu and Zhang, Shengyu and Han, Xiaotian and Yang, Hongxia and Wu, Fei},
   journal={arXiv preprint arXiv:2501.04575},
   year={2025}
 }
+```
+## 🙏 Acknowledgements
+We would like to express our gratitude for the following open-source projects: [VERL](https://github.com/volcengine/verl), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) and [vLLM](https://github.com/vllm-project/vllm).