Add pipeline tag and library name to model card

#1, opened by nielsr (HF Staff)

Files changed (1): README.md (+66 -14)
README.md CHANGED

````diff
@@ -1,6 +1,9 @@
 ---
 license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
+
 ### UI-Venus
 This repository contains the UI-Venus model from the report [UI-Venus: Building High-performance UI Agents with RFT](https://arxiv.org/abs/2508.10833). UI-Venus is a native UI agent based on the Qwen2.5-VL multimodal large language model, designed to perform precise GUI element grounding and effective navigation using only screenshots as input. It achieves state-of-the-art performance through Reinforcement Fine-Tuning (RFT) with high-quality training data. More inference details and usage guides are available in the GitHub repository. We will continue to update results on standard benchmarks, including ScreenSpot-v2/Pro and AndroidWorld.
 
@@ -36,8 +39,6 @@ Key innovations include:
 
 
 
-
-
 ---
 ## Installation
 
@@ -152,17 +153,17 @@ def inference(instruction, image_path):
 ---
 ### Results on ScreenSpot-v2
 
-| **Model** | **Mobile Text** | **Mobile Icon** | **Desktop Text** | **Desktop Icon** | **Web Text** | **Web Icon** | **Avg.** |
-|--------------------------|-----------------|-----------------|------------------|------------------|--------------|--------------|----------|
-| UI-TARS-1.5 | - | - | - | - | - | - | 94.2 |
-| Seed-1.5-VL | - | - | - | - | - | - | 95.2 |
-| GPT-4o | 26.6 | 24.2 | 24.2 | 19.3 | 12.8 | 11.8 | 20.1 |
-| Qwen2.5-VL-7B | 97.6 | 87.2 | 90.2 | 74.2 | 93.2 | 81.3 | 88.8 |
-| UI-TARS-7B | 96.9 | 89.1 | 95.4 | 85.0 | 93.6 | 85.2 | 91.6 |
-| UI-TARS-72B | 94.8 | 86.3 | 91.2 | 87.9 | 91.5 | 87.7 | 90.3 |
-| LPO | 97.9 | 82.9 | 95.9 | 86.4 | 95.6 | 84.2 | 90.5 |
-| **UI-Venus-Ground-7B (Ours)** | **99.0** | **90.0** | **97.0** | **90.7** | **96.2** | **88.7** | **94.1** |
-| **UI-Venus-Ground-72B (Ours)** | **99.7** | **93.8** | **95.9** | **90.0** | **96.2** | **92.6** | **95.3** |
+| **Model** | **Mobile Text** | **Mobile Icon** | **Desktop Text** | **Desktop Icon** | **Web Text** | **Web Icon** | **Avg.** |
+|---|---|---|---|---|---|---|---|
+| UI-TARS-1.5 | - | - | - | - | - | - | 94.2 |
+| Seed-1.5-VL | - | - | - | - | - | - | 95.2 |
+| GPT-4o | 26.6 | 24.2 | 24.2 | 19.3 | 12.8 | 11.8 | 20.1 |
+| Qwen2.5-VL-7B | 97.6 | 87.2 | 90.2 | 74.2 | 93.2 | 81.3 | 88.8 |
+| UI-TARS-7B | 96.9 | 89.1 | 95.4 | 85.0 | 93.6 | 85.2 | 91.6 |
+| UI-TARS-72B | 94.8 | 86.3 | 91.2 | 87.9 | 91.5 | 87.7 | 90.3 |
+| LPO | 97.9 | 82.9 | 95.9 | 86.4 | 95.6 | 84.2 | 90.5 |
+| **UI-Venus-Ground-7B (Ours)** | **99.0** | **90.0** | **97.0** | **90.7** | **96.2** | **88.7** | **94.1** |
+| **UI-Venus-Ground-72B (Ours)** | **99.7** | **93.8** | **95.9** | **90.0** | **96.2** | **92.6** | **95.3** |
 
 ---
 
@@ -193,6 +194,57 @@ Scores are in percentage (%). `T` = Text, `I` = Icon.
 > πŸ” **Experimental results show that UI-Venus-Ground-72B achieves state-of-the-art performance on ScreenSpot-Pro with an average score of 61.7, while also setting new benchmarks on ScreenSpot-v2 (95.3), OSWorld_G (69.8), AgentCPM (84.7), and UI-Vision (38.0), highlighting its effectiveness in complex visual grounding and action prediction tasks.**
 
 
+### Results on AndroidWorld
+This compressed package contains the validation trajectories for **AndroidWorld**, including execution logs and navigation paths.
+πŸ“₯ Download: [UI-Venus-androidworld.zip](vis_androidworld/UI-Venus-androidworld.zip)
+
+| Models | With Planner | A11y Tree | Screenshot | Success Rate (pass@1) |
+|--------|--------------|-----------|------------|------------------------|
+| **Closed-source Models** | | | | |
+| GPT-4o | ❌ | βœ… | ❌ | 30.6 |
+| ScaleTrack | ❌ | βœ… | ❌ | 44.0 |
+| SeedVL-1.5 | ❌ | βœ… | βœ… | 62.1 |
+| UI-TARS-1.5 | ❌ | ❌ | βœ… | 64.2 |
+| **Open-source Models** | | | | |
+| GUI-Critic-R1-7B | ❌ | βœ… | βœ… | 27.6 |
+| Qwen2.5-VL-72B* | ❌ | ❌ | βœ… | 35.0 |
+| UGround | βœ… | ❌ | βœ… | 44.0 |
+| Aria-UI | βœ… | ❌ | βœ… | 44.8 |
+| UI-TARS-72B | ❌ | ❌ | βœ… | 46.6 |
+| GLM-4.5v | ❌ | ❌ | βœ… | 57.0 |
+| **Ours** | | | | |
+| UI-Venus-Navi-7B | ❌ | ❌ | βœ… | **49.1** |
+| UI-Venus-Navi-72B | ❌ | ❌ | βœ… | **65.9** |
+
+> **Table:** Performance comparison on **AndroidWorld** for end-to-end models. Our UI-Venus-Navi-72B achieves state-of-the-art performance, outperforming all baseline methods across different settings.
+
+
+### Results on AndroidControl and GUI-Odyssey
+
+| Models | AndroidControl-Low<br>Type Acc. | AndroidControl-Low<br>Step SR | AndroidControl-High<br>Type Acc. | AndroidControl-High<br>Step SR | GUI-Odyssey<br>Type Acc. | GUI-Odyssey<br>Step SR |
+|--------|-------------------------------|-----------------------------|-------------------------------|-----------------------------|------------------------|----------------------|
+| **Closed-source Models** | | | | | | |
+| GPT-4o | 74.3 | 19.4 | 66.3 | 20.8 | 34.3 | 3.3 |
+| **Open-source Models** | | | | | | |
+| Qwen2.5-VL-7B | 94.1 | 85.0 | 75.1 | 62.9 | 59.5 | 46.3 |
+| SeeClick | 93.0 | 75.0 | 82.9 | 59.1 | 71.0 | 53.9 |
+| OS-Atlas-7B | 93.6 | 85.2 | 85.2 | 71.2 | 84.5 | 62.0 |
+| Aguvis-7B | - | 80.5 | - | 61.5 | - | - |
+| Aguvis-72B | - | 84.4 | - | 66.4 | - | - |
+| OS-Genesis-7B | 90.7 | 74.2 | 66.2 | 44.5 | - | - |
+| UI-TARS-7B | 98.0 | 90.8 | 83.7 | 72.5 | 94.6 | 87.0 |
+| UI-TARS-72B | **98.1** | 91.3 | 85.2 | 74.7 | **95.4** | **88.6** |
+| GUI-R1-7B | 85.2 | 66.5 | 71.6 | 51.7 | 65.5 | 38.8 |
+| NaviMaster-7B | 85.6 | 69.9 | 72.9 | 54.0 | - | - |
+| UI-AGILE-7B | 87.7 | 77.6 | 80.1 | 60.6 | - | - |
+| AgentCPM-GUI | 94.4 | 90.2 | 77.7 | 69.2 | 90.0 | 75.0 |
+| **Ours** | | | | | | |
+| UI-Venus-Navi-7B | 97.1 | 92.4 | **86.5** | 76.1 | 87.3 | 71.5 |
+| UI-Venus-Navi-72B | 96.7 | **92.9** | 85.9 | **77.2** | 87.2 | 72.4 |
+
+> **Table:** Performance comparison on offline UI navigation datasets, including AndroidControl and GUI-Odyssey. Note that models with * are reproduced.
+
+
 # Citation
 Please consider citing if you find our work useful:
 ```plain
@@ -205,4 +257,4 @@ Please consider citing if you find our work useful:
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2508.10833},
 }
-```
+```
````
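The substance of this PR is the two new keys in the README's YAML front matter. As a quick sanity check, a reviewer could parse the metadata block and confirm the keys landed. The sketch below handles only flat `key: value` front matter with plain string handling (real model cards can nest YAML, which would need an actual YAML parser), and `parse_front_matter` is our own hypothetical helper, not Hub tooling:

```python
# Minimal sketch: verify a model card carries the metadata this PR adds.
# Only supports flat "key: value" lines between the first two --- fences.
def parse_front_matter(readme_text: str) -> dict:
    # Front matter is the text between the first two '---' delimiters.
    block = readme_text.split("---")[1]
    meta = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

readme = """---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

### UI-Venus
"""

meta = parse_front_matter(readme)
print(meta["pipeline_tag"])  # image-text-to-text
print(meta["library_name"])  # transformers
```

On the Hub, `pipeline_tag` drives the task filter the model appears under, and `library_name` selects the code snippet widget, which is why metadata-only PRs like this one are worth merging.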