Instructions for using zai-org/GLM-4.1V-9B-Thinking with libraries and local apps.

Libraries

How to use zai-org/GLM-4.1V-9B-Thinking with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="zai-org/GLM-4.1V-9B-Thinking")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("zai-org/GLM-4.1V-9B-Thinking")
model = AutoModelForImageTextToText.from_pretrained("zai-org/GLM-4.1V-9B-Thinking")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the new tokens (the slice skips the prompt tokens)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use zai-org/GLM-4.1V-9B-Thinking with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zai-org/GLM-4.1V-9B-Thinking"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/GLM-4.1V-9B-Thinking",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
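
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_payload` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "zai-org/GLM-4.1V-9B-Thinking"
# Assumption: the server was started with `vllm serve` on port 8000, as above.
BASE_URL = "http://localhost:8000/v1"


def build_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST the payload to /chat/completions and return the first reply's text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat(build_payload("Describe this image in one sentence.", "<image URL>")))
```

Pointing `base_url` at `http://localhost:30000/v1` should work the same way against the SGLang server, since both expose the OpenAI-compatible route.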

Use Docker

docker model run hf.co/zai-org/GLM-4.1V-9B-Thinking

SGLang

How to use zai-org/GLM-4.1V-9B-Thinking with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zai-org/GLM-4.1V-9B-Thinking" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/GLM-4.1V-9B-Thinking",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zai-org/GLM-4.1V-9B-Thinking" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/GLM-4.1V-9B-Thinking",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use zai-org/GLM-4.1V-9B-Thinking with Docker Model Runner:
```
docker model run hf.co/zai-org/GLM-4.1V-9B-Thinking
```

test

#19

by cher0511 - opened Aug 14, 2025

base: refs/heads/main ← from: refs/pr/19

Files changed (3), +42 -59

README.md +10 -8
config.json +32 -36
generation_config.json +0 -15

README.md CHANGED

@@ -4,7 +4,7 @@ language:
 - en
 - zh
 base_model:
-- zai-org/GLM-4-9B-0414
 pipeline_tag: image-text-to-text
 library_name: transformers
 tags:
@@ -14,11 +14,13 @@ tags:
 # GLM-4.1V-9B-Thinking
 <div align="center">
-<img src=https://raw.githubusercontent.com/zai-org/GLM-4.1V-Thinking/99c5eb6563236f0ff43605d91d107544da9863b2/resources/logo.svg width="40%"/>
 </div>
 <p align="center">
     📖 View the GLM-4.1V-9B-Thinking <a href="https://arxiv.org/abs/2507.01006" target="_blank">paper</a>.
     <br>
     📍 Using GLM-4.1V-9B-Thinking API at <a href="https://www.bigmodel.cn/dev/api/visual-reasoning-model/GLM-4.1V-Thinking">Zhipu Foundation Model Open Platform</a>
 </p>
@@ -30,14 +32,14 @@ increasingly complex, VLMs must evolve beyond basic multimodal perception to enh
 complex tasks. This involves improving accuracy, comprehensiveness, and intelligence, enabling applications such as
 complex problem solving, long-context understanding, and multimodal agents.
-Based on the [GLM-4-9B-0414](https://github.com/zai-org/GLM-4) foundation model, we present the new open-source VLM model
 **GLM-4.1V-9B-Thinking**, designed to explore the upper limits of reasoning in vision-language models. By introducing
 a "thinking paradigm" and leveraging reinforcement learning, the model significantly enhances its capabilities. It
 achieves state-of-the-art performance among 10B-parameter VLMs, matching or even surpassing the 72B-parameter
 Qwen-2.5-VL-72B on 18 benchmark tasks. We are also open-sourcing the base model GLM-4.1V-9B-Base to
 support further research into the boundaries of VLM capabilities.
-![rl](https://raw.githubusercontent.com/zai-org/GLM-4.1V-Thinking/refs/heads/main/resources/rl.jpeg)
 Compared to the previous generation models CogVLM2 and the GLM-4V series, **GLM-4.1V-Thinking** offers the
 following improvements:
@@ -55,7 +57,7 @@ richness, and interpretability. It comprehensively surpasses traditional non-rea
 Out of 28 benchmark tasks, it achieved the best performance among 10B-level models on 23 tasks,
 and even outperformed the 72B-parameter Qwen-2.5-VL-72B on 18 tasks.
-![bench](https://raw.githubusercontent.com/zai-org/GLM-4.1V-Thinking/refs/heads/main/resources/bench.jpeg)
 ## Quick Inference
@@ -63,7 +65,7 @@ This is a simple example of running single-image inference using the `transforme
 First, install the `transformers` library from source:
 ```
-pip install transformers>=4.57.1
 ```
 Then, run the following code:
@@ -72,7 +74,7 @@ Then, run the following code:
 from transformers import AutoProcessor, Glm4vForConditionalGeneration
 import torch
-MODEL_PATH = "zai-org/GLM-4.1V-9B-Thinking"
 messages = [
     {
         "role": "user",
@@ -107,4 +109,4 @@ print(output_text)
 ```
 For video reasoning, web demo deployment, and more code, please check
-our [GitHub](https://github.com/zai-org/GLM-V).

 - en
 - zh
 base_model:
+- THUDM/GLM-4-9B-0414
 pipeline_tag: image-text-to-text
 library_name: transformers
 tags:
 # GLM-4.1V-9B-Thinking
 <div align="center">
+<img src=https://raw.githubusercontent.com/THUDM/GLM-4.1V-Thinking/99c5eb6563236f0ff43605d91d107544da9863b2/resources/logo.svg width="40%"/>
 </div>
 <p align="center">
     📖 View the GLM-4.1V-9B-Thinking <a href="https://arxiv.org/abs/2507.01006" target="_blank">paper</a>.
     <br>
+    💡 Try the <a href="https://huggingface.co/spaces/THUDM/GLM-4.1V-9B-Thinking-Demo" target="_blank">Hugging Face</a> or <a href="https://modelscope.cn/studios/ZhipuAI/GLM-4.1V-9B-Thinking-Demo" target="_blank">ModelScope</a> online demo for GLM-4.1V-9B-Thinking.
+    <br>
     📍 Using GLM-4.1V-9B-Thinking API at <a href="https://www.bigmodel.cn/dev/api/visual-reasoning-model/GLM-4.1V-Thinking">Zhipu Foundation Model Open Platform</a>
 </p>
 complex tasks. This involves improving accuracy, comprehensiveness, and intelligence, enabling applications such as
 complex problem solving, long-context understanding, and multimodal agents.
+Based on the [GLM-4-9B-0414](https://github.com/THUDM/GLM-4) foundation model, we present the new open-source VLM model
 **GLM-4.1V-9B-Thinking**, designed to explore the upper limits of reasoning in vision-language models. By introducing
 a "thinking paradigm" and leveraging reinforcement learning, the model significantly enhances its capabilities. It
 achieves state-of-the-art performance among 10B-parameter VLMs, matching or even surpassing the 72B-parameter
 Qwen-2.5-VL-72B on 18 benchmark tasks. We are also open-sourcing the base model GLM-4.1V-9B-Base to
 support further research into the boundaries of VLM capabilities.
+![rl](https://raw.githubusercontent.com/THUDM/GLM-4.1V-Thinking/refs/heads/main/resources/rl.jpeg)
 Compared to the previous generation models CogVLM2 and the GLM-4V series, **GLM-4.1V-Thinking** offers the
 following improvements:
 Out of 28 benchmark tasks, it achieved the best performance among 10B-level models on 23 tasks,
 and even outperformed the 72B-parameter Qwen-2.5-VL-72B on 18 tasks.
+![bench](https://raw.githubusercontent.com/THUDM/GLM-4.1V-Thinking/refs/heads/main/resources/bench.jpeg)
 ## Quick Inference
 First, install the `transformers` library from source:
 ```
+pip install git+https://github.com/huggingface/transformers.git
 ```
 Then, run the following code:
 from transformers import AutoProcessor, Glm4vForConditionalGeneration
 import torch
+MODEL_PATH = "THUDM/GLM-4.1V-9B-Thinking"
 messages = [
     {
         "role": "user",
 ```
 For video reasoning, web demo deployment, and more code, please check
+our [GitHub](https://github.com/THUDM/GLM-4.1V-Thinking).

config.json CHANGED

@@ -3,50 +3,38 @@
     "Glm4vForConditionalGeneration"
   ],
   "model_type": "glm4v",
   "image_start_token_id": 151339,
   "image_end_token_id": 151340,
   "video_start_token_id": 151341,
   "video_end_token_id": 151342,
   "image_token_id": 151343,
   "video_token_id": 151344,
   "tie_word_embeddings": false,
-  "transformers_version": "4.57.1",
-  "text_config": {
-    "model_type": "glm4v_text",
-    "attention_bias": true,
-    "attention_dropout": 0.0,
-    "pad_token_id": 151329,
-    "eos_token_id": [
-      151329,
-      151336,
-      151338,
-      151348
-    ],
-    "hidden_act": "silu",
-    "hidden_size": 4096,
-    "initializer_range": 0.02,
-    "intermediate_size": 13696,
-    "max_position_embeddings": 65536,
-    "num_attention_heads": 32,
-    "num_hidden_layers": 40,
-    "num_key_value_heads": 2,
-    "rms_norm_eps": 1e-05,
-    "dtype": "bfloat16",
-    "use_cache": true,
-    "vocab_size": 151552,
-    "partial_rotary_factor": 0.5,
-    "rope_theta": 10000,
-    "rope_scaling": {
-      "rope_type": "default",
-      "mrope_section": [
-        8,
-        12,
-        12
-      ]
-    }
-  },
   "vision_config": {
-    "model_type": "glm4v",
     "hidden_size": 1536,
     "depth": 24,
     "num_heads": 12,
@@ -61,5 +49,13 @@
     "rms_norm_eps": 1e-05,
     "spatial_merge_size": 2,
     "temporal_patch_size": 2
   }
 }

     "Glm4vForConditionalGeneration"
   ],
   "model_type": "glm4v",
+  "attention_bias": true,
+  "attention_dropout": 0.0,
+  "pad_token_id": 151329,
+  "eos_token_id": [
+    151329,
+    151336,
+    151338,
+    151348
+  ],
   "image_start_token_id": 151339,
   "image_end_token_id": 151340,
   "video_start_token_id": 151341,
   "video_end_token_id": 151342,
   "image_token_id": 151343,
   "video_token_id": 151344,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 13696,
+  "max_position_embeddings": 65536,
+  "num_attention_heads": 32,
+  "num_hidden_layers": 40,
+  "num_key_value_heads": 2,
+  "rms_norm_eps": 1e-05,
+  "rope_theta": 10000.0,
   "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.53.0dev",
+  "use_cache": true,
+  "vocab_size": 151552,
+  "partial_rotary_factor": 0.5,
   "vision_config": {
     "hidden_size": 1536,
     "depth": 24,
     "num_heads": 12,
     "rms_norm_eps": 1e-05,
     "spatial_merge_size": 2,
     "temporal_patch_size": 2
+  },
+  "rope_scaling": {
+    "type": "default",
+    "mrope_section": [
+      8,
+      12,
+      12
+    ]
   }
 }
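
The two sides of the config.json diff differ mainly in layout: one keeps the text-decoder hyperparameters at the top level, while the other groups them under `text_config`. As a rough illustration of that relationship (not the conversion logic used by `transformers`), the sketch below moves a flat config into the nested layout. The function name `nest_text_config` and the key list are assumptions read off the fields visible in this diff.

```python
import copy

# Keys that are top-level in the flat layout and live under "text_config"
# in the nested layout, as read off the diff above.
TEXT_KEYS = (
    "attention_bias", "attention_dropout", "pad_token_id", "eos_token_id",
    "hidden_act", "hidden_size", "initializer_range", "intermediate_size",
    "max_position_embeddings", "num_attention_heads", "num_hidden_layers",
    "num_key_value_heads", "rms_norm_eps", "use_cache", "vocab_size",
    "partial_rotary_factor", "rope_theta", "rope_scaling",
)


def nest_text_config(flat: dict) -> dict:
    """Return a copy of `flat` with text-decoder keys grouped under text_config."""
    nested = copy.deepcopy(flat)  # leave the caller's dict untouched
    text = {"model_type": "glm4v_text"}
    for key in TEXT_KEYS:
        if key in nested:
            text[key] = nested.pop(key)
    # The nested layout spells the dtype key "dtype" rather than "torch_dtype".
    if "torch_dtype" in nested:
        text["dtype"] = nested.pop("torch_dtype")
    # Inside rope_scaling, "rope_type" takes the place of "type".
    scaling = text.get("rope_scaling")
    if isinstance(scaling, dict) and "type" in scaling:
        scaling["rope_type"] = scaling.pop("type")
    # The nested layout also tags the vision sub-config with a model_type.
    if isinstance(nested.get("vision_config"), dict):
        nested["vision_config"].setdefault("model_type", "glm4v")
    nested["text_config"] = text
    return nested
```

Token-id fields such as `image_token_id`, along with `tie_word_embeddings` and `transformers_version`, stay at the top level in both layouts, so the sketch leaves them in place.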

generation_config.json DELETED

@@ -1,15 +0,0 @@
-{
-  "_from_model_config": true,
-  "do_sample": true,
-  "eos_token_id": [
-    151329,
-    151336,
-    151338,
-    151348
-  ],
-  "pad_token_id": 151329,
-  "top_p": 0.6,
-  "temperature": 0.8,
-  "top_k": 2,
-  "transformers_version": "4.57.1"
-}
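
The deleted file enables sampling with `temperature` 0.8, `top_k` 2, and `top_p` 0.6. Without it, a caller who wants the same behavior would presumably pass these values as keyword arguments to `model.generate`. To show how the three settings interact, here is a simplified pure-Python sketch. It is an illustration only, not the implementation used by `transformers`, and the function name `filter_probs` is hypothetical.

```python
import math


def filter_probs(logits, temperature=0.8, top_k=2, top_p=0.6):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept_total = sum(probs[i] for i in ranked)

    # Top-p: over the renormalized survivors, keep the shortest prefix
    # whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / kept_total
        if cumulative >= top_p:
            break

    # Renormalize so the remaining probabilities sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}
```

With `top_k` set to 2, at most two candidate tokens survive each step, so the sampling is close to greedy decoding even though `do_sample` is true.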