Instructions for using zai-org/GLM-4.5-Air-FP8 with libraries and local apps.

Libraries

How to use zai-org/GLM-4.5-Air-FP8 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="zai-org/GLM-4.5-Air-FP8")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.5-Air-FP8")
model = AutoModelForCausalLM.from_pretrained("zai-org/GLM-4.5-Air-FP8", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
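
The slice in the last line above is there because, for a causal language model, `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of that slicing with plain lists; the helper name `strip_prompt` and the token ids are made up for illustration and are not part of Transformers:

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens that follow the prompt."""
    return output_ids[prompt_length:]


# Hypothetical token ids, standing in for inputs["input_ids"][0]
prompt_ids = [101, 102, 103]
# generate() output for a causal LM: prompt tokens, then new tokens
output_ids = prompt_ids + [7, 8, 9]

new_tokens = strip_prompt(output_ids, len(prompt_ids))
print(new_tokens)
```

When decoding the result, passing `skip_special_tokens=True` to `tokenizer.decode` may give cleaner text, depending on what you want to see.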

Local Apps

vLLM

How to use zai-org/GLM-4.5-Air-FP8 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zai-org/GLM-4.5-Air-FP8"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/GLM-4.5-Air-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
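
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server started above is reachable at `http://localhost:8000`; the function names `build_chat_request` and `chat` are illustrative, not part of vLLM. The response parsing assumes the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, prompt, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (needs a running server, so it is left commented out):
# print(chat("http://localhost:8000", "zai-org/GLM-4.5-Air-FP8",
#            "What is the capital of France?"))
```

Because the base URL is a parameter, the same helper should also work against any other server that exposes the same OpenAI-compatible route on a different port.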

Use Docker

docker model run hf.co/zai-org/GLM-4.5-Air-FP8

SGLang

How to use zai-org/GLM-4.5-Air-FP8 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zai-org/GLM-4.5-Air-FP8" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/GLM-4.5-Air-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zai-org/GLM-4.5-Air-FP8" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zai-org/GLM-4.5-Air-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use zai-org/GLM-4.5-Air-FP8 with Docker Model Runner:

```
docker model run hf.co/zai-org/GLM-4.5-Air-FP8
```

GLM-4.5-Air-FP8

Commit History

Enhance model card with specific tags and main GitHub link (#5)

f9a9c5a
verified

nielsr HF Staff committed on Aug 12, 2025

Improve GLM-4.5-Air-FP8 model card with detailed usage info (#4)

122df87
verified

nielsr HF Staff committed on Aug 11, 2025

tr update

1930787

zRzRzRzRzRzRzR committed on Aug 11, 2025

global maas api link

ad6acf6

zRzRzRzRzRzRzR committed on Jul 28, 2025

update

aaba837

zRzRzRzRzRzRzR committed on Jul 28, 2025

update

31c7039

zRzRzRzRzRzRzR committed on Jul 28, 2025

Add files using upload-large-folder tool

4b1dbdf
verified

zR committed on Jul 28, 2025

Add files using upload-large-folder tool

4779fa2
verified

zR committed on Jul 28, 2025

initial commit

18f14b5
verified

zR committed on Jul 20, 2025
