Instructions for using Kwaipilot/KAT-Dev-72B-Exp with libraries, inference servers, and local apps.

Libraries

How to use Kwaipilot/KAT-Dev-72B-Exp with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Kwaipilot/KAT-Dev-72B-Exp")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Kwaipilot/KAT-Dev-72B-Exp")
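# For a 72B model, keep the checkpoint's dtype instead of upcasting to float32,
# and let device_map="auto" (needs the accelerate package) place weights on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    "Kwaipilot/KAT-Dev-72B-Exp",
    torch_dtype="auto",
    device_map="auto",
)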
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Kwaipilot/KAT-Dev-72B-Exp with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Kwaipilot/KAT-Dev-72B-Exp"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kwaipilot/KAT-Dev-72B-Exp",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
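
The same request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the server from the commands above is already running on localhost and returns the usual OpenAI-style response shape. The helper names `build_chat_request`, `extract_reply`, and `ask` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

MODEL_ID = "Kwaipilot/KAT-Dev-72B-Exp"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion body."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    # Assumption: a generous timeout, since a 72B model can be slow to respond.
    with urllib.request.urlopen(request, timeout=120) as response:
        return extract_reply(response.read())
```

Calling `ask("What is the capital of France?")` targets the vLLM port used above. For the SGLang server, pass `base_url="http://localhost:30000"` instead.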

Use Docker

docker model run hf.co/Kwaipilot/KAT-Dev-72B-Exp

SGLang

How to use Kwaipilot/KAT-Dev-72B-Exp with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Kwaipilot/KAT-Dev-72B-Exp" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kwaipilot/KAT-Dev-72B-Exp",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Kwaipilot/KAT-Dev-72B-Exp" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Kwaipilot/KAT-Dev-72B-Exp",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Kwaipilot/KAT-Dev-72B-Exp with Docker Model Runner:
```
docker model run hf.co/Kwaipilot/KAT-Dev-72B-Exp
```

Real-world results exceeded my expectations. Does it support vLLM or SGLang deployment?

by wawoshashi - opened Oct 13, 2025

Discussion

wawoshashi

Oct 13, 2025 (edited Oct 13, 2025)

I'm using KAT-Code. The API is free to use, so I dove right in. I configured it in Claude Code and gave it a task: extract project.json (the Scratch project's code) from an HTML file exported from Scratch. After some investigation, KAT determined that the file was a project packaged by TurboWarp, wrote a Python program to handle it, and, to my surprise, completed the task. I was very pleasantly surprised.

I then tried the same task with the real Claude 4.5. It first attempted to write Python to extract the file but failed, saying that compilation into JavaScript made direct extraction very difficult. It then attempted to create an extractor.html for me, which sounded good, but other problems came up along the way.

Although a single experiment cannot demonstrate overall capability, the open-source model actually completed my impromptu task, which was enough to catch my attention. Deploy it!

wawoshashi

Oct 13, 2025

Is there an official communication channel? I'm looking for a WeChat group.

shunxing1234

Kwaipilot org Oct 13, 2025

Thanks for your interest! Here's the QR code for our WeChat group. Feel free to join and chat with us!
