glm-5.2-visual-runtime

This repository is the deployment report, runtime source, and one-click deployment manifest for glm-5.2-visual-runtime, a training-free multimodal runtime gateway.

It is not a fine-tuned checkpoint and it does not modify GLM-5.2 weights. The one-click deployment pulls the official model weights at runtime:

Reasoning: zai-org/GLM-5.2-FP8
Vision / omni perception: Qwen/Qwen3-Omni-30B-A3B-Instruct
Optional alternate reasoning: Qwen/Qwen3.6-27B
OCR: local OCR service/container

No hosted model provider is required in the all-local profile.

Clients call the public model ID:

glm-5.2-visual-runtime

Internally, the runtime uses:

GLM-5.2 API for reasoning over compact structured evidence.
GLM-5V-Turbo-compatible provider interface for object grounding and hard visual questions.
GLM-OCR-compatible provider interface for OCR, tables, and document layouts.
CPU image processing lenses for fingerprints, palettes, masks, and chart geometry.
A FastAPI OpenAI-compatible gateway with persistent visual asset variables.

Deployment Status

Item	Status
Runtime source	Available in the companion Space
HF Space	`wassemgtk/glm-5-2-visual-runtime-space`
Standard vLLM base checkpoint	`zai-org/GLM-5.2-FP8`
vLLM minimum	`vLLM >= 0.23.0` or the GLM recipe image
Training / fine-tuning	Not used
Local GPU required for gateway	No
Local GPU required for self-hosted GLM-5.2 vLLM	Yes, very large multi-GPU deployment

One-Click All-Local Deployment

Use the included profile when everything must run as part of the deployment:

docker compose -f one_click/docker-compose.all-local.yml up --build

This starts:

gateway: the OpenAI-compatible glm-5.2-visual-runtime API.
glm52-vllm: local vLLM serving zai-org/GLM-5.2-FP8.
vision-vllm: local vLLM-Omni serving Qwen/Qwen3-Omni-30B-A3B-Instruct.
ocr: local OCR container endpoint.
minio: object storage for original images and generated artifacts.
postgres: persistent visual asset ledger.

The gateway exposes:

http://localhost:8000/v1

Standard vLLM Reasoning Deployment

Use the official GLM-5.2 FP8 checkpoint with vLLM:

vllm serve zai-org/GLM-5.2-FP8 \
  --served-model-name glm-5.2 \
  --kv-cache-dtype fp8 \
  --tensor-parallel-size 8 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 5 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice

Docker recipe:

docker run --gpus all \
  --ipc=host \
  -p 8001:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:glm52 \
  zai-org/GLM-5.2-FP8 \
  --served-model-name glm-5.2 \
  --kv-cache-dtype fp8 \
  --tensor-parallel-size 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice

Then point the visual gateway at the local vLLM OpenAI-compatible endpoint:

GLM_BASE_URL=http://glm52-vllm:8000/v1
GLM_MODEL=glm-5.2
VISION_BASE_URL=http://vision-vllm:8000/v1
VISION_MODEL=qwen3-omni
OCR_BASE_URL=http://ocr:8080
VISUAL_RUNTIME_MODE=local

Why this is a runtime model

The "model" is an API-runtime contract rather than a neural checkpoint. It makes GLM-5.2 appear vision-capable by storing images as persistent visual variables and lazily generating typed views such as:

img_01.original
img_01.fingerprint
img_01.ocr
img_01.palette
img_01.presentation_theme
img_01.chart

GLM-5.2 receives only compact, task-specific evidence.

Files in this repository

README.md: this deployment report.
deployment_report.md: operational report with hardware, mode, and acceptance status.
runtime_config.json: model/runtime manifest for gateway deployments.
one_click/docker-compose.all-local.yml: full local deployment with reasoning, vision, OCR, storage, and gateway.
one_click/.env.all-local.example: one-click environment file.
one_click/README.md: one-click deployment guide.
vllm/serve_glm52_fp8.sh: standard GLM-5.2 vLLM launch script.
vllm/serve_qwen3_omni.sh: local Qwen Omni vLLM-Omni launch script.
weights_manifest.json: list of all upstream checkpoint snapshots required for a fully bundled repo.
scripts/materialize_weights.py: downloads configured checkpoint snapshots into models/ before re-uploading this repo.
vllm/docker-compose.vllm.yml: vLLM OpenAI server compose file.
vllm/openai_smoke_test.py: verifies the vLLM reasoning endpoint.
gateway/.env.vllm.example: gateway environment for using local vLLM as GLM-5.2 provider.
apps/api: gateway source.
services/slides: editable PPTX rendering service.

Companion Gateway Space

Use the companion Hugging Face Docker Space for a lightweight no-GPU demo. For production one-click deployment with no hosted providers, use one_click/docker-compose.all-local.yml.

Space:

https://huggingface.co/spaces/wassemgtk/glm-5-2-visual-runtime-space

Live OpenAI-compatible base URL:

https://wassemgtk-glm-5-2-visual-runtime-space.hf.space/v1

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

Image-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for wassemgtk/glm-5.2-visual-runtime

Base model

zai-org/GLM-5.2

Finetuned

(9)

this model