glm-5.2-visual-runtime
This repository is the deployment report, runtime source, and one-click deployment manifest for glm-5.2-visual-runtime, a training-free multimodal runtime gateway.
It is not a fine-tuned checkpoint and it does not modify GLM-5.2 weights. The one-click deployment pulls the official model weights at runtime:
- Reasoning:
zai-org/GLM-5.2-FP8 - Vision / omni perception:
Qwen/Qwen3-Omni-30B-A3B-Instruct - Optional alternate reasoning:
Qwen/Qwen3.6-27B - OCR: local OCR service/container
No hosted model provider is required in the all-local profile.
Clients call the public model ID:
glm-5.2-visual-runtime
Internally, the runtime uses:
- GLM-5.2 API for reasoning over compact structured evidence.
- GLM-5V-Turbo-compatible provider interface for object grounding and hard visual questions.
- GLM-OCR-compatible provider interface for OCR, tables, and document layouts.
- CPU image processing lenses for fingerprints, palettes, masks, and chart geometry.
- A FastAPI OpenAI-compatible gateway with persistent visual asset variables.
Deployment Status
| Item | Status |
|---|---|
| Runtime source | Available in the companion Space |
| HF Space | wassemgtk/glm-5-2-visual-runtime-space |
| Standard vLLM base checkpoint | zai-org/GLM-5.2-FP8 |
| vLLM minimum | vLLM >= 0.23.0 or the GLM recipe image |
| Training / fine-tuning | Not used |
| Local GPU required for gateway | No |
| Local GPU required for self-hosted GLM-5.2 vLLM | Yes, very large multi-GPU deployment |
One-Click All-Local Deployment
Use the included profile when everything must run as part of the deployment:
docker compose -f one_click/docker-compose.all-local.yml up --build
This starts:
gateway: the OpenAI-compatibleglm-5.2-visual-runtimeAPI.glm52-vllm: local vLLM servingzai-org/GLM-5.2-FP8.vision-vllm: local vLLM-Omni servingQwen/Qwen3-Omni-30B-A3B-Instruct.ocr: local OCR container endpoint.minio: object storage for original images and generated artifacts.postgres: persistent visual asset ledger.
The gateway exposes:
http://localhost:8000/v1
Standard vLLM Reasoning Deployment
Use the official GLM-5.2 FP8 checkpoint with vLLM:
vllm serve zai-org/GLM-5.2-FP8 \
--served-model-name glm-5.2 \
--kv-cache-dtype fp8 \
--tensor-parallel-size 8 \
--speculative-config.method mtp \
--speculative-config.num_speculative_tokens 5 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice
Docker recipe:
docker run --gpus all \
--ipc=host \
-p 8001:8000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
vllm/vllm-openai:glm52 \
zai-org/GLM-5.2-FP8 \
--served-model-name glm-5.2 \
--kv-cache-dtype fp8 \
--tensor-parallel-size 8 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice
Then point the visual gateway at the local vLLM OpenAI-compatible endpoint:
GLM_BASE_URL=http://glm52-vllm:8000/v1
GLM_MODEL=glm-5.2
VISION_BASE_URL=http://vision-vllm:8000/v1
VISION_MODEL=qwen3-omni
OCR_BASE_URL=http://ocr:8080
VISUAL_RUNTIME_MODE=local
Why this is a runtime model
The "model" is an API-runtime contract rather than a neural checkpoint. It makes GLM-5.2 appear vision-capable by storing images as persistent visual variables and lazily generating typed views such as:
img_01.original
img_01.fingerprint
img_01.ocr
img_01.palette
img_01.presentation_theme
img_01.chart
GLM-5.2 receives only compact, task-specific evidence.
Files in this repository
README.md: this deployment report.deployment_report.md: operational report with hardware, mode, and acceptance status.runtime_config.json: model/runtime manifest for gateway deployments.one_click/docker-compose.all-local.yml: full local deployment with reasoning, vision, OCR, storage, and gateway.one_click/.env.all-local.example: one-click environment file.one_click/README.md: one-click deployment guide.vllm/serve_glm52_fp8.sh: standard GLM-5.2 vLLM launch script.vllm/serve_qwen3_omni.sh: local Qwen Omni vLLM-Omni launch script.weights_manifest.json: list of all upstream checkpoint snapshots required for a fully bundled repo.scripts/materialize_weights.py: downloads configured checkpoint snapshots intomodels/before re-uploading this repo.vllm/docker-compose.vllm.yml: vLLM OpenAI server compose file.vllm/openai_smoke_test.py: verifies the vLLM reasoning endpoint.gateway/.env.vllm.example: gateway environment for using local vLLM as GLM-5.2 provider.apps/api: gateway source.services/slides: editable PPTX rendering service.
Companion Gateway Space
Use the companion Hugging Face Docker Space for a lightweight no-GPU demo. For production one-click deployment with no hosted providers, use one_click/docker-compose.all-local.yml.
Space:
https://huggingface.co/spaces/wassemgtk/glm-5-2-visual-runtime-space
Live OpenAI-compatible base URL:
https://wassemgtk-glm-5-2-visual-runtime-space.hf.space/v1
Model tree for wassemgtk/glm-5.2-visual-runtime
Base model
zai-org/GLM-5.2