glm-5.2-visual-runtime

This repository is the deployment report, runtime source, and one-click deployment manifest for glm-5.2-visual-runtime, a training-free multimodal runtime gateway.

It is not a fine-tuned checkpoint and it does not modify GLM-5.2 weights. The one-click deployment pulls the official model weights at runtime:

  • Reasoning: zai-org/GLM-5.2-FP8
  • Vision / omni perception: Qwen/Qwen3-Omni-30B-A3B-Instruct
  • Optional alternate reasoning: Qwen/Qwen3.6-27B
  • OCR: local OCR service/container

No hosted model provider is required in the all-local profile.

Clients call the public model ID:

glm-5.2-visual-runtime

Internally, the runtime uses:

  • GLM-5.2 API for reasoning over compact structured evidence.
  • GLM-5V-Turbo-compatible provider interface for object grounding and hard visual questions.
  • GLM-OCR-compatible provider interface for OCR, tables, and document layouts.
  • CPU image processing lenses for fingerprints, palettes, masks, and chart geometry.
  • A FastAPI OpenAI-compatible gateway with persistent visual asset variables.

Deployment Status

Item Status
Runtime source Available in the companion Space
HF Space wassemgtk/glm-5-2-visual-runtime-space
Standard vLLM base checkpoint zai-org/GLM-5.2-FP8
vLLM minimum vLLM >= 0.23.0 or the GLM recipe image
Training / fine-tuning Not used
Local GPU required for gateway No
Local GPU required for self-hosted GLM-5.2 vLLM Yes, very large multi-GPU deployment

One-Click All-Local Deployment

Use the included profile when everything must run as part of the deployment:

docker compose -f one_click/docker-compose.all-local.yml up --build

This starts:

  • gateway: the OpenAI-compatible glm-5.2-visual-runtime API.
  • glm52-vllm: local vLLM serving zai-org/GLM-5.2-FP8.
  • vision-vllm: local vLLM-Omni serving Qwen/Qwen3-Omni-30B-A3B-Instruct.
  • ocr: local OCR container endpoint.
  • minio: object storage for original images and generated artifacts.
  • postgres: persistent visual asset ledger.

The gateway exposes:

http://localhost:8000/v1

Standard vLLM Reasoning Deployment

Use the official GLM-5.2 FP8 checkpoint with vLLM:

vllm serve zai-org/GLM-5.2-FP8 \
  --served-model-name glm-5.2 \
  --kv-cache-dtype fp8 \
  --tensor-parallel-size 8 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 5 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice

Docker recipe:

docker run --gpus all \
  --ipc=host \
  -p 8001:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:glm52 \
  zai-org/GLM-5.2-FP8 \
  --served-model-name glm-5.2 \
  --kv-cache-dtype fp8 \
  --tensor-parallel-size 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice

Then point the visual gateway at the local vLLM OpenAI-compatible endpoint:

GLM_BASE_URL=http://glm52-vllm:8000/v1
GLM_MODEL=glm-5.2
VISION_BASE_URL=http://vision-vllm:8000/v1
VISION_MODEL=qwen3-omni
OCR_BASE_URL=http://ocr:8080
VISUAL_RUNTIME_MODE=local

Why this is a runtime model

The "model" is an API-runtime contract rather than a neural checkpoint. It makes GLM-5.2 appear vision-capable by storing images as persistent visual variables and lazily generating typed views such as:

img_01.original
img_01.fingerprint
img_01.ocr
img_01.palette
img_01.presentation_theme
img_01.chart

GLM-5.2 receives only compact, task-specific evidence.

Files in this repository

  • README.md: this deployment report.
  • deployment_report.md: operational report with hardware, mode, and acceptance status.
  • runtime_config.json: model/runtime manifest for gateway deployments.
  • one_click/docker-compose.all-local.yml: full local deployment with reasoning, vision, OCR, storage, and gateway.
  • one_click/.env.all-local.example: one-click environment file.
  • one_click/README.md: one-click deployment guide.
  • vllm/serve_glm52_fp8.sh: standard GLM-5.2 vLLM launch script.
  • vllm/serve_qwen3_omni.sh: local Qwen Omni vLLM-Omni launch script.
  • weights_manifest.json: list of all upstream checkpoint snapshots required for a fully bundled repo.
  • scripts/materialize_weights.py: downloads configured checkpoint snapshots into models/ before re-uploading this repo.
  • vllm/docker-compose.vllm.yml: vLLM OpenAI server compose file.
  • vllm/openai_smoke_test.py: verifies the vLLM reasoning endpoint.
  • gateway/.env.vllm.example: gateway environment for using local vLLM as GLM-5.2 provider.
  • apps/api: gateway source.
  • services/slides: editable PPTX rendering service.

Companion Gateway Space

Use the companion Hugging Face Docker Space for a lightweight no-GPU demo. For production one-click deployment with no hosted providers, use one_click/docker-compose.all-local.yml.

Space:

https://huggingface.co/spaces/wassemgtk/glm-5-2-visual-runtime-space

Live OpenAI-compatible base URL:

https://wassemgtk-glm-5-2-visual-runtime-space.hf.space/v1
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for wassemgtk/glm-5.2-visual-runtime

Base model

zai-org/GLM-5.2
Finetuned
(9)
this model