# GLM-OCR ONNX (int8) for Browser WebGPU

Browser-ready ONNX export of zai-org/GLM-OCR (0.9B params). Runs entirely client-side via onnxruntime-web with WebGPU; no server needed.
## Components

### Base Models
| File | Size | Description |
|---|---|---|
| vision_encoder_int8.onnx | ~394 MB | CogViT vision encoder (int8) |
| language_model_int8.onnx | ~471 MB | GLM-0.5B decoder with 3D spatial RoPE (int8) |
| text_embeddings.onnx | ~348 MB | Token embedding layer |
| tokenizer.json | ~7 MB | Tokenizer |
### KV Cache Models (fast autoregressive decoding)
| File | Size | Description |
|---|---|---|
| kv/prefill_int8.onnx | ~471 MB | Full sequence prefill -> logits + KV cache |
| kv/decode_int8.onnx | ~471 MB | Single token + KV cache -> logits + updated cache |
## Performance
| Mode | Speed | Time for 100 tokens |
|---|---|---|
| Without KV cache | ~0.3 tok/s | ~5 min |
| With KV cache | ~20 tok/s | ~7 sec |
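The speedup comes from the prefill/decode split: the prompt is encoded once, and every subsequent step feeds a single token plus the cached keys/values instead of re-running the whole sequence. A minimal sketch of that loop is below; the `prefill` and `decode` callables and their `{ logits, kvCache }` return shape are placeholders standing in for the two ONNX sessions, not the exact I/O names of the exported models.

```javascript
// Sketch of the generation loop behind the KV-cache speedup.
// `prefill` and `decode` stand in for the two ONNX sessions; their
// signatures and output names here are illustrative assumptions.
async function generate(prefill, decode, promptIds, maxNewTokens) {
  // Prefill: run the whole prompt once -> logits + KV cache.
  let { logits, kvCache } = await prefill(promptIds);
  const out = [];
  let next = argmax(logits); // greedy sampling for simplicity
  for (let i = 0; i < maxNewTokens; i++) {
    out.push(next);
    // Decode: feed ONE token plus the cache, not the whole sequence.
    ({ logits, kvCache } = await decode(next, kvCache));
    next = argmax(logits);
  }
  return out;
}

function argmax(arr) {
  let best = 0;
  for (let i = 1; i < arr.length; i++) if (arr[i] > arr[best]) best = i;
  return best;
}
```

Without the cache, each step re-processes the entire sequence so far, so total work grows quadratically with output length; with it, each step is constant-size, which is why throughput jumps from ~0.3 to ~20 tok/s.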
## 3D Spatial Position IDs
The language model accepts 3D `position_ids` of shape `[4, batch, seq_len]` for full spatial awareness:
- Channel 0: temporal (0 for images)
- Channel 1: sequential position
- Channel 2: row position
- Channel 3: column position
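As a sketch of what those four channels look like for a single image whose patches are laid out as a rows-by-cols grid: channel 0 stays at 0, channel 1 counts up through the sequence, and channels 2 and 3 record each patch's grid coordinates. The row-major patch ordering assumed here is an illustration, not a guarantee of the exact layout GLM-OCR's preprocessor uses.

```javascript
// Sketch: build the four position-id channels for one image of
// rows x cols patches (batch dimension omitted). Row-major patch
// ordering is an assumption for illustration.
function buildImagePositionIds(rows, cols) {
  const seqLen = rows * cols;
  const temporal = new Array(seqLen).fill(0); // channel 0: 0 for images
  const sequential = [];                      // channel 1: running index
  const row = [];                             // channel 2: patch row
  const col = [];                             // channel 3: patch column
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      sequential.push(r * cols + c);
      row.push(r);
      col.push(c);
    }
  }
  return [temporal, sequential, row, col]; // shape [4, seqLen]
}
```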
## Export Details
- Base model: zai-org/GLM-OCR (0.9B params)
- Quantization: int8 dynamic (onnxruntime)
- Vision encoder: TorchScript exporter, opset 14
- Language model: Dynamo exporter, opset 18
- KV cache: packed tensor `[num_layers*2, batch, kv_heads, seq, head_dim]`
- 3D RoPE: preserved via explicit `position_ids` input
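Because the KV cache is packed into one tensor, per-layer key and value blocks are addressed by offset along the first axis. The helper below sketches that index math under the assumption that layer i's keys occupy slot `2*i` and its values slot `2*i+1`; the actual packing order of the export may differ.

```javascript
// Sketch: flat-buffer offset of one layer's K or V block inside the
// packed cache [numLayers*2, batch, kvHeads, seq, headDim].
// Keys at even slots, values at odd slots, is an assumed convention.
function kvSlotOffset(layer, isValue, dims) {
  const [, batch, kvHeads, seq, headDim] = dims;
  const slotSize = batch * kvHeads * seq * headDim; // elements per slot
  return (2 * layer + (isValue ? 1 : 0)) * slotSize;
}
```

With this layout, slicing one layer's cache out of the flat `Float32Array` (or int8 data) backing the ONNX tensor is a single `subarray(offset, offset + slotSize)`.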
## License
Apache 2.0 (same as base model)