GLM-OCR ONNX (Static Split, Edge-Oriented)

This repository contains a production-oriented ONNX export/bundle for GLM-OCR with static graph wiring and quantization-aware layout for edge/browser deployment workflows.

Credits and Upstream

Please cite and credit the original GLM-OCR authors for model architecture, training, and benchmark claims.

What Is Included

  • manifest.json: runtime manifest for static Python/ONNX flows.
  • manifest.web.json: ORT Web (WASM/WebGPU) wiring manifest.
  • fp16/: core fp16 split graphs and external weight shards.
  • quant/: quantized vision graph (vision_quant) and external shard.

The bundle is organized so quantized assets are clearly separated from fp16 assets.
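The manifests map logical graph names to files on disk. A minimal sketch of how a runner might resolve them, assuming a `{"graphs": {name: relative_path}}` layout that mirrors the `manifest.graphs.vision` access pattern in the browser example below (the schema and example path are assumptions, not documented):

```python
import json
from pathlib import Path

def resolve_graph(artifact_dir: str, name: str) -> Path:
    """Look up a graph file by logical name in manifest.json.

    Assumes a {"graphs": {name: relative_path}} layout, matching the
    manifest.web.json access pattern shown in the browser example.
    """
    manifest = json.loads(Path(artifact_dir, "manifest.json").read_text())
    rel = manifest["graphs"][name]  # e.g. "fp16/vision.onnx" (hypothetical path)
    path = Path(artifact_dir, rel)
    if not path.exists():
        raise FileNotFoundError(f"graph '{name}' not found at {path}")
    return path
```

External weight shards (`.data` files) must sit next to their `.onnx` graph, so resolving the graph path is enough for ONNX Runtime to find the shards.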

Notes on Quality and Optimization

  • Primary quality baseline is upstream GLM-OCR behavior, with quality-preserving deployment optimizations.
  • Quantization is applied selectively (not blanket full-model int8) to avoid OCR quality degradation on difficult layouts.
  • vision_quant is provided as an optional path, while fp16 vision remains available.

Python Inference Example

Use the static runner (run_onnx_static.py) together with manifest.json from this model repo.

```bash
python run_onnx_static.py \
  --artifact_dir . \
  --image ./examples/source/page.png \
  --task document \
  --device cuda \
  --cuda_no_fallback \
  --official_quality \
  --vision_policy table_quant \
  --out_text ./pred.md
```

`--vision_policy table_quant` keeps conservative quality defaults for document/text regions while routing table regions through the quantized vision graph.
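The policy described above amounts to a simple dispatch on region type. The sketch below is illustrative only; the region taxonomy and function name are assumptions beyond the `table_quant` flag itself:

```python
def pick_vision_graph(policy: str, region_kind: str) -> str:
    """Choose fp16 or quantized vision graph for a page region.

    Illustrative only: under a hypothetical 'table_quant' policy, table
    regions use the quantized vision graph, while document/text regions
    keep the fp16 graph to preserve OCR quality on difficult layouts.
    """
    if policy == "table_quant" and region_kind == "table":
        return "vision_quant"
    return "vision"  # conservative fp16 default
```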

Browser / ORT Web (WASM/WebGPU)

Use manifest.web.json for session graph wiring.

  • For constrained clients, prefer hybrid/server-assisted profiles in the manifest.
  • Full in-browser loading of all graphs may exceed practical memory on many devices.

Minimal JS loading sketch:

```js
import * as ort from "onnxruntime-web";

const manifest = await fetch("manifest.web.json").then((r) => r.json());
const visionPath = manifest.graphs.vision; // or manifest.graphs.vision_quant
const session = await ort.InferenceSession.create(visionPath, {
  // Tries WebGPU first and falls back to WASM if it is unavailable.
  executionProviders: ["webgpu", "wasm"],
});
```

Hugging Face Model Repo Upload Tips

  • Track .onnx and .data files with Git LFS.
  • Upload all of fp16/, quant/, and both manifest files together.
  • If a web app will fetch directly from this model repo, configure CORS on the app side accordingly.
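The checklist above can be scripted with the real `huggingface_hub.upload_folder` API; the repo id below is a placeholder, and the allow-pattern list simply encodes the "upload everything together" rule:

```python
# Patterns covering both manifests plus the fp16/ and quant/ graph shards,
# so the bundle always ships as a consistent set.
ALLOW_PATTERNS = [
    "manifest.json",
    "manifest.web.json",
    "fp16/**",
    "quant/**",
]

def push_bundle(repo_id: str, folder: str = ".") -> None:
    # Third-party dependency: pip install huggingface_hub
    from huggingface_hub import upload_folder

    # Large .onnx/.data files are stored via LFS-backed storage on the Hub.
    upload_folder(
        repo_id=repo_id,
        folder_path=folder,
        allow_patterns=ALLOW_PATTERNS,
        commit_message="Upload ONNX bundle (manifests + fp16 + quant)",
    )
```

Usage would be e.g. `push_bundle("your-user/glm-ocr-onnx")` after `huggingface-cli login`.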

License

This deployment artifact follows the upstream GLM-OCR license metadata (MIT at time of packaging).
Always verify upstream license/terms at: https://huggingface.co/zai-org/GLM-OCR
