# GLM-OCR ONNX (Static Split, Edge-Oriented)
This repository contains a production-oriented ONNX export/bundle for GLM-OCR with static graph wiring and quantization-aware layout for edge/browser deployment workflows.
## Credits and Upstream

- Original model and research release: zai-org/GLM-OCR
  - Hugging Face: https://huggingface.co/zai-org/GLM-OCR
  - GitHub: https://github.com/zai-org/GLM-OCR
- This repo is a deployment/export artifact built from the upstream model for ONNX static inference pipelines.

Please cite and credit the original GLM-OCR authors for model architecture, training, and benchmark claims.
## What Is Included

- `manifest.json`: runtime manifest for static Python/ONNX flows.
- `manifest.web.json`: ORT Web (WASM/WebGPU) wiring manifest.
- `fp16/`: core fp16 split graphs and external weight shards.
- `quant/`: quantized vision graph (`vision_quant`) and external shard.

The bundle is organized so quantized assets are clearly separated from fp16 assets.
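For orientation, the manifests map graph names to ONNX file paths. A sketch along these lines is consistent with the `manifest.graphs.vision` / `manifest.graphs.vision_quant` accessors used in the JS example below, but the exact schema is defined by the manifest files themselves and may contain additional keys:

```json
{
  "graphs": {
    "vision": "fp16/vision.onnx",
    "vision_quant": "quant/vision_quant.onnx"
  }
}
```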
## Notes on Quality and Optimization

- The primary quality baseline is upstream GLM-OCR behavior, with quality-preserving deployment optimizations.
- Quantization is applied selectively (not blanket full-model int8) to avoid OCR quality degradation on difficult layouts.
- `vision_quant` is provided as an optional path, while fp16 vision remains available.
## Python Inference Example

Use your static runner with `manifest.json` from this model repo:

```bash
python run_onnx_static.py \
  --artifact_dir . \
  --image ./examples/source/page.png \
  --task document \
  --device cuda \
  --cuda_no_fallback \
  --official_quality \
  --vision_policy table_quant \
  --out_text ./pred.md
```

`--vision_policy table_quant` keeps conservative quality defaults for document/text tasks while using the quantized vision graph where appropriate for tables.
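As a rough illustration of what a vision-policy switch could look like inside a manifest-driven runner, here is a minimal sketch. The manifest field names (`graphs`, `vision`, `vision_quant`) and the policy semantics are assumptions for illustration, not the actual `run_onnx_static.py` implementation:

```python
# Hypothetical sketch: pick which vision graph to run for a
# --vision_policy-style setting. Schema keys are assumed, not authoritative.

def select_vision_graph(manifest: dict, vision_policy: str = "table_quant") -> str:
    """Return the ONNX graph path for the requested vision policy.

    Assumed semantics: "table_quant" prefers the quantized vision graph
    when the bundle ships one, otherwise falls back to fp16 vision.
    """
    graphs = manifest["graphs"]
    if vision_policy == "table_quant" and "vision_quant" in graphs:
        return graphs["vision_quant"]
    return graphs["vision"]

manifest = {
    "graphs": {
        "vision": "fp16/vision.onnx",
        "vision_quant": "quant/vision_quant.onnx",
    }
}
print(select_vision_graph(manifest))          # quant/vision_quant.onnx
print(select_vision_graph(manifest, "fp16"))  # fp16/vision.onnx
```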
## Browser / ORT Web (WASM / WebGPU)

Use `manifest.web.json` for session graph wiring.

- For constrained clients, prefer hybrid/server-assisted profiles in the manifest.
- Full in-browser loading of all graphs may exceed practical memory on many devices.

Minimal JS loading sketch:

```js
import * as ort from "onnxruntime-web";

const manifest = await fetch("manifest.web.json").then((r) => r.json());
const visionPath = manifest.graphs.vision; // or manifest.graphs.vision_quant
const session = await ort.InferenceSession.create(visionPath, {
  executionProviders: ["webgpu"], // fall back to "wasm" when needed
});
```
## Hugging Face Model Repo Upload Tips

- Track `.onnx` and `.data` files with Git LFS.
- Upload all of `fp16/`, `quant/`, and both manifest files together.
- If a web app will fetch directly from this model repo, configure CORS on the app side accordingly.
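Tracking `.onnx` and `.data` files with Git LFS corresponds to a `.gitattributes` entry like the following (this is what `git lfs track "*.onnx" "*.data"` writes):

```
*.onnx filter=lfs diff=lfs merge=lfs -text
*.data filter=lfs diff=lfs merge=lfs -text
```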
## License

This deployment artifact follows the upstream GLM-OCR license metadata (MIT at time of packaging). Always verify upstream license/terms at https://huggingface.co/zai-org/GLM-OCR.