---
license: mit
language:
- zh
- en
- fr
- es
- ru
- de
- ja
- ko
pipeline_tag: image-to-text
library_name: onnxruntime
base_model:
- zai-org/GLM-OCR
---

# GLM-OCR ONNX (Static Split, Edge-Oriented)

This repository contains a production-oriented ONNX export/bundle for GLM-OCR with static graph wiring and a quantization-aware layout for edge/browser deployment workflows.

## Credits and Upstream

- Original model and research release: `zai-org/GLM-OCR`
  - Hugging Face: https://huggingface.co/zai-org/GLM-OCR
  - GitHub: https://github.com/zai-org/GLM-OCR
- This repo is a deployment/export artifact built from the upstream model for ONNX static inference pipelines.

Please cite and credit the original GLM-OCR authors for model architecture, training, and benchmark claims.

## What Is Included

- `manifest.json`: runtime manifest for static Python/ONNX flows.
- `manifest.web.json`: ORT Web (WASM/WebGPU) wiring manifest.
- `fp16/`: core fp16 split graphs and external weight shards.
- `quant/`: quantized vision graph (`vision_quant`) and its external shard.

The bundle is organized so quantized assets are clearly separated from fp16 assets.

## Notes on Quality and Optimization

- The primary quality baseline is upstream GLM-OCR behavior, with quality-preserving deployment optimizations.
- Quantization is applied selectively (not blanket full-model int8) to avoid OCR quality degradation on difficult layouts.
- `vision_quant` is provided as an optional path; fp16 vision remains available.

## Python Inference Example

Use your static runner with `manifest.json` from this model repo.

```bash
python run_onnx_static.py \
  --artifact_dir . \
  --image ./examples/source/page.png \
  --task document \
  --device cuda \
  --cuda_no_fallback \
  --official_quality \
  --vision_policy table_quant \
  --out_text ./pred.md
```

`--vision_policy table_quant` keeps conservative quality defaults for document/text while using quantized vision where appropriate for tables.
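As a rough illustration of what a policy like `table_quant` can mean on the runner side, here is a minimal sketch of choosing between the fp16 and quantized vision graphs. The `select_vision_graph` helper and the flat `graphs` mapping are illustrative assumptions (mirroring the `graphs.vision` / `graphs.vision_quant` keys used by `manifest.web.json`), not the actual schema of `manifest.json`:

```python
import json

def select_vision_graph(manifest: dict, policy: str = "table_quant") -> str:
    """Pick a vision graph path from an assumed manifest layout.

    Assumes manifest["graphs"] maps graph names to file paths, in the
    spirit of manifest.web.json's `graphs.vision` / `graphs.vision_quant`.
    """
    graphs = manifest["graphs"]
    if policy == "table_quant" and "vision_quant" in graphs:
        # Quantized vision for table-heavy content; fp16 remains the
        # conservative default for document/text quality.
        return graphs["vision_quant"]
    return graphs["vision"]

# Hypothetical manifest fragment matching this repo's directory layout.
manifest = {
    "graphs": {
        "vision": "fp16/vision.onnx",
        "vision_quant": "quant/vision_quant.onnx",
    }
}
print(select_vision_graph(manifest, policy="table_quant"))
```

The resolved path would then be passed to `onnxruntime.InferenceSession` as usual; a real runner would also consult per-task defaults before committing to the quantized path.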
## Browser / ORT Web (WASM/WebGPU)

Use `manifest.web.json` for session graph wiring.

- For constrained clients, prefer the hybrid/server-assisted profiles in the manifest.
- Loading all graphs fully in the browser may exceed practical memory on many devices.

Minimal JS loading sketch:

```ts
import * as ort from "onnxruntime-web";

const manifest = await fetch("manifest.web.json").then((r) => r.json());
const visionPath = manifest.graphs.vision; // or manifest.graphs.vision_quant
const session = await ort.InferenceSession.create(visionPath, {
  executionProviders: ["webgpu"], // fall back to "wasm" when needed
});
```

## Hugging Face Model Repo Upload Tips

- Track `.onnx` and `.data` files with Git LFS.
- Upload all of `fp16/`, `quant/`, and both manifest files together.
- If a web app will fetch directly from this model repo, configure CORS on the app side accordingly.

## License

This deployment artifact follows the upstream GLM-OCR license metadata (`MIT` at time of packaging). Always verify the upstream license/terms at: https://huggingface.co/zai-org/GLM-OCR