---
license: mit
language:
- zh
- en
- fr
- es
- ru
- de
- ja
- ko
pipeline_tag: image-to-text
library_name: onnxruntime
base_model:
- zai-org/GLM-OCR
---

# GLM-OCR ONNX (Static Split, Edge-Oriented)

This repository contains a production-oriented ONNX export of GLM-OCR, bundled with static graph wiring and a quantization-aware layout for edge and browser deployment workflows.

## Credits and Upstream

- Original model and research release: `zai-org/GLM-OCR`
- Hugging Face: https://huggingface.co/zai-org/GLM-OCR
- GitHub: https://github.com/zai-org/GLM-OCR
- This repo is a deployment/export artifact built from the upstream model for ONNX static inference pipelines.

Please cite and credit the original GLM-OCR authors for the model architecture, training, and benchmark claims.

## What Is Included

- `manifest.json`: runtime manifest for static Python/ONNX flows.
- `manifest.web.json`: ORT Web (WASM/WebGPU) wiring manifest.
- `fp16/`: core fp16 split graphs and external weight shards.
- `quant/`: quantized vision graph (`vision_quant`) and its external shard.

The bundle is organized so that quantized assets are clearly separated from fp16 assets.
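
The exact manifest schema is whatever this bundle's runner expects; as a purely hypothetical illustration, consistent with the `manifest.graphs` lookup used in the browser example below, a minimal manifest could look like this (the file names are assumptions):

```json
{
  "graphs": {
    "vision": "fp16/vision.onnx",
    "vision_quant": "quant/vision_quant.onnx"
  }
}
```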

## Notes on Quality and Optimization

- The primary quality baseline is upstream GLM-OCR behavior, with quality-preserving deployment optimizations on top.
- Quantization is applied selectively (not as blanket full-model int8) to avoid OCR quality degradation on difficult layouts.
- `vision_quant` is provided as an optional path; the fp16 vision graph remains available.

## Python Inference Example

Use your static runner with `manifest.json` from this model repo.

```bash
python run_onnx_static.py \
  --artifact_dir . \
  --image ./examples/source/page.png \
  --task document \
  --device cuda \
  --cuda_no_fallback \
  --official_quality \
  --vision_policy table_quant \
  --out_text ./pred.md
```

`--vision_policy table_quant` keeps conservative quality defaults for document and text content while using the quantized vision graph where appropriate for tables.
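
The policy described above can be sketched as a small selection helper. This is a hypothetical illustration only; the graph keys, file names, and policy string are assumptions, not the runner's actual API:

```python
# Hypothetical sketch of a "table_quant" vision policy: route table tasks
# through the quantized vision graph, keep the fp16 graph for everything else.
# Keys and file names are illustrative assumptions, not the runner's API.

def resolve_vision_graph(task: str, policy: str, graphs: dict) -> str:
    """Return the graph path the policy selects for this task."""
    if policy == "table_quant" and task == "table":
        return graphs["vision_quant"]  # quantized path for table layouts
    return graphs["vision"]  # conservative fp16 default

graphs = {
    "vision": "fp16/vision.onnx",
    "vision_quant": "quant/vision_quant.onnx",
}

print(resolve_vision_graph("table", "table_quant", graphs))
print(resolve_vision_graph("document", "table_quant", graphs))
```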

## Browser / ORT Web (WASM/WebGPU)

Use `manifest.web.json` for session graph wiring.

- For constrained clients, prefer the hybrid/server-assisted profiles in the manifest.
- Loading all graphs fully in-browser may exceed practical memory limits on many devices.

Minimal JS loading sketch:

```ts
import * as ort from "onnxruntime-web";

const manifest = await fetch("manifest.web.json").then((r) => r.json());
const visionPath = manifest.graphs.vision; // or manifest.graphs.vision_quant

// Prefer WebGPU; fall back to WASM on devices without WebGPU support.
let session: ort.InferenceSession;
try {
  session = await ort.InferenceSession.create(visionPath, {
    executionProviders: ["webgpu"],
  });
} catch {
  session = await ort.InferenceSession.create(visionPath, {
    executionProviders: ["wasm"],
  });
}
```

## Hugging Face Model Repo Upload Tips

- Track `.onnx` and `.data` files with Git LFS.
- Upload all of `fp16/`, `quant/`, and both manifest files together.
- If a web app will fetch directly from this model repo, configure CORS on the app side accordingly.
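
The LFS rule above can be captured in a `.gitattributes` file at the repo root (standard Git LFS attribute syntax; running `git lfs track "*.onnx" "*.data"` writes equivalent lines):

```
*.onnx filter=lfs diff=lfs merge=lfs -text
*.data filter=lfs diff=lfs merge=lfs -text
```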

## License

This deployment artifact follows the upstream GLM-OCR license metadata (`MIT` at time of packaging).
Always verify the upstream license/terms at: https://huggingface.co/zai-org/GLM-OCR