---
license: mit
language:
- zh
- en
- fr
- es
- ru
- de
- ja
- ko
pipeline_tag: image-to-text
library_name: onnxruntime
base_model:
- zai-org/GLM-OCR
---
# GLM-OCR ONNX (Static Split, Edge-Oriented)
This repository contains a production-oriented ONNX export/bundle for GLM-OCR with static graph wiring and quantization-aware layout for edge/browser deployment workflows.
## Credits and Upstream
- Original model and research release: `zai-org/GLM-OCR`
- Hugging Face: https://huggingface.co/zai-org/GLM-OCR
- GitHub: https://github.com/zai-org/GLM-OCR
- This repo is a deployment/export artifact built from the upstream model for ONNX static inference pipelines.
Please cite and credit the original GLM-OCR authors for model architecture, training, and benchmark claims.
## What Is Included
- `manifest.json`: runtime manifest for static Python/ONNX flows.
- `manifest.web.json`: ORT Web (WASM/WebGPU) wiring manifest.
- `fp16/`: core fp16 split graphs and external weight shards.
- `quant/`: quantized vision graph (`vision_quant`) and external shard.
The bundle is organized so quantized assets are clearly separated from fp16 assets.
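Assuming the runtime manifest uses the same `graphs` layout as the web manifest shown later in this README (the exact field names here are illustrative, not guaranteed), choosing between the fp16 and quantized vision graphs can be sketched as:

```python
import json


def resolve_vision_graph(manifest: dict, prefer_quant: bool = False) -> str:
    """Pick a vision graph path from a loaded manifest.

    Prefers the quantized graph when requested and present,
    otherwise falls back to the fp16 graph.
    """
    graphs = manifest["graphs"]
    if prefer_quant and "vision_quant" in graphs:
        return graphs["vision_quant"]
    return graphs["vision"]


# Hypothetical manifest mirroring the fp16/ and quant/ layout above.
manifest = {
    "graphs": {
        "vision": "fp16/vision.onnx",
        "vision_quant": "quant/vision_quant.onnx",
    }
}

print(resolve_vision_graph(manifest))                     # fp16/vision.onnx
print(resolve_vision_graph(manifest, prefer_quant=True))  # quant/vision_quant.onnx
```

In a real runner you would load the manifest with `json.load(open("manifest.json"))` and pass the resolved path to an `onnxruntime.InferenceSession`.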
## Notes on Quality and Optimization
- Primary quality baseline is upstream GLM-OCR behavior, with quality-preserving deployment optimizations.
- Quantization is applied selectively (not blanket full-model int8) to avoid OCR quality degradation on difficult layouts.
- `vision_quant` is provided as an optional path, while fp16 vision remains available.
## Python Inference Example
Use your static runner with `manifest.json` from this model repo.
```bash
python run_onnx_static.py \
--artifact_dir . \
--image ./examples/source/page.png \
--task document \
--device cuda \
--cuda_no_fallback \
--official_quality \
--vision_policy table_quant \
--out_text ./pred.md
```
`--vision_policy table_quant` keeps conservative quality defaults for document/text while using quantized vision where appropriate for tables.
## Browser / ORT Web (WASM/WebGPU)
Use `manifest.web.json` for session graph wiring.
- For constrained clients, prefer hybrid/server-assisted profiles in the manifest.
- Full in-browser loading of all graphs may exceed practical memory on many devices.
Minimal JS loading sketch:
```ts
import * as ort from "onnxruntime-web";
const manifest = await fetch("manifest.web.json").then((r) => r.json());
const visionPath = manifest.graphs.vision; // or manifest.graphs.vision_quant
const session = await ort.InferenceSession.create(visionPath, {
  executionProviders: ["webgpu", "wasm"], // tries WebGPU first, falls back to WASM
});
```
## Hugging Face Model Repo Upload Tips
- Track `.onnx` and `.data` files with Git LFS.
- Upload all of `fp16/`, `quant/`, and both manifest files together.
- If a web app will fetch directly from this model repo, configure CORS on the app side accordingly.
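For the LFS tracking in the first bullet, the corresponding `.gitattributes` entries (as written by `git lfs track "*.onnx" "*.data"`) look like:

```
*.onnx filter=lfs diff=lfs merge=lfs -text
*.data filter=lfs diff=lfs merge=lfs -text
```

Commit `.gitattributes` alongside the model files so the Hub stores the large artifacts via LFS rather than rejecting the push.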
## License
This deployment artifact follows the upstream GLM-OCR license metadata (`MIT` at time of packaging).
Always verify upstream license/terms at: https://huggingface.co/zai-org/GLM-OCR