---
license: mit
language:
- zh
- en
- fr
- es
- ru
- de
- ja
- ko
pipeline_tag: image-to-text
library_name: onnxruntime
base_model:
- zai-org/GLM-OCR
---

# GLM-OCR ONNX (Static Split, Edge-Oriented)

This repository contains a production-oriented ONNX export/bundle for GLM-OCR with static graph wiring and a quantization-aware layout for edge/browser deployment workflows.

## Credits and Upstream

- Original model and research release: `zai-org/GLM-OCR`
  - Hugging Face: https://huggingface.co/zai-org/GLM-OCR
  - GitHub: https://github.com/zai-org/GLM-OCR
- This repo is a deployment/export artifact built from the upstream model for ONNX static inference pipelines.

Please cite and credit the original GLM-OCR authors for model architecture, training, and benchmark claims.

## What Is Included

- `manifest.json`: runtime manifest for static Python/ONNX flows.
- `manifest.web.json`: ORT Web (WASM/WebGPU) wiring manifest.
- `fp16/`: core fp16 split graphs and external weight shards.
- `quant/`: quantized vision graph (`vision_quant`) and its external shard.

The bundle is organized so quantized assets are clearly separated from fp16 assets.

## Notes on Quality and Optimization

- The primary quality baseline is upstream GLM-OCR behavior, with quality-preserving deployment optimizations.
- Quantization is applied selectively (not blanket full-model int8) to avoid OCR quality degradation on difficult layouts.
- `vision_quant` is provided as an optional path; fp16 vision remains available.

## Python Inference Example

Use your static runner with `manifest.json` from this model repo.

```bash
python run_onnx_static.py \
  --artifact_dir . \
  --image ./examples/source/page.png \
  --task document \
  --device cuda \
  --cuda_no_fallback \
  --official_quality \
  --vision_policy table_quant \
  --out_text ./pred.md
```

`--vision_policy table_quant` keeps conservative quality defaults for document/text while using quantized vision where appropriate for tables.
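As a rough illustration of what a policy like `table_quant` can mean on the runner side, here is a minimal sketch of choosing between the fp16 and quantized vision graphs. The `select_vision_graph` helper and the flat `graphs` mapping are illustrative assumptions (mirroring the `graphs.vision` / `graphs.vision_quant` keys used by `manifest.web.json`), not the actual schema of `manifest.json`:

```python
import json

def select_vision_graph(manifest: dict, policy: str = "table_quant") -> str:
    """Pick a vision graph path from an assumed manifest layout.

    Assumes manifest["graphs"] maps graph names to file paths, in the
    spirit of manifest.web.json's `graphs.vision` / `graphs.vision_quant`.
    """
    graphs = manifest["graphs"]
    if policy == "table_quant" and "vision_quant" in graphs:
        # Quantized vision for table-heavy content; fp16 remains the
        # conservative default for document/text quality.
        return graphs["vision_quant"]
    return graphs["vision"]

# Hypothetical manifest fragment matching this repo's directory layout.
manifest = {
    "graphs": {
        "vision": "fp16/vision.onnx",
        "vision_quant": "quant/vision_quant.onnx",
    }
}
print(select_vision_graph(manifest, policy="table_quant"))
```

The resolved path would then be passed to `onnxruntime.InferenceSession` as usual; a real runner would also consult per-task defaults before committing to the quantized path.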
## Browser / ORT Web (WASM/WebGPU)

Use `manifest.web.json` for session graph wiring.

- For constrained clients, prefer the hybrid/server-assisted profiles in the manifest.
- Loading all graphs fully in the browser may exceed practical memory on many devices.

Minimal JS loading sketch:

```ts
import * as ort from "onnxruntime-web";

const manifest = await fetch("manifest.web.json").then((r) => r.json());
const visionPath = manifest.graphs.vision; // or manifest.graphs.vision_quant
const session = await ort.InferenceSession.create(visionPath, {
  executionProviders: ["webgpu"], // fall back to "wasm" when needed
});
```

## Hugging Face Model Repo Upload Tips

- Track `.onnx` and `.data` files with Git LFS.
- Upload all of `fp16/`, `quant/`, and both manifest files together.
- If a web app will fetch directly from this model repo, configure CORS on the app side accordingly.

## License

This deployment artifact follows the upstream GLM-OCR license metadata (`MIT` at time of packaging). Always verify the upstream license/terms at: https://huggingface.co/zai-org/GLM-OCR