---
license: mit
language:
- zh
- en
- fr
- es
- ru
- de
- ja
- ko
pipeline_tag: image-to-text
library_name: onnxruntime
base_model:
- zai-org/GLM-OCR
---
# GLM-OCR ONNX (Static Split, Edge-Oriented)
This repository contains a production-oriented ONNX export/bundle for GLM-OCR with static graph wiring and quantization-aware layout for edge/browser deployment workflows.
## Credits and Upstream
- Original model and research release: `zai-org/GLM-OCR`
- Hugging Face: https://huggingface.co/zai-org/GLM-OCR
- GitHub: https://github.com/zai-org/GLM-OCR
- This repo is a deployment/export artifact built from the upstream model for ONNX static inference pipelines.
Please cite and credit the original GLM-OCR authors for model architecture, training, and benchmark claims.
## What Is Included
- `manifest.json`: runtime manifest for static Python/ONNX flows.
- `manifest.web.json`: ORT Web (WASM/WebGPU) wiring manifest.
- `fp16/`: core fp16 split graphs and external weight shards.
- `quant/`: quantized vision graph (`vision_quant`) and external shard.
The bundle is organized so quantized assets are clearly separated from fp16 assets.
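Assuming the runtime manifest uses the same `graphs` layout as the web manifest shown later in this README (the exact field names here are illustrative, not guaranteed), choosing between the fp16 and quantized vision graphs can be sketched as:

```python
import json


def resolve_vision_graph(manifest: dict, prefer_quant: bool = False) -> str:
    """Pick a vision graph path from a loaded manifest.

    Prefers the quantized graph when requested and present,
    otherwise falls back to the fp16 graph.
    """
    graphs = manifest["graphs"]
    if prefer_quant and "vision_quant" in graphs:
        return graphs["vision_quant"]
    return graphs["vision"]


# Hypothetical manifest mirroring the fp16/ and quant/ layout above.
manifest = {
    "graphs": {
        "vision": "fp16/vision.onnx",
        "vision_quant": "quant/vision_quant.onnx",
    }
}

print(resolve_vision_graph(manifest))                     # fp16/vision.onnx
print(resolve_vision_graph(manifest, prefer_quant=True))  # quant/vision_quant.onnx
```

In a real runner you would load the manifest with `json.load(open("manifest.json"))` and pass the resolved path to an `onnxruntime.InferenceSession`.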
## Notes on Quality and Optimization
- Primary quality baseline is upstream GLM-OCR behavior, with quality-preserving deployment optimizations.
- Quantization is applied selectively (not blanket full-model int8) to avoid OCR quality degradation on difficult layouts.
- `vision_quant` is provided as an optional path, while fp16 vision remains available.
## Python Inference Example
Use your static runner with `manifest.json` from this model repo.
```bash
python run_onnx_static.py \
--artifact_dir . \
--image ./examples/source/page.png \
--task document \
--device cuda \
--cuda_no_fallback \
--official_quality \
--vision_policy table_quant \
--out_text ./pred.md
```
`--vision_policy table_quant` keeps conservative quality defaults for document/text while using quantized vision where appropriate for tables.
## Browser / ORT Web (WASM/WebGPU)
Use `manifest.web.json` for session graph wiring.
- For constrained clients, prefer hybrid/server-assisted profiles in the manifest.
- Full in-browser loading of all graphs may exceed practical memory on many devices.
Minimal JS loading sketch:
```ts
import * as ort from "onnxruntime-web";
const manifest = await fetch("manifest.web.json").then((r) => r.json());
const visionPath = manifest.graphs.vision; // or manifest.graphs.vision_quant
const session = await ort.InferenceSession.create(visionPath, {
  executionProviders: ["webgpu", "wasm"], // tries WebGPU first, falls back to WASM
});
```
## Hugging Face Model Repo Upload Tips
- Track `.onnx` and `.data` files with Git LFS.
- Upload all of `fp16/`, `quant/`, and both manifest files together.
- If a web app will fetch directly from this model repo, configure CORS on the app side accordingly.
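For the LFS tracking in the first bullet, the corresponding `.gitattributes` entries (as written by `git lfs track "*.onnx" "*.data"`) look like:

```
*.onnx filter=lfs diff=lfs merge=lfs -text
*.data filter=lfs diff=lfs merge=lfs -text
```

Commit `.gitattributes` alongside the model files so the Hub stores the large artifacts via LFS rather than rejecting the push.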
## License
This deployment artifact follows the upstream GLM-OCR license metadata (`MIT` at time of packaging).
Always verify upstream license/terms at: https://huggingface.co/zai-org/GLM-OCR