Spaces:

Sam20202
/

GLMOCR_Text_extraction

Sleeping

App Files Files Community

GLMOCR_Text_extraction / README.md

Sam20202

Fix README: GLM-OCR + restore HF frontmatter

363e6f1 about 1 month ago

preview code

raw

history blame contribute delete

3.23 kB

	---
	title: GLM OCR
	emoji: 📄
	colorFrom: indigo
	colorTo: purple
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# GLM-OCR — Self-Hosted OCR Engine

	> A full-stack portfolio project: self-hosted OCR backend powered by zai-org/GLM-OCR, a 0.9B-param vision-language model ranked #1 on OmniDocBench V1.5.

	### 🔗 [Live Demo → https://huggingface.co/spaces/Sam20202/GLMOCR_Text_extraction](https://huggingface.co/spaces/Sam20202/GLMOCR_Text_extraction)

	---

	## What is GLM-OCR?

	GLM-OCR is a state-of-the-art open-source OCR model from
	zai-org (arXiv:2603.10910).

	Unlike traditional OCR (Tesseract, etc.) it uses a vision encoder + language model architecture:

	```
	Image → [CogViT Vision Encoder] → [GLM-0.5B LM Backbone] → Text
	```

	It handles:
	- Plain text from documents, screenshots, photos
	- Tables (preserved structure)
	- Mathematical equations (LaTeX output)
	- Code blocks (syntax preserved)
	- Multilingual text
	- Handwriting

	---

	## Project Structure

	```
	GLMOCR_Text_extraction/
	├── main.py # FastAPI server — routes, CORS, request handling
	├── ocr_engine.py # Model loading, inference, OcrResult dataclass
	├── requirements.txt # Python dependencies
	├── frontend/
	│ └── index.html # Single-file frontend (served by FastAPI)
	├── Extension/ # Browser extension (Chrome/Edge)
	├── Dockerfile # HF Spaces deployment
	├── docker-compose.yml # Local Docker deployment
	└── README.md
	```

	## Architecture Diagram

	```
	Browser
	│
	│ POST /ocr (multipart image + mode)
	▼
	┌─────────────────────────────────────┐
	│ FastAPI (main.py) │
	│ ─ CORS middleware │
	│ ─ file validation (type, size) │
	│ ─ session metrics │
	└──────────────┬──────────────────────┘
	│ image_bytes, mode
	▼
	┌─────────────────────────────────────┐
	│ GlmOcrEngine (ocr_engine.py) │
	│ ─ PIL validation & RGB conversion │
	│ ─ writes temp PNG to disk │
	│ ─ processor.apply_chat_template() │
	│ ─ returns OcrResult dataclass │
	└──────────────┬──────────────────────┘
	│ (torch.inference_mode)
	▼
	┌─────────────────────────────────────┐
	│ GLM-OCR 0.9B Model │
	│ zai-org/GLM-OCR │
	│ ─ Vision encoder (CogViT) │
	│ ─ LM backbone (GLM-0.5B) │
	│ ─ Runs on CUDA / CPU / MPS │
	└─────────────────────────────────────┘
	```


	## References

	- Paper: [GLM-OCR](https://arxiv.org/abs/2603.10910)
	- Model: [zai-org/GLM-OCR](https://huggingface.co/zai-org/GLM-OCR)