Spaces:
Sleeping
Sleeping
metadata
title: GLM OCR
emoji: π
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
GLM-OCR β Self-Hosted OCR Engine
A full-stack portfolio project: self-hosted OCR backend powered by zai-org/GLM-OCR, a 0.9B-param vision-language model ranked #1 on OmniDocBench V1.5.
π Live Demo β https://huggingface.co/spaces/Sam20202/GLMOCR_Text_extraction
What is GLM-OCR?
GLM-OCR is a state-of-the-art open-source OCR model from
zai-org (arXiv:2603.10910).
Unlike traditional OCR (Tesseract, etc.) it uses a vision encoder + language model architecture:
Image β [CogViT Vision Encoder] β [GLM-0.5B LM Backbone] β Text
It handles:
- Plain text from documents, screenshots, photos
- Tables (preserved structure)
- Mathematical equations (LaTeX output)
- Code blocks (syntax preserved)
- Multilingual text
- Handwriting
Project Structure
GLMOCR_Text_extraction/
βββ main.py # FastAPI server β routes, CORS, request handling
βββ ocr_engine.py # Model loading, inference, OcrResult dataclass
βββ requirements.txt # Python dependencies
βββ frontend/
β βββ index.html # Single-file frontend (served by FastAPI)
βββ Extension/ # Browser extension (Chrome/Edge)
βββ Dockerfile # HF Spaces deployment
βββ docker-compose.yml # Local Docker deployment
βββ README.md
Architecture Diagram
Browser
β
β POST /ocr (multipart image + mode)
βΌ
βββββββββββββββββββββββββββββββββββββββ
β FastAPI (main.py) β
β β CORS middleware β
β β file validation (type, size) β
β β session metrics β
ββββββββββββββββ¬βββββββββββββββββββββββ
β image_bytes, mode
βΌ
βββββββββββββββββββββββββββββββββββββββ
β GlmOcrEngine (ocr_engine.py) β
β β PIL validation & RGB conversion β
β β writes temp PNG to disk β
β β processor.apply_chat_template() β
β β returns OcrResult dataclass β
ββββββββββββββββ¬βββββββββββββββββββββββ
β (torch.inference_mode)
βΌ
βββββββββββββββββββββββββββββββββββββββ
β GLM-OCR 0.9B Model β
β zai-org/GLM-OCR β
β β Vision encoder (CogViT) β
β β LM backbone (GLM-0.5B) β
β β Runs on CUDA / CPU / MPS β
βββββββββββββββββββββββββββββββββββββββ
References
- Paper: GLM-OCR
- Model: zai-org/GLM-OCR