--- license: mit base_model: zai-org/GLM-OCR pipeline_tag: image-text-to-text tags: - ocr - document-understanding - on-prem - enterprise - kitefishai language: - en - hi library_name: transformers --- # Minnow-OCR-1B **Minnow-OCR-1B** is the baseline release in the Minnow OCR series from [KiteFishAI](https://kitefishai.com) — document intelligence models packaged for air-gapped, on-prem deployment in regulated enterprise environments (BFSI, healthcare, legal). Domain-adapted variants fine-tuned on Indian enterprise documents — multilingual forms, insurance surveyor reports, clinical records, and regulatory filings — will follow in subsequent Minnow-OCR releases. ## Why this exists Regulated Indian enterprises frequently cannot pull models from external sources at deployment time: air-gapped infrastructure, data-residency mandates, and procurement controls require a vetted, internally distributed model registry. The Minnow-OCR series provides that registry layer — known-good checkpoints, reproducible serving configurations, and a clear upgrade path to domain-adapted versions. ## Model description The underlying model is a compact (~0.9B parameter) encoder-decoder vision-language model for complex document understanding, built on the GLM-V architecture with a CogViT visual encoder, a lightweight cross-modal connector, and a GLM-0.5B language decoder. It delivers state-of-the-art results for its size class on major document benchmarks, including OmniDocBench and formula/table recognition tasks. ## Usage ```python from transformers import AutoModelForImageTextToText, AutoProcessor model_id = "KiteFishAI/Minnow-OCR-1B" processor = AutoProcessor.from_pretrained(model_id) model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto") ``` For high-throughput serving: ```bash python -m sglang.launch_server --model KiteFishAI/Minnow-OCR-1B --port 8080 ``` ## License and attribution The model weights are released under the **MIT License**, originally developed and released by Z.ai as GLM-OCR. This repository redistributes those weights with attribution, per the license terms. If you use the full OCR pipeline with PP-DocLayoutV3 layout analysis, note that component is licensed under Apache License 2.0; comply with both licenses. ## About KiteFishAI KiteFishAI builds deployable enterprise intelligence systems for regulated Indian enterprises — full on-prem stacks spanning custom models, fine-tuning on enterprise data, and air-gapped deployment. Learn more at [kitefishai.com](https://kitefishai.com) or [huggingface.co/KiteFishAI](https://huggingface.co/KiteFishAI).